Scrapy

{{Infobox software

| name = Scrapy

| logo = File:Scrapy logo.jpg

| screenshot =

| caption =

| collapsible =

| author =

| developer = Zyte (formerly Scrapinghub)

| released = {{Start date|2008|06|26|df=yes}}

| discontinued =

| latest release version = {{wikidata|property|reference|edit|P348}}

| latest release date = {{start date and age|{{wikidata|qualifier|P348|P577}}}}

| latest preview version =

| latest preview date =

| programming language = Python

| operating system = Windows, macOS, Linux

| platform =

| size =

| language =

| genre = Web crawler

| license = BSD License

}}

Scrapy ({{IPAc-en|ˈ|s|k|r|eI|p|aI}}{{Cite web |url=https://github.com/scrapy/scrapy/commit/975f15003efc911809983150852e04433d9811dd |title=Commit 975f150 |website=GitHub |access-date=2021-10-18 |archive-date=2021-10-18 |archive-url=https://web.archive.org/web/20211018084517/https://github.com/scrapy/scrapy/commit/975f15003efc911809983150852e04433d9811dd |url-status=live }} {{respell|SKRAY|peye}}) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data through APIs or as a general-purpose web crawler.[http://doc.scrapy.org/en/latest/intro/overview.html Scrapy at a glance] {{Webarchive|url=https://web.archive.org/web/20180917054138/http://doc.scrapy.org/en/latest/intro/overview.html |date=2018-09-17 }} It is maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" frameworks, such as Django,{{cite web |title=Frequently Asked Questions |url=http://doc.scrapy.org/en/latest/faq.html#did-scrapy-steal-x-from-django |access-date=28 July 2015 |website=Frequently Asked Questions, Scrapy 2.8.0 documentation |language=en-US |archive-date=11 November 2020 |archive-url=https://web.archive.org/web/20201111175100/https://doc.scrapy.org/en/latest/faq.html#did-scrapy-steal-x-from-django |url-status=live }} it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
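
The spider idea can be illustrated with a small toy model. The sketch below is not Scrapy's own API: it uses only the Python standard library, and the names <code>ToySpider</code>, <code>TitleSpider</code>, <code>LinkAndTitleParser</code>, <code>fetch</code> and the <code>example.test</code> pages are hypothetical, chosen for illustration. It assumes that a spider consists of a list of start URLs plus a parsing rule, while the crawl loop itself is written once in a base class and reused.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ToySpider:
    """Reusable crawl loop; subclasses supply only the site-specific parts."""

    start_urls = []

    def __init__(self, fetch):
        # `fetch` is any callable mapping a URL to an HTML string. It is a
        # hypothetical stand-in for a downloader, so the sketch runs offline.
        self.fetch = fetch

    def parse(self, url, html):
        """Yield item dicts and/or URL strings to follow; overridden per site."""
        raise NotImplementedError

    def crawl(self):
        queue, seen, items = list(self.start_urls), set(), []
        while queue:
            url = queue.pop(0)
            if url in seen:  # never visit the same page twice
                continue
            seen.add(url)
            for result in self.parse(url, self.fetch(url)):
                if isinstance(result, dict):
                    items.append(result)   # extracted data
                else:
                    queue.append(result)   # another URL to visit
        return items


class TitleSpider(ToySpider):
    """The 'set of instructions': where to start and what to extract."""

    start_urls = ["http://example.test/"]

    def parse(self, url, html):
        parser = LinkAndTitleParser()
        parser.feed(html)
        yield {"url": url, "title": parser.title.strip()}
        for href in parser.links:
            yield urljoin(url, href)


# Two in-memory pages that link to each other (made-up data).
PAGES = {
    "http://example.test/": "<title>Home</title><a href='/about'>About</a>",
    "http://example.test/about": "<title>About</title><a href='/'>Home</a>",
}

items = TitleSpider(fetch=PAGES.__getitem__).crawl()
print(items)
```

In this sketch, a spider for a different site would subclass <code>ToySpider</code> and override only <code>start_urls</code> and <code>parse</code>; the queueing and de-duplication logic is inherited, which is the kind of code reuse the paragraph above describes.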

Companies and products that use Scrapy include Lyst,{{ cite web |url= http://talks.lystit.com/dsl-scraping-presentation/#/4 |title=Scalable Scraping Using Machine Learning |first1=Eddie |last1=Bell |first2=Jonathan |last2=Heusser |access-date= 28 July 2015 |archive-url=https://web.archive.org/web/20160604082034/http://talks.lystit.com/dsl-scraping-presentation/#/4 |archive-date=4 June 2016 |url-status=dead}}{{Cite web |url=http://scrapy.org/companies/ |title=Scrapy {{!}} Companies using Scrapy |access-date=2015-07-28 |archive-date=2020-11-12 |archive-url=https://web.archive.org/web/20201112031322/https://scrapy.org/companies/ |url-status=live }} Parse.ly,{{cite web |last=Montalenti |first=Andrew |date=October 27, 2012 |title=Web Crawling & Metadata Extraction in Python |url=https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python |access-date=May 11, 2015 |website=Web Crawling & Metadata Extraction in Python - Speaker Deck |language=en-US |archive-date=September 19, 2020 |archive-url=https://web.archive.org/web/20200919065625/https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python |url-status=live }} Sayone Technologies,{{Cite web |title=Scrapy Companies |url=https://scrapy.org/companies/ |website=Scrapy {{!}} Companies using Scrapy |access-date=2017-11-09 |archive-date=2020-11-12 |archive-url=https://web.archive.org/web/20201112031322/https://scrapy.org/companies/ |url-status=live }} Sciences Po Medialab,{{Cite web |url=http://www.medialab.sciences-po.fr/blog/hyphe-v0-0-0-the-first-release-of-our-new-webcrawler-is-out/ |title=Hyphe v0.0.0: the first release of our new webcrawler is out! |date=17 November 2013 |access-date=2015-07-28 |archive-date=2016-06-13 |archive-url=https://web.archive.org/web/20160613210348/http://www.medialab.sciences-po.fr/blog/hyphe-v0-0-0-the-first-release-of-our-new-webcrawler-is-out/ |url-status=live }} and Data.gov.uk’s World Government Data site.{{Cite tweet |user=bfirsh |author=Ben Firshman |number=8025368963 |date = 21 January 2010 |title=World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords http://bit.ly/5jU3La #opendata #datastore }}

History

Scrapy originated at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.{{cite mailing list |url=https://groups.google.com/forum/#!topic/scrapy-users/sMbBVIq0sko |title=Scrapy 1.0 official release out! |mailing-list=scrapy-users |last=Medina |first=Julia |date=19 June 2015 |access-date=28 July 2015 |archive-date=25 January 2010 |archive-url=https://web.archive.org/web/20100125115446/http://groups.google.co.uk/group/net.unix-wizards/msg/4dadd63a976019d7?dmode=source#!topic/scrapy-users/sMbBVIq0sko |url-status=live }} In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.{{cite book |author=Hoffman |first=Pablo |url=https://github.com/scrapy/scrapy/blob/master/AUTHORS |title=List of the primary authors & contributors |year=2013 |language=en-US |accessdate=18 November 2013 |archive-date=29 May 2017 |archive-url=https://web.archive.org/web/20170529225845/https://github.com/scrapy/scrapy/blob/master/AUTHORS |url-status=live }}[http://decisionstats.com/2015/12/12/interview-scrapinghub-python-webcrawling/ Interview Scraping Hub] {{Webarchive|url=https://web.archive.org/web/20201029160837/http://decisionstats.com/2015/12/12/interview-scrapinghub-python-webcrawling/ |date=2020-10-29 }}

References

Category:Web crawlers

Category:Web scraping

Category:Free software programmed in Python

Category:Software using the BSD license