Scrapy (/ˈskreɪpaɪ/[2] SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy
Developer(s): Zyte (formerly Scrapinghub)
Initial release: 26 June 2008
Stable release: 2.12.0[1] / 18 November 2024
Written in: Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD License
Website: scrapy.org

Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other don't-repeat-yourself (DRY) frameworks, such as Django,[4] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
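
The idea of a spider as a self-contained class that carries its own crawling instructions, while reusing a shared crawl loop, can be illustrated with a minimal sketch in plain Python. This sketch uses only the standard library and is not Scrapy's actual API: the names `BaseSpider`, `TitleSpider`, `LinkAndTitleParser`, `crawl`, and the in-memory `PAGES` site are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href> value from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class BaseSpider:
    """Hypothetical base class: the crawl loop is written once and reused."""

    start_urls = []

    def parse(self, url, html):
        # Subclasses return (item, urls_to_follow) for each page.
        raise NotImplementedError

    def crawl(self, fetch):
        # `fetch` maps a URL to HTML text; it is injected so that the
        # sketch runs without any network access.
        seen, queue, items = set(), list(self.start_urls), []
        while queue:
            url = queue.pop(0)
            if url in seen:
                continue  # never visit the same page twice
            seen.add(url)
            item, next_urls = self.parse(url, fetch(url))
            items.append(item)
            queue.extend(next_urls)
        return items


class TitleSpider(BaseSpider):
    """A spider's own 'instructions': where to start and what to extract."""

    start_urls = ["/index"]

    def parse(self, url, html):
        parser = LinkAndTitleParser()
        parser.feed(html)
        return {"url": url, "title": parser.title}, parser.links


# A tiny in-memory site standing in for real HTTP responses (assumption).
PAGES = {
    "/index": "<title>Home</title><a href='/about'>About</a>",
    "/about": "<title>About</title><a href='/index'>Home</a>",
}

items = TitleSpider().crawl(PAGES.__getitem__)
```

In this sketch, a second spider for a different site would only need to define its own `start_urls` and `parse` method; the queueing and de-duplication logic in `BaseSpider` is reused unchanged, which is the kind of code reuse the paragraph above describes.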

Well-known companies and products using Scrapy include Lyst,[5][6] Parse.ly,[7] Sayone Technologies,[8] Sciences Po Medialab,[9] and Data.gov.uk's World Government Data site.[10]

History

Scrapy originated at Mydeco, a London-based web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.[11] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[12][13]

References

  1. "Release 2.12.0". 18 November 2024. Retrieved 29 November 2024.
  2. "Commit 975f150". GitHub. Archived from the original on 18 October 2021. Retrieved 18 October 2021.
  3. "Scrapy at a glance". Archived from the original on 17 September 2018.
  4. "Frequently Asked Questions". Scrapy 2.8.0 documentation. Archived from the original on 11 November 2020. Retrieved 28 July 2015.
  5. Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Archived from the original on 4 June 2016. Retrieved 28 July 2015.
  6. "Scrapy | Companies using Scrapy". Archived from the original on 12 November 2020. Retrieved 28 July 2015.
  7. Montalenti, Andrew (27 October 2012). "Web Crawling & Metadata Extraction in Python". Speaker Deck. Archived from the original on 19 September 2020. Retrieved 11 May 2015.
  8. "Scrapy Companies". Scrapy | Companies using Scrapy. Archived from the original on 12 November 2020. Retrieved 9 November 2017.
  9. "Hyphe v0.0.0: the first release of our new webcrawler is out!". 17 November 2013. Archived from the original on 13 June 2016. Retrieved 28 July 2015.
  10. Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords http://bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
  11. Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list). Archived from the original on 25 January 2010. Retrieved 28 July 2015.
  12. Hoffman, Pablo (2013). "List of the primary authors & contributors". Archived from the original on 29 May 2017. Retrieved 18 November 2013.
  13. "Interview Scraping Hub". Archived from the original on 29 October 2020.