List of libraries, tools and APIs for web scraping and data processing.
Makefile
Updated Feb 17, 2019
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
PHP
Updated Mar 7, 2019
Web Scraping Framework
Python
Updated Mar 7, 2019
Apify SDK — The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extrac…
General Assembly's 2015 Data Science course in Washington, DC
Jupyter Notebook
Updated Apr 18, 2016
Simple web scraping for R
Next.js server to query websites with GraphQL
JavaScript
Updated Jan 26, 2019
A framework for creating semi-automatic web content extractors
Python
Updated Jan 7, 2019
Random User-Agent middleware based on fake-useragent
Modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or si…
Ruby
Updated Jan 30, 2019
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Go
Updated Feb 23, 2019
ACHE is a web crawler for domain-specific search.
An unofficial API for Quora.
Python
Updated Oct 9, 2016
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Python
Updated Mar 22, 2017
A JavaScript library for generating random user agents with data that's updated daily.
JavaScript
Updated Mar 22, 2019
Kantu for Chrome and Firefox - Modern Web Browser Automation plus Selenium IDE
Collection of scripts corresponding to LucidProgramming YouTube tutorials
Python
Updated Mar 15, 2019
Simple Query Scraping with CSS and Go Reflection
Go
Updated Oct 21, 2016
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Python
Updated Jan 4, 2019
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Jupyter Notebook
Updated Feb 12, 2017
Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
Tutorial: Web scraping in Python with Beautiful Soup
Jupyter Notebook
Updated Nov 18, 2018
Scrapy Training companion code
Python
Updated Jan 30, 2019
Zillow Scraper for Python using Selenium
Python
Updated Nov 4, 2018
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.o…
JavaScript
Updated Jul 17, 2018
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Python
Updated Apr 6, 2017
Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
JavaScript
Updated Oct 5, 2018
Android app for saving webpages for offline reading.
Java
Updated Jun 22, 2018
Twitter Intelligence OSINT project performs tracking and analysis of Twitter
Python
Updated Jan 16, 2019
Web scraping and related analytics using Python tools
Jupyter Notebook
Updated Mar 7, 2019