List of libraries, tools and APIs for web scraping and data processing.
Makefile
Updated Nov 1, 2018
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
PHP
Updated Oct 31, 2018
Web Scraping Framework
Python
Updated Sep 15, 2018
Apify SDK — The scalable web crawling and scraping library for JavaScript. Enables development of data extraction and…
General Assembly's 2015 Data Science course in Washington, DC
Jupyter Notebook
Updated Apr 18, 2016
A framework for creating semi-automatic web content extractors
Python
Updated May 1, 2018
Random User-Agent middleware based on fake-useragent
An unofficial API for Quora.
Python
Updated Oct 9, 2016
[WIP] GOPA, a spider written in Golang, for Elasticsearch. Demo: http://index.elasticsearch.cn
Go
Updated Sep 1, 2018
Modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or si…
Ruby
Updated Oct 12, 2018
ACHE is a web crawler for domain-specific search.
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Python
Updated Mar 22, 2017
Simple Query Scraping with CSS and Go Reflection
Go
Updated Oct 21, 2016
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Python
Updated May 24, 2017
A JavaScript library for generating random user agents with data that's updated daily.
JavaScript
Updated Nov 6, 2018
Kantu for Chrome and Firefox - Modern Web Browser Automation plus Selenium IDE
Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
JavaScript
Updated Oct 5, 2018
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.o…
JavaScript
Updated Jul 17, 2018
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Jupyter Notebook
Updated Feb 12, 2017
Android app for saving webpages for offline reading.
Java
Updated Jun 22, 2018
Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
Python
Updated Sep 30, 2018
Zillow Scraper for Python using Selenium
Python
Updated Nov 4, 2018
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Python
Updated Apr 6, 2017
Scrapy Training companion code
Python
Updated Jul 19, 2017
Tutorial: Web scraping in Python with Beautiful Soup
Jupyter Notebook
Updated Aug 26, 2017
Twitter Intelligence OSINT project for tracking and analysis of Twitter
Python
Updated Oct 31, 2018
Collection of scripts corresponding to LucidProgramming YouTube tutorials
Python
Updated Nov 5, 2018
Web scraping and related analytics using Python tools
Jupyter Notebook
Updated Nov 6, 2018
💦 Tools to Work with the 'Splash' JavaScript Rendering Service in R
R
Updated Aug 14, 2018
Headless 'Chrome' Orchestration in R
R
Updated Oct 7, 2018