Repositories
autoextract-poet
web-poet definitions for AutoExtract
spidermon
Scrapy extension for monitoring spider execution.
dateparser
Python parser for human-readable dates
scrapy-poet
Page Object pattern for Scrapy
scrapy-autounit
Automatic unit test generation for Scrapy.
extruct
Extract embedded metadata from HTML markup
scrapinghub-autoextract
Python clients for Scrapinghub AutoExtract API
splash
Lightweight, scriptable browser as a service with an HTTP API
article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
crawlera-headless-proxy
A complementary proxy that helps you use Crawlera with headless browsers
autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
webstruct-demo
HTTP demo for https://github.com/scrapinghub/webstruct
frontera
A scalable frontier for web crawlers
python-scrapinghub
A client interface for Scrapinghub's API
baseimage-docker
Forked from phusion/baseimage-docker. A minimal Ubuntu base image modified for Docker-friendliness
scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps
marathon-apps-collectd-plugin
Forked from jsargiot/marathon-apps-collectd-plugin
mochiweb
Forked from shaneaevans/mochiweb. MochiWeb is an Erlang library for building lightweight HTTP servers.

