facert / awesome-spider

9.3k

A collection of crawlers

spider python awesome

Updated May 30, 2019

gocolly / colly

8.3k

Elegant Scraper and Crawler Framework for Golang

golang scraper framework crawler scraping crawling spider go

Go Updated Jul 25, 2019

jhao104 / proxy_pool

6.9k

Python crawler proxy IP pool (proxy pool)

crawler proxy proxypool spider ssdb flask schedule crawl

Python Updated Jul 22, 2019

henrylee2cn / pholcus

5.8k

[Crawler for Golang] Pholcus is a distributed, high-concurrency, powerful web crawler.

crawler spider multi-interface golang distributed-crawler high-concurrency-crawler fastest-crawler cross-platform-crawler

Go Updated Apr 30, 2019

s0md3v / Photon

5.5k

Incredibly fast crawler designed for OSINT.

crawler spider python osint information-gathering

Python Updated Jun 3, 2019

luyishisi / Anti-Anti-Spider

5.2k

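More and more websites have anti-crawler features: some hide key data in images, others use user-hostile CAPTCHAs. This repository collects anti-anti-crawler code and improves technique by taking on sites with different protections (without malicious intent). (Submissions of hard-to-scrape sites are welcome.) (Project paused for work reasons.)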

python spider geek

HTML Updated Jan 11, 2019

bda-research / node-crawler

4.8k

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

crawler javascript spider extract-data cheerio jquery nodejs

JavaScript Updated Jun 10, 2019

guyueyingmu / avbook

4.5k

AV movie management system; crawlers for avmoo, javbus and javlibrary; online AV film library; AV magnet link database, Japanese Adult Video Library, Adult Video Magnet Links - …

javbus avmoo javlibrary spider crawler laravel scraper adult magnet-link magnet database adult-video guzzlehttp

PHP Updated Jul 19, 2019

SpiderClub / haipproxy

3.8k

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

high-availability scrapy ipproxy distributed redis crawler scheduler spider

Python Updated Jul 23, 2019

BruceDone / awesome-crawler

3.6k

A collection of awesome web crawlers and spiders in different languages

web-crawler crawler web-scraper spider node-crawler scraper awesome

Updated Apr 18, 2019

gaojiuli / toapi

2.9k

Every web site provides APIs.

html json api python web spider crawler flask toapi

Python Updated Dec 6, 2018

shengqiangzhang / examples-of-web-crawlers

2.6k

Some fun, beginner-friendly Python crawler examples, mainly scraping Taobao, Tmall, WeChat, Douban, QQ and other sites.

crawler spider taobao tmall example python selenium pyquery stock fund multithreading agent-pool wechat wechat-report

Python Updated Jun 26, 2019

shiyanhui / dht

2k

BitTorrent DHT Protocol && DHT Spider.

dht spider bittorrent-dht-protocol go

Go Updated Mar 20, 2019

gaojiuli / gain

1.9k

Web crawling framework based on asyncio.

python crawler spider asyncio uvloop aiohttp

Python Updated Jun 1, 2019

DormyMo / SpiderKeeper

1.9k

Admin UI for Scrapy / open-source Scrapinghub

scrapy dashboard scrapyd scrapy-ui scrapyd-dashboard scrapyd-ui spider

Python Updated Mar 21, 2019

jae-jae / QueryList

1.7k

🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

querylist crawler spider scraper

PHP Updated May 31, 2019

hu17889 / go_spider

1.5k

[Crawler framework (golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular. It can be ex…

spider crawler go schedule pipeline

Go Updated Nov 16, 2017

howie6879 / owllook

1.5k

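owllook: an online web-novel reading site, novel search engine and novel recommendation system [search, book tracking, favorites, update tracking, novel API]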

novels novels-search book owllook sanic reader schedule spider asyncio aiohttp uvloop

Python Updated Jun 18, 2019

Gerapy / Gerapy

1.4k

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js

scrapy distributed webspider scrapyd ui dashboard spider django vuejs

JavaScript Updated Jun 3, 2019

xianhu / PSpider

1.3k

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

crawler spider python proxies web-spider multi-threading web-crawler python-spider multiprocessing

Python Updated Jul 26, 2019

my8100 / scrapydweb

1.1k

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Email notic…

scrapy scrapyd scrapyd-ui scrapyd-api scrapyd-admin scrapyd-manage log-parsing log-analysis scrapyd-monitor scrapyd-keeper scrapyd-control scrapy-log-analysis scrapyd-log-analysis scrapy-visualization scrapyd-visualization dashboard spider scrapyd-cluster-management

Python Updated Jul 25, 2019

jumper2014 / lianjia-beike-spider

1.1k

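Housing-price crawler for Lianjia and Beike, collecting price data (residential compounds, second-hand homes, rentals, new homes) for 21 major Chinese cities including Beijing, Shanghai, Guangzhou and Shenzhen. Stable, reliable and fast! Supports CSV, MySQL, MongoDB, Excel and JSON storage, supports Python 2 and 3, displays data in charts, richly commented 🚁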

lianjia spider crawler

Python Updated Jul 24, 2019

JayBizzle / Crawler-Detect

972

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

php user-agent crawler spider bots detect

PHP Updated Jun 14, 2019

kiddyuchina / Beanbun

939

Beanbun is a multi-process web crawler framework written in PHP. It is open and highly extensible, and is built on Workerman.

php spider crawler beanbun

PHP Updated Aug 30, 2018

howie6879 / ruia

914

Async Python 3.6+ web scraping micro-framework based on asyncio.

asyncio aiohttp asyncio-spider crawler crawling-framework spider uvloop ruia

Python Updated Jul 12, 2019

holgerd77 / django-dynamic-scraper

893

Creating Scrapy scrapers via the Django admin interface

python django scraper scraping scrapy spider webscraping

Python Updated Jul 18, 2019

gsh199449 / spider

864

A configurable web spider with an easy-to-use web console

spider gatherplatform web-console text-mining cralwer

Java Updated Aug 21, 2018

geziyor / geziyor

841

Geziyor, a blazing fast web crawling & scraping framework for Go

go scraping scraper crawler spider

Go Updated Jul 21, 2019

keenwon / antcolony

819

A magnet-link crawler implemented in Node.js http://findit.keenwon.com (original domain http://findit.so )

nodejs spider torrent bittorrent bencode dht javascript antcolony

JavaScript Updated Dec 28, 2018

wycm / zhihu-crawler

769

zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for a free HTTP proxy pool

zhihu spider crawler java

Java Updated Apr 2, 2019

spider

Repositories 1,736
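Underneath their scheduling, concurrency and storage features, general-purpose crawler frameworks like several listed above broadly repeat one loop: take a URL from a queue, download it, extract its links, and enqueue the ones not seen yet. The following is a minimal, single-threaded sketch of that loop using only the Python standard library. It is not taken from any of the listed projects; the names `crawl`, `fetch_url` and `LinkExtractor` are hypothetical, and `fetch` is injectable so the loop can be exercised without network access. It does not check robots.txt, throttle requests or restrict crawling to one domain, all of which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Hypothetical default fetcher: downloads a page and decodes it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str,
          fetch: Callable[[str], str] = fetch_url,
          max_pages: int = 50) -> Dict[str, List[str]]:
    """Breadth-first crawl; returns {visited url: absolute links found on it}."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    graph: Dict[str, List[str]] = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = LinkExtractor()
        parser.feed(html)
        links = []
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            links.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        graph[url] = links
    return graph
```

For offline testing, pass a dictionary lookup such as `pages.__getitem__` as `fetch`; missing pages raise `KeyError` and are skipped. Against a real site, `crawl("https://example.com/", max_pages=10)` would fetch at most ten pages; before doing so, consult the site's robots.txt (the standard library's `urllib.robotparser` can parse it) and add a delay between requests.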
