The Wayback Machine - https://web.archive.org/web/20200812014217/https://github.com/topics/web-crawler

web-crawler

Here are 508 public repositories matching this topic...

crawlab-team / crawlab

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

go docker platform crawler spider web-crawler scrapy webcrawler scrapyd-ui webspider crawling-tasks crawlab spiders-management

Updated Aug 10, 2020
Go

BruceDone / awesome-crawler

A collection of awesome web crawlers and spiders in different languages

crawler scraper awesome spider web-crawler web-scraper node-crawler

Updated Aug 5, 2020

apache / nutch

Apache Nutch is an extensible and scalable web crawler

java hadoop web-crawler nutch crawling apache

Updated Aug 11, 2020
Java

sjdirect / abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

c-sharp unit-testing crawler spider csharp parsing cross-platform web-crawler netcore log4net takes-care flexibility pluggable spiders csharp-library abot netcore2 netstandard20 netcore3 javascript-renderer netstandard21 abot-nuget icrawldecisionmaker netsta

Updated Jun 13, 2020
C#

xianhu / PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

python crawler multi-threading spider multiprocessing web-crawler proxies python-spider web-spider

Updated Mar 3, 2020
Python

DigitalPebble / storm-crawler

A scalable, mature and versatile web crawler based on Apache Storm

java web-crawler distributed apache-storm

Updated Aug 5, 2020
Java

USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

search search-engine distributed-systems information-retrieval big-data spark solr web-crawler nutch tika sparkles

Updated May 21, 2020
Java

VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.

web-crawler web-scraping web-spider focused-crawler domain-specific-search web-search

Updated Jul 29, 2020
Java

brendonboshell / supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

sitemap crawler robot web-crawler distributed-crawler

Updated Jul 15, 2020
JavaScript

infinitbyte / gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

lightweight elasticsearch crawler spider web-crawler scraping crawling web-scraping web-spider

Updated Nov 24, 2019
Go

lucasxlu / LagouJob

Job data mining repo for lagou.com

nlp machine-learning data-mining web-crawler python3 data-analysis lagou

Updated Apr 19, 2019
Python

rivermont / spidy

The simple, easy-to-use command-line web crawler.

python crawler web-crawler crawling python3 web-spider

Updated Jun 23, 2020
Python

microfisher / Strong-Web-Crawler

An advanced web crawler built on C#.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page's DOM structure.

crawler phantomjs web-crawler sellenium

Updated Oct 25, 2019
C#

ssssssss-team / spider-flow

A next-generation crawler platform: define crawler flows graphically and build a crawler without writing code.

crawler spider web-crawler jsoup xpath webcrawler webspider web-spider spider-flow

Updated Jun 21, 2020
Java

antchfx / antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

golang crawler framework web-crawler scraping crawling web-spider

Updated May 31, 2020
Go

elliotxx / zhihu-crawler-people

A simple distributed crawler for Zhihu, plus data analysis

python crawler spider web-crawler python-crawler web-spider

Updated Nov 11, 2019
Python

crawler-commons / crawler-commons

A set of reusable Java components that implement functionality common to any web crawler

java open-source library web-crawler robots-txt sitemaps

Updated Aug 7, 2020
Java

duyet / awesome-web-scraper

A collection of awesome web scrapers and crawlers.

php awesome spider storage phantomjs web-crawler web-scraper scrapy awesome-list goutte slimerjs

Updated Aug 5, 2020

Norconex / collector-http

Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

java search-engine web-crawler norconex-http-collector

Updated Aug 4, 2020
Java

gusdnd852 / kochat

Open-source Korean chatbot framework based on deep learning 💬

deep-learning web-crawler chatbot korean deeplearning sentence-classification korean-chatbot sequance-tagging

Updated Jul 9, 2020
Python

mazzzystar / Proxy

A simple tool for fetching usable proxies from several websites.

web-crawler proxies proxypool proxy-list

Updated Jun 21, 2020
Python

abaykan / CrawlBox

An easy way to brute-force web directories.

python crawler web-crawler wordlist admin-finder

Updated Jun 2, 2019
Python

commoncrawl / news-crawl

News crawling with Storm-crawler - stores content as WARC

crawler news web-crawler apache-storm warc

Updated Jul 29, 2020
Java

brianmadden / krawler

A web crawling framework written in Kotlin

kotlin link-checker framework web-crawler webcrawler web-crawling crawler4j

Updated Jun 13, 2020
Kotlin

platonai / pulsar

Turn large websites into tables and charts using simple SQL queries.

data-science web-crawler selenium web-scraping web-mining web-sql

Updated Aug 11, 2020
Java

crawlab-team / crawlab-lite

Lite version of Crawlab, a crawler management platform.

platform crawler spider web-crawler scrapy scrapyd scrapy-ui scrapyd-ui crawling-tasks crawlab crawler-management

Updated Jul 20, 2020
Vue

DwarfThief / Raspagem-de-dados-para-iniciantes

Data scraping for beginners using Scrapy and other basic libraries

python opensource web-crawler jupyter-notebook scrapy spyder estudo datascraping webcrawling raspagem-de-dados

Updated Aug 7, 2020
Python

monkey-soft / SchweizerMesser

🎯 A collection of hands-on Python 3 web crawlers and data analysis | Dangdang | NetEase Cloud Music | Unsplash | Pizza Hut | Maoyan |

python spider web-crawler selenium python3

Updated Feb 24, 2020
HTML

jaxBCD / Ultimate-Dork

Web Crawler

web-crawler sqli-vulnerability-scanner google-dorks dork web-crawler-python bing-search hacking-tools dorkscanner

Updated Mar 19, 2019
Python

mattdeitke / CVPR2019

Displays all the CVPR 2019 accepted papers in a way that makes them easy to parse.

python imagemagick computer-vision web-crawler lda web-crawler-python cvpr2019

Updated Jun 11, 2020
HTML
