The Wayback Machine - https://web.archive.org/web/20200725101539/https://github.com/topics/datasets
Here are 815 public repositories matching this topic...
A topic-centric list of HQ open datasets.
pix2code: Generating Code from a Graphical User Interface Screenshot
Updated
Mar 16, 2020
Python
An open source multi-tool for exploring and publishing data
Updated
Jul 25, 2020
Python
Open source text annotation tool for machine learning practitioners.
Updated
Jul 22, 2020
Python
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Updated
Jul 23, 2020
JavaScript
🤗 nlp – Datasets and evaluation metrics for Natural Language Processing in NumPy, Pandas, PyTorch and TensorFlow
Updated
Jul 25, 2020
Python
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Updated
Jul 25, 2020
Python
A curated list of awesome JSON datasets that don't require authentication.
Updated
Apr 29, 2020
JavaScript
AkShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
Updated
Jul 24, 2020
Python
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
Updated
Mar 1, 2020
Python
Datasets, tools, and benchmarks for representation learning of code.
Updated
Jul 22, 2020
Jupyter Notebook
Colour Science for Python
Updated
Jul 25, 2020
Python
Instant access to many datasets in Python.
Updated
May 9, 2017
Python
Updated
Mar 1, 2020
Python
✏️ Web-based image segmentation tool for object detection, localization and keypoints
In-memory tabular data in Julia
Updated
Jul 24, 2020
Julia
Large datasets for conversational AI
Updated
Nov 16, 2019
Python
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated
Jul 23, 2020
Python
Serverless integration and compute platform. Free for developers.
Updated
Jul 23, 2020
JavaScript
Open source audio annotation tool for humans™
Updated
Jul 17, 2020
JavaScript
Community list of transit APIs, apps, datasets, research, and software 🚌 🌟 🚋 🌟 🚂
A collection of public and free annotated datasets of relationships between entities/nominals (Portuguese and English)
Benchmark datasets, data loaders, and evaluators for graph machine learning
Updated
Jul 21, 2020
Python
A repository of topic-centric, high-quality public data sources for Recommender Systems (RS)
Updated
Sep 13, 2019
Jupyter Notebook
A large collection of system log datasets for AI-powered log analytics
A list of Twitter datasets and related resources.
A curated list of awesome links and software libraries that are useful for robots.
🔧 A curated list of awesome dataset tools
A collection of recent video understanding datasets, under construction!