The Wayback Machine - https://web.archive.org/web/20200617211028/https://github.com/topics/nlp-datasets
Here are 61 public repositories matching this topic...
- Curated collection of papers for the NLP practitioner 📖 👩‍🔬
- TensorFlow and Keras implementation of Very Deep Convolutional Neural Networks for Text Classification (Python, updated Nov 14, 2019)
- Chinese and English NER datasets and an English-Chinese machine translation dataset (Python, updated May 20, 2019)
- multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks (Python, updated Jun 15, 2020)
- TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020) (Python, updated Apr 17, 2020)
- What Twitter reveals about the differences between cities and the monoculture of the Bay Area (Jupyter Notebook, updated May 31, 2019)
- The release of the FreebaseQA data set (NAACL 2019)
- A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning (Python, updated Jan 30, 2020)
- Turkish writings dataset that promotes creativity, content, composition, grammar, spelling, and punctuation (Jupyter Notebook, updated Feb 4, 2018)
- Extracts transcripts and summaries (abstractive and extractive) from the AMI Meeting Corpus (Python, updated Dec 4, 2019)
- Reading the data from OPIEC, an Open Information Extraction corpus (Java, updated Jun 12, 2019)
- Model training and a custom generative function for raplyrics.eu, a rap-lyrics generation project (Python, updated Oct 20, 2019)
- Datasets with text data for use in NLP, text analysis, information extraction, and ML research (Jupyter Notebook, updated Feb 1, 2019)
- The Mueller Report Corpus v0.1 (Java, updated Jun 12, 2019)
- A Python solution submitted to the Analytics Vidhya contest "Identify the Sentiments"; the submission ranked 118th on the public leaderboard (Python, updated May 29, 2020)
- Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data (Python, updated May 6, 2020)
- The E2E Dataset, packed as a PyTorch Dataset subclass (Python, updated Jul 12, 2018)
- A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference (Python, updated Apr 17, 2020)
- Use of the state-of-the-art FLAIR library on NLP datasets (Jupyter Notebook, updated Apr 22, 2020)
- Library for generating Russian names (Python, updated Apr 23, 2019)
- Question-answering system using the BiDAF model on SQuAD v2.0 (Python, updated Apr 24, 2020)
- English loanwords in Japanese
- Extracts abstract and title datasets from arXiv articles (Python, updated Dec 9, 2019)
- Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data (Python, updated May 7, 2020)
Rather than the current system, where each sub-corpus is its own folder with its own code, create a top-level downloads.sh that can re-assemble the sub-corpora. Separately, have the downloaded and pre-processed sub-corpora ready to be referenced from the ADR and NMT repos as submodules, etc.
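The proposed downloads.sh could be sketched roughly as below. This is a minimal sketch under stated assumptions: the sub-corpus names and URLs are illustrative placeholders (the note does not specify the actual sources), and curl and tar are assumed to be available.

```shell
#!/usr/bin/env sh
# downloads.sh -- re-assemble the sub-corpora into a single top-level
# data/ directory. The names and URLs below are hypothetical
# placeholders; substitute the project's real sources.
set -eu

DATA_DIR="${DATA_DIR:-data}"
mkdir -p "$DATA_DIR"

# One line per sub-corpus: <name> <archive-url>  (placeholder entries)
SUBCORPORA="
ner https://example.org/corpora/ner.tar.gz
nmt https://example.org/corpora/nmt.tar.gz
"

fetch_subcorpus() {
    name=$1
    url=$2
    dest="$DATA_DIR/$name"
    # Idempotent: skip sub-corpora that are already assembled.
    if [ -d "$dest" ]; then
        echo "skipping $name: already present"
        return 0
    fi
    mkdir -p "$dest"
    # curl is assumed; swap in wget if preferred.
    curl -L "$url" | tar -xz -C "$dest"
}

main() {
    echo "$SUBCORPORA" | while read -r name url; do
        if [ -n "$name" ]; then
            fetch_subcorpus "$name" "$url"
        fi
    done
}

# Only fetch when invoked with --run, so the functions can be sourced
# without touching the network.
if [ "${1:-}" = "--run" ]; then
    main
fi
```

A pre-processed sub-corpus assembled this way could then be wired into the ADR and NMT repos with `git submodule add <repo-url> data/<name>`, so each downstream repo pins an exact version of the data.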