The Wayback Machine - https://web.archive.org/web/20200812013855/https://github.com/topics/corpus

corpus

Here are 497 public repositories matching this topic...

brightmart / nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

nlp news wiki text-classification word2vec corpus dataset question-answering chinese chinese-nlp language-model bert chinese-corpus pretrain chinese-dataset

Updated Dec 1, 2019

dariusk / corpora

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

language bots corpus words

Updated Aug 4, 2020
JavaScript

wainshine / Chinese-Names-Corpus

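Chinese personal-name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.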

corpus names dataset dict ner

Updated Mar 30, 2020

endymecy / awesome-deeplearning-resources

Deep learning and deep reinforcement learning research papers and some code

nlp video reinforcement-learning deep-learning neural-network code paper corpus modelzoo

Updated Jun 26, 2020

jinfagang / weibo_terminater

Final Weibo crawler: scrape anything from Weibo, including comments, Weibo contents, followers, anything. The Terminator

scraper chatbot corpus chinese weibo sina

Updated Oct 25, 2019
Python

fendouai / Awesome-Chatbot

Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:

awesome tutorial tensorflow chatbot corpus seq2seq seq2seq-model seq2seq-chatbot

Updated Feb 10, 2020
Python

candlewill / Dialog_Corpus

Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)

system chatbot dialog corpus dataset

Updated Nov 13, 2017
Python

CLUEbenchmark / CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

benchmark tensorflow nlu glue corpus transformers pytorch dataset chinese pretrained-models language-model albert bert roberta chineseglue

Updated Jul 15, 2020
Python

CLUEbenchmark / CLUEDatasetSearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

nlp qa sentiment-analysis text-classification match machine-translation text-similarity corpus knowledge-graph chinese text-summarization datasets ner machine-reading-comprehension

Updated Mar 1, 2020
Python

gunthercox / chatterbot-corpus

A multilingual dialog corpus

language yaml dialog corpus chatterbot

Updated Aug 9, 2020
Python

chatopera / insuranceqa-corpus-zh

Open data in the insurance domain for machine learning tasks; an insurance-industry corpus

machine-learning natural-language-processing insurance chatbot corpus dataset question-answering natural-language-understanding qasystem insuranceqa-corpus-zh

Updated Jul 13, 2018
Python

tensorlayer / seq2seq-chatbot

Chatbot in 200 lines of code using TensorLayer

python nlp chat bot tensorflow chatbot corpus lstm rnn tensorlayer

Updated Oct 6, 2019
Python

wainshine / Company-Names-Corpus

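Company-name corpus and organization-name corpus. Covers company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.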

company corpus dataset dict ner

Updated Mar 30, 2020

quanteda / quanteda

An R package for the Quantitative Analysis of Textual Data

natural-language-processing r corpus text-analytics quanteda

Updated Aug 11, 2020
R

crownpku / Small-Chinese-Corpus

Some useful small Chinese corpus datasets

corpus chinese-nlp

Updated Mar 29, 2020

nonamestreet / weixin_public_corpus

WeChat official account corpus

nlp natural-language-processing corpus linguistics weixin chinese-nlp corpora weixin-data wei-xin yu-liao yu-liao-ku

Updated Jan 7, 2019

mhbashari / awesome-persian-nlp-ir

Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources

natural-language-processing information-retrieval corpus language-detection embeddings named-entity-recognition normalizer spell-check persian-language stemmer dependency-parser persian-nlp part-of-speech-tagger morphological-analysis persian-stemmer shallow-parser

Updated Aug 5, 2020

MozillaSecurity / fuzzdata

Fuzzing resources for feeding various fuzzers with input. 🔧

firefox settings browser corpus seeds fuzzing corpora

Updated Apr 28, 2020
HTML

CLUEbenchmark / CLUEPretrainedModels

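A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity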

text-classification corpus dataset chinese semantic-similarity pretrained-models sentence-classification albert bert sentence-analysis distillation sentence-pairs roberta

Updated Jul 8, 2020
Python

BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

nlp language translation corpus literature corpus-linguistics corpus-tools multi-language-support corpus-processing

Updated Aug 9, 2020
Python

OYE93 / Chinese-NLP-Corpus

Collections of Chinese NLP corpus

corpus chinese-nlp datasets

Updated Jul 21, 2020
Python

soskek / bookcorpus

Crawl BookCorpus

nlp crawler scraper corpus bookcorpus

Updated May 20, 2020
Python

several27 / FakeNewsCorpus

A dataset of millions of news articles scraped from a curated list of data sources.

nlp machine-learning natural-language-processing database corpus artificial-intelligence dataset fakenews

Updated Jan 25, 2020

CLUEbenchmark / CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100G)

nlp corpus chinese datasets albert bert chinese-corpus roberta pretrain

Updated Mar 18, 2020

lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

machine-learning natural-language-processing computer-vision corpus

Updated Sep 11, 2019
HTML

yohasebe / wp2txt

WP2TXT extracts plain text data from a Wikipedia dump file (encoded in XML/compressed with Bzip2), stripping all MediaWiki markup and other metadata.

ruby nlp wikipedia corpus wikipedia-dump

Updated Jan 10, 2018
Ruby

EdinburghNLP / code-docstring-corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

corpus code-generation neural-machine-translation documentation-generator docstrings

Updated Jul 13, 2020
Python

zake7749 / Gossiping-Chinese-Corpus

Chinese question-answering corpus from the PTT Gossiping board

chatbot dialog corpus dataset question-answering chinese-nlp ptt chinese-corpus chinese-chatbot chinese-dataset chatbot-corpus

Updated Sep 9, 2019
Jupyter Notebook

Helsinki-NLP / prosody

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text

machine-learning natural-language-processing corpus pytorch speech-synthesis dataset prosody bert sequence-labeling

Updated Oct 30, 2019
Python

chatopera / efaqa-corpus-zh

❤️ Emotional First Aid Dataset, a psychological counseling question-answering corpus

natural-language-processing corpus psychology natural-language-understanding

Updated Jun 17, 2020
Python
