text-corpus

Here are 15 public repositories matching this topic...

miras-tech / MirasText

MirasText

nlp sentiment-analysis article corpus language-modeling dataset persian-nlp text-corpus word-embedding irony-detection

Updated Aug 12, 2020
Python

WING-NUS / nus-sms-corpus

This is the distribution point for the NUS SMS Corpus, a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely originate from Singaporeans, mostly students attending the University. They were collected from volunteers who were made aware that their contributions would be made publicly available. The data collectors opportunistically gathered as much metadata about the messages and their senders as possible, to enable different types of analyses. The corpus was collected by Tao Chen and Min-Yen Kan. If you use this data, please cite the following paper (see the Citation field for details): Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355. URL: https://link.springer.com/article/10.1007%2Fs10579-012-9197-9

social-media sms nus text-corpus short-message-service

Updated Aug 9, 2017

jonsafari / habeas-corpus

Command-line corpus tools

vocabulary corpus corpora corpus-linguistics command-line-tools text-corpus

Updated May 15, 2017
Shell

t-systems-on-site-services-gmbh / german-wikipedia-text-corpus

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and split into sentences. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).

nlp machine-learning text-corpus

Updated Jul 12, 2019

ZitRos / edu-text-analysis-experiments

Statistical text analysis and semantic networks with Python

analysis text-analysis tf-idf gephi semantic-networks sigma text-corpus text-analyzer sigma-analysis

Updated Nov 30, 2017
Python

appeler / search-names

Search a long list of names (patterns) in a large text corpus systematically and quickly

bill text-corpus william famous-people

Updated May 30, 2020
Python

JuliusBahr / SimpleSimilarity

A framework for semantic text search

search nlp macos swift search-engine ios natural-language-processing help-wanted search-algorithm text-processing ios-framework text-corpus text-search corpus-creation textual-search

Updated Aug 7, 2019
Swift

lucylow / Yeezy-Taught-Me

Yeezy Taught Me Text Generation. Trains next-character prediction with an RNN LSTM model, based on patterns in a text corpus, using TensorFlow.js

time-series neural-network text-classification corpus recurrent-neural-networks lstm speech-recognition rnn text-processing character-generator text-corpus time-series-analysis indexdb lstm-cells time-series-classification time-series-prediction character-prediction tenorflow lstm-models

Updated Sep 10, 2020
JavaScript

kurpicz / tcc

Text Corpus Collection

downloader text-corpus

Updated Jul 24, 2019
C++

jcrippen / tlingit-corpus

Text corpus of the Tlingit language for linguistic research.

indigenous-languages text-corpus linguistic-corpora native-american linguistics-databases

Updated Jun 17, 2020
Shell

capetocape / crawl-text-title-as-corpus

Crawling data from websites as text corpus

python nlp crawling text-corpus

Updated Sep 8, 2018
Python

luonglearnstocode / Seinfeld-text-corpus

text corpus 📃 scraped from the scripts 💬 of all Seinfeld episodes

regex requests web-scraping seinfeld text-corpus beautifulsoup4

Updated Jan 8, 2019
Jupyter Notebook

TextCorpusLabs / wikimedia-to-textcorpus

Walkthrough for converting Wikimedia into a text corpus

wikimedia python3 text-corpus

Updated Sep 10, 2020
Python

motazsaad / corpus-expander

Expands sentences in a given text corpus. The code checks for named entities (NEs) in sentences and creates new sentences by injecting new NEs from an NE list.

nes named-entities sentence corpus-linguistics language-model text-corpus arabic-nlp expanding-sentences corpus-expander

Updated Apr 20, 2018
Python
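
The sentence-expansion idea in the corpus-expander description can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the function name `expand_sentences`, the plain whole-word string matching, and the sample data are all assumptions made here, and a real pipeline would more likely rely on a proper NE recognizer.

```python
import re


def expand_sentences(sentences, entities):
    """Create new sentences by swapping each entity found in a sentence
    for every other entity in the list (hypothetical helper)."""
    expanded = []
    for sentence in sentences:
        for found in entities:
            # Whole-word match, so "Ali" is not found inside "Alice".
            pattern = r"\b" + re.escape(found) + r"\b"
            if not re.search(pattern, sentence):
                continue
            for replacement in entities:
                if replacement != found:
                    # A function replacement avoids backslash-escape
                    # surprises in the replacement text.
                    expanded.append(
                        re.sub(pattern, lambda _m: replacement, sentence)
                    )
    return expanded


if __name__ == "__main__":
    corpus = ["Cairo is a large city.", "No entity here."]
    ne_list = ["Cairo", "Amman", "Tunis"]
    for line in expand_sentences(corpus, ne_list):
        print(line)
```

Sentences without any listed entity produce no output, so the expanded set can be appended to the original corpus without duplicating it.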

Chandra-cc / Tesseract_ICR-Sheets

A model was trained on Google handwritten fonts, using a text corpus containing only the digits 0-9. The main aim was to recognize ICR sheets with the trained model. The model achieved an accuracy of 94.6% using Tesseract version 4.

tesseract lstm tesseract-ocr text-corpus tesseract-icr-sheets google-handwritten-fonts recognize-icr-sheets

Updated Aug 4, 2018
Python
