text-corpus

There are 35 repositories under the text-corpus topic.

miras-tech / MirasText
MirasText
article corpus dataset irony-detection language-modeling nlp persian-nlp sentiment-analysis text-corpus word-embedding
Language:Python 73
Ermlab / PoLitBert
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The major assumption is that quality text will give a good model.
nlp polish roberta text-corpus
Language:Python 34
mrzjy / StarrailDialog
A project that extracts Honkai: Star Rail text corpus
conversation game honkai honkai-star-rail mihoyo multilanguage nlp npc text-corpus character rpg-game
Language:Python 25
t-systems-on-site-services-gmbh / german-wikipedia-text-corpus
This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings like fastText or ELMo deep contextualized word representations.
machine-learning nlp text-corpus
24
WING-NUS / nus-sms-corpus
This is the distribution point for the NUS SMS Corpus, a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely originate from Singaporeans, mostly students attending the University. They were collected from volunteers who were made aware that their contributions would be made publicly available. The data collectors opportunistically collected as much metadata about the messages and their senders as possible, to enable different types of analyses. This corpus was collected by Tao Chen and Min-Yen Kan. If you use this data, please ensure the following paper is cited. For more details, please refer to the Citation field. Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355. URL: https://link.springer.com/article/10.1007%2Fs10579-012-9197-9
nus short-message-service sms social-media text-corpus
22
AsoSoft / AsoSoft-Text-Corpus
AsoSoft Text Corpus is the first large-scale text corpus for the Kurdish language.
kurdish-language-processing text-corpus natural-language-processing central-kurdish sorani corpus kurdish
19
nikitaeverywhere / edu-text-analysis-experiments
Statistical text analysis and semantic networks with Python
tf-idf sigma text-analysis text-analyzer semantic-networks sigma-analysis analysis text-corpus gephi
Language:Python 14
lucylow / Yeezy-Taught-Me
Yeezy Taught Me Text Generation. Trains an RNN LSTM next-character prediction model on a user-supplied text corpus.
lstm-models lstm-cells recurrent-neural-networks rnn text-corpus time-series time-series-analysis time-series-classification time-series-prediction speech-recognition text-classification text-processing tenorflow neural-network lstm character-prediction character-generator corpus indexdb
Language:JavaScript 10
jonsafari / habeas-corpus
Command-line corpus tools
command-line-tools corpora corpus corpus-linguistics text-corpus vocabulary
Language:Shell 9
jcrippen / tlingit-corpus
Text corpus of the Tlingit language for linguistic research.
text-corpus linguistic-corpora linguistics-databases native-american indigenous-languages
Language:Shell 8
JuliusBahr / SimpleSimilarity
A framework for semantic text search
ios-framework swift natural-language-processing text-corpus text-search text-processing search search-engine search-algorithm help-wanted macos nlp ios corpus-creation textual-search
Language:Swift 8
appeler / search_names
Search a long list of names (patterns) in a large text corpus systematically and quickly
names search text-corpus
Language:Python 7
luonglearnstocode / Seinfeld-text-corpus
Text corpus scraped from the scripts of all Seinfeld episodes
web-scraping beautifulsoup4 text-corpus requests seinfeld regex
Language:Jupyter Notebook 7
thecsw / katya-dev
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
text-corpus tagger corpus corpus-linguistics corpus-processing corpus-builder corpus-generator corpus-analysis russian russian-literature
Language:Go 6
capetocape / crawl-text-title-as-corpus
Crawling data from websites as text corpus
python crawling nlp text-corpus
Language:Python 2
Chandra-cc / Tesseract_ICR-Sheets
A model was trained on Google handwritten fonts with a text corpus containing only the digits 0-9. The main aim was to recognize ICR sheets with the trained model. The model achieved an accuracy of 94.6% using Tesseract version 4.
tesseract-ocr tesseract-icr-sheets tesseract lstm text-corpus google-handwritten-fonts recognize-icr-sheets
Language:Python 2
cligs / conha19
Corpus of nineteenth-century Spanish American novels (conha19)
tei xml digital-humanities text-corpus genre novels 19th-century spanish mexico argentina cuba linguistic-annotation
Language:XSLT 2
kurpicz / tcc
Text Corpus Collection
downloader text-corpus
Language:C++ 2
seanpm2001 / DroppedText_Corpus
A text corpus collection for the DroppedText language.
droppedtext dropped-text droppedtext-lang dropped-text-lang corpus text-corpus gpl3 gplv3 drotex txt md markup markup-language collection droppedtext-corpus markuplanguage
2
soumyadeepghoshGG / Twitter-Sentiment-Analysis-with-NLP
Using natural language processing techniques to determine the sentiment expressed in a tweet, classified as positive or negative.
natural-language-processing nltk sentiment-analysis social-media text-corpus twitter
Language:Jupyter Notebook 2
TextCorpusLabs / wikimedia
Walkthrough for converting Wikimedia into a text corpus
wikimedia python3 text-corpus
Language:Python 2
alexlilia / igc-corpus-reader
This is a tool which can be used to index and query a large XML-based text corpus using Elasticsearch.
corpus corpus-linguistics text-corpus corpus-tool icelandic-language
Language:Python 1
hari8github / NLP
Sentiment analysis models using NLP, other important NLP basics such as subwords, and a song lyric generator!
nlp-machine-learning pad-sequences tokenizer sentiment-classification text-corpus subword-embeddings gru lstm-sentiment-analysis lyrics-generator
Language:Jupyter Notebook 0
jdave23 / EAD-corpus
A collection of encoded archival description XML documents for text and content analysis.
archives corpus ead finding-aids text-corpus
Language:Shell 0
RedditEpidemicAnalysis / data
Data collection scripts for analysis of Reddit
text-corpus
0
s-bose7 / ngram-viewer
Exploring the history of word usage in English texts with a weighted popularity history plot.
n-grams popularity-analysis text-corpus
Language:Java 0
skyisveryblue1 / corpus-filter
Simple utility to filter a text corpus according to the frequencies of the words that make up its sentences
corpus-processing cplusplus text-corpus
Language:C++ 0
TextCorpusLabs / congressional-votes
Walkthrough for converting congressional roll call votes into a text corpus
congress-votes text-corpus python3 us-congress
Language:Python 0
TextCorpusLabs / covid19
Walkthrough for converting Kaggle's COVID-19 Open Research Dataset Challenge into a text corpus
covid-19 python3 text-corpus
Language:Python 0
TextCorpusLabs / NJGovNews
Web scraping of the New Jersey news feeds
python3 text-corpus newsfeed
Language:Python 0
TextCorpusLabs / oas
Walkthrough for converting the PMC OAS Dataset into a text corpus
oas python3 text-corpus
Language:Python 0
WHOSpeeches / WHODataHub
Collect the WHO's Director General's speeches.
python3 text-corpus who
Language:Python 0
AbdullahButt2611 / TextAnalyzer
"Text Analyzer" is a web application designed to analyze any given text or script and provide users with useful information about its contents.
text-analyzer textanalyser css html javascript online-service online-tools segment text-analysis text-classification text-corpus web web-application text-pars textutils word-frequencies
Language:HTML
alla-g / NLP2020
Final project for Natural language processing course in final_project_diary folder
mystem pymorphy2 selenium text-corpus
Language:Jupyter Notebook
motazsaad / corpus-expander
Expands sentences in a given text corpus. The code checks for named entities (NEs) in sentences and creates new sentences by injecting new NEs from an NE list.
corpus-linguistics named-entities language-model expanding-sentences corpus-expander nes sentence text-corpus arabic-nlp
Language:Python
