text-data

There are 36 repositories under the text-data topic.

asyml / texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
machine-learning natural-language-processing tensorflow deep-learning text-generation python machine-translation dialog-systems texar bert gpt-2 xlnet text-data data-processing casl-project
Language:Python 2381
microsoft / DialoGPT
Large-scale pretraining for dialogue
data-processing dialogpt dialogue gpt-2 machine-learning pytorch text-data text-generation transformer
Language:Python 2320
microsoft / GODEL
Large-scale pretrained models for goal-directed dialog
data-processing dialogue dialogue-systems machine-learning text-data text-generation transformers conversational-ai language-grounding grounded-generation dialogpt language-model pretrained-model pytorch transformer
Language:Python 835
asyml / texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
machine-learning natural-language-processing pytorch deep-learning text-generation python machine-translation dialog-systems texar bert gpt-2 xlnet roberta text-data data-processing texar-pytorch casl-project
Language:Python 741
asyml / forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing pipeline python text-data
Language:Python 235
thu-coai / cotk
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
machine-learning natural-language-processing natural-language-generation deep-learning python data-processing text-data cotk metrics
Language:Python 128
LoLei / redditcleaner
Cleans Reddit Text Data :scroll: :broom:
reddit pushshift data-cleaning text-data nlp python praw psaw hacktoberfest
Language:Python 78
trinker / textreadr
Tools to uniformly read in text data including semi-structured transcripts
doc docx pdf-reading r read-transcripts text-data text-mining
Language:R 72
trinker / textshape
Tools for reshaping text data
text-data data-reshaping tidy text-formating manipulation r sentence-boundary-detection
Language:R 47
BALaka-18 / rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
keyword-extraction keyword-search keywords nlp python-library text text-data
Language:Python 29
PratikBarhate / question-classification
Question classification on the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
python3 nlp machine-learning question-classification spacy experimental text-data pytorch neural-network
Language:Python 28
YaleDHLab / wordmap
Visualize large text collections with WebGL
webgl data-visualization word2vec text-data nlp
Language:JavaScript 24
carted / processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
tensorflow apache-beam dataflow text-data bert use-bert tfhub
Language:Python 20
PedroBarcha / old-books-dataset
Old book pages (with groundtruth), formerly used for OCR studies. There are several versions of the set (concerning resolution and binarization). Noised and denoised sets (done by several methods) are eventually going to be uploaded.
ocr-database text text-database old-books old-documents ground-truth dataset books-dataset ocr-dataset text-data binarization binarized-dataset groundtruth
Language:HTML 11
tayebiarasteh / retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
nlp natural-language-processing natural-language lstm bidirectional-lstm sentiment-analysis sentiment-polarity pytorch deep-learning deep-neural-networks manual-annotations tweet tweeter tweet-replies tweepy tweet-analysis tweet-data text-classification text-data unsupervised-learning
Language:Python 11
tylerjthomas9 / ScrapeSEC.jl
Scrape EDGAR filings from https://www.sec.gov/
scraper edgar financial-data sec julia text-data finance
Language:Julia 11
Hsankesara / The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
nlp tweets text-datasets text-data tweepy
Language:Jupyter Notebook 9
mrchypark / gomSubtitleData
Code for collecting GOM TV subtitle data
r data subtitles korean text text-data movies drama
Language:R 6
saghiles / dcc
Directional Co-clustering with a Conscience (DCC)
co-clustering clustering von-mises-fisher directional-statistics mixture-model topic-modeling text-clustering text-data
Language:R 3
SignalN / parallelio
For reading from and writing to parallel data files in Python
natural-language-processing machine-learning text text-data preprocessing pre-processing
Language:Python 3
Allan-Cao / lol-voice-lines
Dataset of League of Legends Voice Lines
dataset league-of-legends text-data
Language:Jupyter Notebook 2
Ankit152 / StackOverflow-Tag-Prediction
A machine learning model that predicts tags for a given question and body.
stackoverflow tag-prediction machine-learning onevsrestclassifier hamming-loss micro-f1score nlp stemming text-mining text-data sgd-classifier tags count-vectorizer tfidf-vectorizer
Language:Jupyter Notebook 2
ccubc / GlassdoorReviews
classifying employee reviews on glassdoor.com
big-data lda text-data nlp
Language:Jupyter Notebook 2
jfjelstul / regular-expressions-tutorial
A tutorial on using regular expressions in R
r regular-expressions stringr text-analysis text-as-data text-data tidyverse tutorial
2
PriyankaSett / predicting_instagram_likes
The aim of this work is to predict the number of Instagram likes. Text is vectorized with a TF-IDF vectorizer.
decision-tree-regression knn-regression lasso-regression linear-regression nltk pandas python random-forest-regression regression-analysis seaborn text-data tf-idf wordninja
Language:Jupyter Notebook 2
bchryzal / Detecting-Generated-Scientific-Papers
Can you spot automatically generated scientific excerpts?
classification deberta deep-learning keras nlp tensorflow2 text-classification text-data transformers
Language:Jupyter Notebook 1
cauchi94 / airbnb-customer-sentiment
Analysis of text data by extracting the main topics from airbnb dataset using Latent Dirichlet Allocation (LDA) and then Linear Regression to interpret the topics.
housing latent-dirichlet-allocation linear-regression natural-language-processing scikit-learn text-data topic-modeling wordcloud customer-sentiment
Language:Jupyter Notebook 1
chandrashekhar1227-ML / Fake_news_content_detection_using_Sentence_Transformers
Rank 3/85 MachineHack
text-data nlp sentence-transformers logloss
Language:Jupyter Notebook 1
chandrashekhar1227-ML / Git_hub_bugs_prediction_using_Keras_BERT
Rank 16/98 MachineHack
text-data nlp bert-model keras-tensorflow accuracy-metrics
Language:Jupyter Notebook 1
ptthanh02 / VN_NewsCrawler
crawler crawling-python newspaper text-data text-mining
Language:Jupyter Notebook 1
vraul92 / NLP-on-Whatsapp-Group-Chat
Applying NLP techniques on WhatsApp text to gain insights.
natural-language-processing text-data data-mining-python lstm eda python regular-expressions
Language:Jupyter Notebook 1
Nexdata-AI / 13-Modules-Entity-Name-Single-sentence-Annotation-Data
13-Modules-Entity-Name-Single-sentence-Annotation-Data
entity-name-recognition nlp text-data
Nexdata-AI / 13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
text-data human-machine-interaction nlp
Nexdata-AI / 28237-Intent-type-single-sentence-annotation-data
28237-Intent-type-single-sentence-annotation-data
single-sentence text-data intent-detection nlp
Nexdata-AI / 80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
text-data large-language-models llms nlp unsupervised-learning
Nexdata-AI / 8178-Chinese-Social-Comments-Events-Annotation-Data
8178-Chinese-Social-Comments-Events-Annotation-Data
chinese-social-comments-events nlp nlu text-data
