There are 9 repositories under the text-cleaning topic.
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
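Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods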
🧹 Python package for text cleaning
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
Text preprocessing tools in Python.
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
A Python package to get useful information from documents using TopicRank Algorithm.
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Korean text data preprocessing toolkit for NLP
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
JS / Python3 / PHP library to work with UTF-8 polytonic Greek and Latin
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
Common Text Pre-Processing for Portuguese
Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.
A Simple Easy To Use Text Cleaning Package For NLP Built In Python. It Can Clean and Analyze Your Text Data In One Line of Code.
Article title, authors, date and body extraction dataset.
Indonesian News and Article Clustering with K-Means++
Sentiment analysis, text mining, topic modeling & sentiment prediction
Preprocessing Turkish text data with cleaning (punctuation, special, accented, and Unicode characters) and normalizing (numbers, abbreviations)
Sentiment Analysis of Restaurant Reviews using NLP
The code is a collection of NLP analyses, including text cleaning, most common words, n-grams generation, co-occurrence matrix generation, wordcloud generation, topic modeling (using Latent Dirichlet Allocation), and general text statistics.
Hotels play a crucial role in travelling, and with increased access to information, new ways of selecting the best ones have emerged. With this model, you can explore what makes a great hotel and maybe even use it in your trip planning.
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
12th place (top 4%) solution for Shopee Code League 2020 - Sentiment Analysis
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
Utility that automates spelling correction over batches of text files
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.