oskar-j / awesome-text-ml

A curated list of awesome ML frameworks & libraries for text data


Awesome software for Text ML

A curated list of awesome ML frameworks and text embeddings. The list focuses on state-of-the-art (SOTA) libraries that are actively maintained on GitHub.

Frameworks and libraries

๐Ÿ Python

Text processing

  • HanLP - Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification via one unified interface. https://bbs.hankcs.com/

  • flair - A powerful NLP library for applying state-of-the-art models to tasks such as named entity recognition (NER), part-of-speech (PoS) tagging, word sense disambiguation, and text classification, with special support for biomedical data.

  • sentencepiece - Unsupervised text tokenizer for neural network-based text generation.

  • stanza - Official Stanford NLP Python Library for Many Human Languages. https://stanfordnlp.github.io/stanza/
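sentencepiece learns a subword vocabulary directly from raw text and implements both byte-pair encoding (BPE) and unigram language-model segmentation. To illustrate only the BPE idea, here is a toy sketch in plain Python. It is not sentencepiece's API or implementation, and the names `learn_bpe`, `merge_pair`, and `tokenize` are hypothetical. Unlike sentencepiece, which treats whitespace as an ordinary symbol, this sketch assumes the input is already split into words and marks word ends with `</w>`.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker,
    # so merges never cross word boundaries. Counter tracks word frequency.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Greedily pick the most frequent pair (first one seen wins ties).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by replaying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

For example, with the corpus `["hug", "hug", "hug", "pug", "hugs"]` the first learned merge is `("u", "g")`, because that pair occurs more often (five times) than any other.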

Pipelines / block-programming

Distributed computing

Machine Learning

  • sklearn - Scikit-learn is a Python module for machine learning built on top of SciPy, including tools for text vectorization and vector space compression. https://scikit-learn.org/stable/

  • gensim - Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. https://radimrehurek.com/gensim/

  • nlpaug - Data augmentation for NLP in your machine learning projects.

  • AugLy - A data augmentation library from Facebook Research for audio, image, text, and video.
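Text vectorization and similarity retrieval, as offered by scikit-learn and gensim, can be illustrated with a from-scratch TF-IDF sketch in the standard library only. It uses the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1, which scikit-learn documents for `TfidfVectorizer` with `smooth_idf=True`, and L2-normalises each vector so a dot product gives cosine similarity. The tokenizer and the names `simple_tokens`, `tfidf_vectors`, and `cosine` are our own simplifications, so scores will not match either library exactly.

```python
import math
import re
from collections import Counter


def simple_tokens(text: str) -> list[str]:
    # Deliberately naive tokenizer: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse, L2-normalised TF-IDF vector (term -> weight) per document."""
    tokenized = [simple_tokens(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts
        # Smoothed IDF keeps every weight positive, even for terms in all documents.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Empty documents have norm 0 and are returned unnormalised (as {}).
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())
```

With `["the cat sat on the mat", "the dog sat on the log", "stock markets fell today"]`, the first two documents share terms and score a positive cosine similarity, while the first and third share none and score exactly 0.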

Deep Learning

Natural Language Understanding

Text mining

  • dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
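The core idea of fuzzy record deduplication is to treat records as duplicates when they are similar enough, not only when they are identical. dedupe itself learns its matching rules from labelled examples and uses blocking to avoid comparing every pair. The sketch below is only a naive stand-in built on the standard library's `difflib.SequenceMatcher`, with a hypothetical fixed similarity threshold of 0.85 and illustrative function names. It compares all pairs (O(n²)), so it suits small inputs only.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences vanish.
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]: 2 * matching characters / total characters.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_duplicates(records: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group record indices whose pairwise similarity reaches `threshold`.

    Union-find makes matches transitive: if A~B and B~C, all three share a cluster.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)  # union the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

For instance, `"Acme Corporation, 12 Main Street"` and `"ACME Corporation,  12 Main St."` score about 0.92 after normalisation and land in the same cluster, while an unrelated address stays on its own.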

Visualizations

  • Scattertext - Beautiful visualizations of how language differs among document types.
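Visualising how language differs among document types starts from a per-term score that says which category a term leans toward. The sketch below computes one very simple such score, a smoothed log ratio of a term's relative frequency in two hypothetical categories A and B. It is not Scattertext's scoring method; `term_counts`, `log_ratio_scores`, and the add-one pseudo-count `alpha` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_counts(docs: list[str]) -> Counter:
    # Count lowercase word tokens across all documents in one category.
    return Counter(t for d in docs for t in re.findall(r"\w+", d.lower()))


def log_ratio_scores(docs_a: list[str], docs_b: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Score each term by log(P(term | A) / P(term | B)) with add-alpha smoothing.

    Positive scores lean toward category A, negative toward B. The pseudo-count
    `alpha` keeps terms unseen in one category from producing log(0).
    """
    counts_a, counts_b = term_counts(docs_a), term_counts(docs_b)
    vocab = set(counts_a) | set(counts_b)
    # Denominators include the pseudo-counts so each distribution sums to 1.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        term: math.log((counts_a[term] + alpha) / total_a)
        - math.log((counts_b[term] + alpha) / total_b)
        for term in vocab
    }
```

Sorting the returned dictionary by score yields the terms most characteristic of each category at the two ends, which is the kind of ranking a plot like Scattertext's can then lay out visually.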

Big language models

  • BIG-bench - Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.

C++

Text processing

Currently empty 🪹

Knowledge 📚

Learning 101

  • Virgilio - Virgilio is an open-source initiative that aims to mentor and guide anyone in the world of data science.

Multiple languages

Python (and Python Notebooks)

  • practicalAI - A practical approach to machine learning to enable everyone to learn, explore and build. https://practicalai.me

  • nlp-recipes - Comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems.

No longer maintained

License: Creative Commons Zero v1.0 Universal