There are 3 repositories under the nlproc topic.
🚪✊ Knock Knock: Get notified when your training ends with only two additional lines of code
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
Augmenty is an augmentation library based on spaCy for augmenting texts.
Scrape data from social media and chat with it using LangChain
EMNLP 2019: Generating Personalized Recipes from Historical User Preferences
Python library for feature selection on text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
A Streamlit app to extract keywords using KeyBERT
Text anonymization app with Streamlit and spaCy
My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
How to build a multi-label sentiment classifier with Tez and PyTorch
LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development
Code and data of the ACL 2021 paper "Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution"
Leverage the power of the Google Natural Language API to retrieve entity relationships from Wikipedia URLs or topics! Get interactive networkx graphs of connected entities!
Code base for the EMNLP 2021 paper, "Multi-granularity Textual Adversarial Attack with Behavior Cloning".
Pythonic wrappers for Cider/CiderD evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming effects. We also add the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower)
Code for "CharManteau: Character Embedding Models For Portmanteau Creation" (EMNLP 2017) by Varun Gangal*, Harsh Jhamtani*, Graham Neubig, Eduard Hovy, Eric Nyberg
MLOne Powered by AIEdX. Machine Learning Course for Everyone. Tier1 Basic
Code and Word2Vec embeddings of LOINC codes for KDD 2019 DSHealth paper "Evaluation of Embeddings of Laboratory Test Codes for Patients at a Cancer Center": https://arxiv.org/abs/1907.09600
Topic Modeling and Sentiment Analysis on Italo Svevo Epistolary Corpus
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
Some of My Codes for Natural Language Processing
The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
An always-a-work-in-progress combination of documentation and demo notebooks for working with the LatinCy models
Comparing the residual stream and the highway stream in transformers (BERT).
PyTorch implementation of the simple word embedding model.