Kavita Ganesan's starred repositories
nlp-in-practice
Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.
robustness-gym
Robustness Gym is an evaluation toolkit for machine learning.
blog-articles
Curated list of blog posts from Opinosis Analytics.
word_cloud
Python word cloud library for use within Jupyter notebooks and Python apps.
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
OpenNMT-py
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used with languages other than English.
keras-text-classification
CNN text classification using Keras
awesome-machine-learning-on-source-code
Cool links & research papers related to Machine Learning applied to source code (MLonCode)
clinical-concepts
Discovering related clinical concepts using large amounts of clinical notes. An unsupervised graphical approach for mining related concepts that leverages the volume of large collections of clinical notes.
spark-examples
Code examples for Spark
nlp-cloud-apis
RxNLP APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
opinosis-summarization
Code and dataset for the Opinosis Summarization Framework.
java-string-similarity
Implementations of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
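
For readers unfamiliar with the measures named in the last entry, here is a minimal Python sketch of two of them: Levenshtein distance and the Jaccard index. It is not taken from java-string-similarity and does not mirror that library's API. The function names `levenshtein` and `jaccard`, and the choice to compute Jaccard over character 2-grams (the `k` parameter), are assumptions made for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def jaccard(a: str, b: str, k: int = 2) -> float:
    """Jaccard index over the sets of character k-grams of a and b.
    The k-gram (shingle) representation is an assumption of this sketch."""
    shingles_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    shingles_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not shingles_a and not shingles_b:
        return 1.0  # two strings with no k-grams are treated as identical
    return len(shingles_a & shingles_b) / len(shingles_a | shingles_b)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(jaccard("night", "nacht"))
```

Levenshtein is a distance (0 means identical, larger means more different), while the Jaccard index is a similarity in the range 0 to 1, so the two are not directly comparable without normalizing one of them.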