kavgan

Kavita Ganesan's starred repositories

nlp-in-practice

Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.

Language: Jupyter Notebook | 112400

robustness-gym

Robustness Gym is an evaluation toolkit for machine learning.

Language: Python | License: Apache-2.0 | 43900

blog-articles

Curated List of Blog Posts From Opinosis Analytics

200

word_cloud

Python word cloud library for use within Jupyter notebook and Python apps.

Language: Jupyter Notebook | 4700

tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language: Python | License: Apache-2.0 | 1498500

OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Language: Python | License: MIT | 661600

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.

Language: Python | 12500

Aspect-Based-Sentiment-Analysis

Language: Jupyter Notebook | 2900

lstm-on-Yelp-review-data

Language: Jupyter Notebook | 4800

workshops

A few exercises for use at events.

Language: Jupyter Notebook | License: Apache-2.0 | 145500

keras-text-classification

CNN text classification using Keras

Language: Python | 1500

ROUGE-2.0

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

Language: Java | License: Apache-2.0 | 20600

awesome-machine-learning-on-source-code

Cool links & research papers related to Machine Learning applied to source code (MLonCode)

License: CC-BY-SA-4.0 | 615400

clinical-concepts

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical notes.

License: GPL-3.0 | 2500

OpinRank

OpinRank Dataset. A dataset of user reviews for two kinds of entities, cars and hotels: full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

4000

Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Language: Java | License: NOASSERTION | 96600

spark-examples

Code examples for Spark

Language: Python | 1000

nlp-cloud-apis

RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from HTML or a URL, computing similarity between texts and more.

1500

opinosis-summarization

This repo contains the code and dataset for the Opinosis Summarization Framework.

License: Apache-2.0 | 5100

java-string-similarity

Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Language: Java | License: NOASSERTION | 266600