Kavita Ganesan's repositories
nlp-in-practice
Starter code for solving real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can also be used for languages other than English.
opinosis-summarization
Code and dataset for the Opinosis Summarization Framework.
word_cloud
Python word cloud library for use within Jupyter notebooks and Python apps.
clinical-concepts
Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised, graph-based approach that mines related concepts by leveraging the sheer volume of clinical notes available.
spark-examples
Code examples in Spark
stop-words
Stop word lists
text-mining-and-nlp-apis
APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts and more.
hashtags_test
Test hashtags
Micropinion-Generation-Dataset
Dataset for Micropinion Generation, built from CNET user reviews of products in various categories such as TVs, cell phones and GPS devices.
data-science-blogs
A curated list of data science blogs
python-examples
Working examples in Python
ROUGE-Utility
Utility tools for preparing and evaluating ROUGE scores, including a Perl script that converts ROUGE's Perl output to CSV.
SIF_mini_demo
Minimal example of sentence embedding using the Smooth Inverse Frequency (SIF) weighting scheme
spark-lucenerdd
Spark RDD with Lucene's query capabilities