kavgan

Kavita Ganesan's repositories

nlp-in-practice

Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Language: Jupyter Notebook | 1143 51 8

ROUGE-2.0

ROUGE automatic summarization evaluation toolkit. Supports ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.

Language: Java | Apache-2.0 | 209 10 25
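
For readers unfamiliar with the ROUGE-N metric named above, here is a minimal sketch of the idea in Python. It is not code from the ROUGE-2.0 toolkit (which is written in Java); the function names `ngram_counts` and `rouge_n` are hypothetical, tokenization is assumed to be a plain lowercase whitespace split, and no stemming or stopword removal is applied.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=2):
    """Illustrative ROUGE-N: clipped n-gram overlap between two texts.

    Assumption: whitespace tokenization, lowercased, single reference.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2)
print(scores)
```

In this example three of the five reference bigrams also appear in the candidate, so recall and precision both come out at 0.6.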

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used for languages other than English.

Language: Python | 125 7 4

opinosis-summarization

This repo contains the code and dataset for the Opinosis Summarization Framework.

Apache-2.0 | 51 40

word_cloud

Python word cloud library for use within Jupyter notebooks and Python apps.

Language: Jupyter Notebook | 47 2 1

OpinRank

OpinRank dataset: user reviews of cars and hotels, with full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

41 80

clinical-concepts

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical notes.

GPL-3.0 | 25 40

spark-examples

Spark code examples.

Language: Python | 10 10

stop-words

Stop word lists

6 20

text-mining-and-nlp-apis

APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.

5 30

hashtags_test

Test hashtags

3 10

Micropinion-Generation-Dataset

Dataset for Micropinion Generation, based on user reviews from CNET. The reviews cover products from various categories such as TVs, cell phones, and GPS devices.

2 10

bootstrap

The most popular HTML, CSS, and JavaScript framework for developing responsive, mobile-first projects on the web.

Language: JavaScript | MIT | 1 10

data-science-blogs

A curated list of data science blogs

Language: Python | 1 10

pyrxnlp

Super simple NLP tools. Cluster sentences, get multiple text similarity measures including cosine, Jaccard, and Dice, generate topics, extract text from HTML, and more.

Language: Python | LGPL-3.0 | 1 10
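
As a rough illustration of the three similarity measures named in the pyrxnlp description, the sketch below computes cosine, Jaccard, and Dice similarity for two short texts. It does not use or reproduce the pyrxnlp API; the helper names (`tokens`, `cosine`, `jaccard`, `dice`) are hypothetical, and tokenization is assumed to be a lowercase whitespace split.

```python
import math
from collections import Counter


def tokens(text):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity over term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity: shared unique tokens over all unique tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared tokens over the two set sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


x, y = "a b c", "b c d"
print(cosine(x, y), jaccard(x, y), dice(x, y))
```

Jaccard and Dice operate on sets of unique tokens, while the cosine version here weights tokens by how often they occur, so the three scores can diverge on texts with repeated words.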