DanK's starred repositories
Enron2mbox
Converting the Enron email collection to mbox format
Topic-modelling-using-LDA
The Enron database is analysed using Latent Dirichlet allocation.
EnronTopicModelling
Topic Modelling the Enron Emails
enron-nlp-mining
Text analysis on Enron emails data
TopicModelComparison
Scripts and code for replicating the experiments published in "Exploring Topic Coherence over many models and many topics"
wayward
Wayward is a Python package that helps to identify characteristic terms from single documents or groups of documents. It can be used for keyword extraction and several related tasks, and can create efficient sparse representations for classifiers. It was originally created to provide term weights for word clouds.
financial-news-data
Construct a structured DataFrame from the Reuters news corpus
mailinator-box
📬 Stream public Mailinator emails.
disposable-emails.github.io
The complete list of disposable email domains
topic-modeling-textPrep
text preprocessing library for topic models
Collab-Rdp
Use Google Colab as a temporary server with RDP.
harry_potter_nlp
Harry Potter and the Allocation of Dirichlet
topic-modelling
Handy Jupyter Notebooks that I use for topic modeling, including text mining from PDF files, text preprocessing, Latent Dirichlet Allocation (LDA), hyperparameter grid search, and topic modeling visualization.
Colab-Hacks
Simple Hacks for Google Colaboratory to boost your productivity and help you to perform daily tasks.
Smart-Literature-Review
This is the repository for the files and documents used in the Smart Literature Review paper (Boye, Møller, 2019)
mirrors-china
Mirrors and registries in Mainland China
Top2Vec-Demo
Demo of Top2Vec for generating topics using a BERT model
Chapter-18-Topic-Modeling
Chapter 18: Topic Modeling
topic-model-tutorial
Tutorial on topic models in Python with scikit-learn
open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)