A2Amir/Topic-Modeling

Introduction

In this repository, I'll use the gensim library to build LDA (Latent Dirichlet Allocation) to classify text in a document to a particular topic. The dataset I'll use is a list of over one million news headlines published over a period of 15 years. We'll start by loading it from the abcnews-date-text.csv file.

Latent Dirichlet Allocation

To learn more about what the Latent Dirichlet Allocation is and how it works, first watch the videos linked below:

Getting Started

you can download a copy of the project from my GitHub here and then run a Jupyter server locally with Anaconda.

Open a terminal and clone the project repository:

$ git clone https://github.com/A2Amir/Topic-Modeling

Switch to the project folder you cloned the project and create a conda environment (note: you must already have Anaconda installed):

$ cd to the project folder you cloned the project
$ conda env create -f nlp.yaml

Activate the conda environment, then run the jupyter notebook server. (Note: windows users should run activate nlp)

 $ source activate nlp
 $ jupyter notebook

Depending on your system settings, Jupyter will either open a browser window, or the terminal will print a URL with a security token. If the terminal prints a URL, simply copy the URL and paste it into a browser window to load the Jupyter browser. Once you load the Jupyter browser, select the project notebook Latent_dirichlet_allocation and follow the instructions inside to run it.

Resource

Latent dirichlet allocation

About

In this repository, I'll use the gensim library to build LDA (Latent Dirichlet Allocation) to classify text in a document to a particular topic.

Languages

Language:Jupyter Notebook 100.0%