makrand12 / topic-modelling-challenge

Topic modelling of news articles

Project Overview

Topic modelling, as the name suggests, is the process of automatically identifying the topics present in a text and uncovering the hidden patterns in a text corpus, which in turn supports better decision making.

Topic modelling differs from rule-based text mining approaches that rely on regular expressions or dictionary-based keyword searches. It is an unsupervised approach for finding groups of words (called “topics”) in large collections of text.
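For contrast, a rule-based tagger can be sketched in a few lines. This is only an illustration of the dictionary approach, not part of the project: the `KEYWORDS` dictionary and the `rule_based_topics` helper are hypothetical names, and the keyword lists are hand-picked.

```python
import re

# Hypothetical hand-written dictionary: the tagger only knows these words.
KEYWORDS = {
    "Healthcare": {"doctor", "patient", "hospital"},
    "Farming": {"farm", "crops", "wheat"},
}

def rule_based_topics(text):
    """Return the sorted labels whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(label for label, words in KEYWORDS.items() if tokens & words)

print(rule_based_topics("The doctor saw the patient at the hospital."))
# A health-related sentence with none of the listed keywords gets no label.
print(rule_based_topics("The clinic hired two new nurses."))
```

The second call shows the limitation: the rules only match words someone wrote down in advance, whereas a topic model learns its word groups from the corpus itself.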

A topic can be defined as “a repeating pattern of co-occurring terms in a corpus”. A good topic model should produce “health”, “doctor”, “patient”, “hospital” for the topic “Healthcare”, and “farm”, “crops”, “wheat” for the topic “Farming”.
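To make “co-occurring terms” concrete, the sketch below counts how often each pair of terms appears in the same document. The four toy documents and the `cooccurrence_counts` helper are assumptions made for illustration; real topic models use more elaborate statistics than raw pair counts.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (illustrative only, not the project's dataset).
docs = [
    "doctor patient hospital",
    "patient hospital health",
    "farm crops wheat",
    "wheat crops farm harvest",
]

def cooccurrence_counts(documents):
    """Count, for every pair of terms, the number of documents containing both."""
    counts = Counter()
    for doc in documents:
        terms = sorted(set(doc.split()))  # unique terms, in a stable order
        counts.update(combinations(terms, 2))
    return counts

pair_counts = cooccurrence_counts(docs)

# Pairs that repeat across documents hint at a shared topic.
repeated = [pair for pair, n in pair_counts.items() if n > 1]
print(repeated)
```

In this toy corpus, healthcare terms only pair with other healthcare terms and farming terms with other farming terms, which is the kind of pattern a topic model picks up.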

We have a dataset of news articles, and our task is to assign topics to those articles.

We will first apply a simple LSI (latent semantic indexing) model and then an LDA (latent Dirichlet allocation) model to identify the topics.

Learnings from the project

Why solve it

Solving it will help you apply the following skills:

Topic Modelling

Understanding Topic classification

About

License: MIT License


Languages

Language: Python 100.0%