devanshuThakar / Natural-Language-Processing

This repository contains the Assignments done for the CS 613: Natural Language Processing course at IIT Gandhinagar during Semester-1 2021-22.



Natural Language Processing


Crawling Data

In this assignment, data was scraped from Twitter using Twint. Tweets related to India discussing the topics of Pollution, Climate Change, Eco Friendly, and Flood were scraped.

A word cloud was produced from the data for each topic (i.e. Pollution, Climate Change, Eco Friendly, and Flood). The word cloud for Pollution is shown below.

[Figure: word cloud for the Pollution topic]

Processing and Understanding Data

In this part, a statistical analysis of the data was carried out: computing the frequency distribution of words, validating the language annotation assigned by Twitter, and fitting the data with Heaps' Law.
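The word-frequency step can be sketched as follows. This is a minimal sketch, not the repository's code: the tokenization rules (lowercasing, dropping URLs and @mentions, keeping alphabetic runs) and the small `STOPWORDS` set are assumptions, and the tweets are made-up examples. Frequency counts like these are also what a word cloud uses to size its words.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real analysis would use a
# fuller list (for example, NLTK's English stopwords).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is",
             "are", "for", "with", "this", "that", "it"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z]+")


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet, drop URLs and @mentions, and keep alphabetic tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    # '#' is not matched by WORD_RE, so '#pollution' yields the token 'pollution'.
    return WORD_RE.findall(text)


def frequency_distribution(tweets, stopwords=STOPWORDS) -> Counter:
    """Count how often each non-stopword token appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(t for t in tokenize(tweet) if t not in stopwords)
    return counts


if __name__ == "__main__":
    tweets = [
        "Air pollution in Delhi is severe today #pollution https://t.co/xyz",
        "@user Pollution and climate change are linked",
        "Flood warnings issued; climate change makes floods worse",
    ]
    freq = frequency_distribution(tweets)
    print(freq.most_common(3))  # [('pollution', 3), ('climate', 2), ('change', 2)]
```

No stemming is applied, so "flood" and "floods" are counted separately; adding a stemmer or lemmatizer would merge them.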

According to Heaps' Law, the vocabulary size $|V|$ (the number of distinct words) and the number of tokens $N$ are related by the following expression: $$|V| = K N^{\beta}$$ where $K$ and $\beta$ are free parameters estimated from the data. The plot of the fit is shown below:

[Figure: Heaps' Law fit of vocabulary size |V| against number of tokens N]
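The fit can be sketched by recording the vocabulary size after every `step` tokens and then fitting a straight line in log-log space, since taking logarithms of the law gives $\log|V| = \log K + \beta \log N$. This is a minimal sketch, assuming an unweighted least-squares fit on the log-transformed points; the repository may use a different fitting method, and `vocabulary_growth` and `fit_heaps` are hypothetical helper names.

```python
import math


def vocabulary_growth(tokens, step=1):
    """Return (N, |V|) pairs: tokens seen so far vs. distinct tokens seen so far."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            points.append((i, len(seen)))
    return points


def fit_heaps(points):
    """Fit |V| = K * N**beta via least squares on log|V| = log K + beta * log N.

    Assumption: unweighted ordinary least squares in log-log space, which is
    one common choice and not necessarily the method used in the repository.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope of the log-log line
    log_k = y_mean - beta * x_mean    # intercept of the log-log line
    return math.exp(log_k), beta


if __name__ == "__main__":
    # Synthetic check: points generated exactly from K = 10, beta = 0.5.
    synthetic = [(n, 10 * n ** 0.5) for n in (100, 400, 900, 1600)]
    k, beta = fit_heaps(synthetic)
    print(f"K = {k:.2f}, beta = {beta:.2f}")  # prints K = 10.00, beta = 0.50
```

On real tweets, `vocabulary_growth(tokens, step=...)` would produce the points passed to `fit_heaps`. Fitting in log space keeps the estimate a closed-form linear regression; the trade-off is that it weights relative rather than absolute errors in $|V|$.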



Languages

Jupyter Notebook 80.0%, Python 20.0%