Abhishekmamidi123 / Natural-Language-Processing

Language Modelling, CMI vs Perplexity

Natural Language Processing

Dataset used: Twitter code-mixed data.

  1. Language Modelling:
  • Calculated unigram, bigram, and trigram perplexities on the code-mixed data (see the perplexity sketch after this list).
  2. CMI vs Perplexity:
  • Calculated the Code-Mixing Index (CMI) for each tweet and separated the tweets into 10 sets based on their CMI values. For each set, we computed the perplexity and examined the relation between CMI and perplexity on the collected data (see the CMI sketch further below).
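
A minimal sketch of how bigram perplexity can be computed on tokenized tweets. The add-one smoothing and whitespace tokenization are assumptions for illustration, not necessarily what the repository uses; the per-folder READMEs describe the actual setup.

```python
import math
from collections import Counter

def train_bigram(tweets):
    """Count unigrams and bigrams over whitespace-tokenized tweets."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        tokens = ["<s>"] + tweet.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_perplexity(tweets, unigrams, bigrams):
    """Perplexity under an add-one-smoothed bigram model (assumed smoothing)."""
    V = len(unigrams)  # vocabulary size for add-one smoothing
    log_prob, n_tokens = 0.0, 0
    for tweet in tweets:
        tokens = ["<s>"] + tweet.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)
```

The same counting scheme extends to unigram and trigram models by changing the context length.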

  • Each folder contains a README.md describing what we have done.
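
For the CMI step, a sketch under the assumption that each tweet comes with per-token language tags from an external tagger. The formula follows Gambäck and Das's Code-Mixing Index; the "univ" tag for language-independent tokens and the bucketing helper are hypothetical names, not the repository's API.

```python
from collections import Counter

def cmi(lang_tags):
    """CMI = 100 * (1 - max_lang / (n - u)), where n = total tokens,
    u = language-independent tokens, max_lang = dominant-language count."""
    n = len(lang_tags)
    u = sum(1 for tag in lang_tags if tag == "univ")
    if n == u:  # no language-tagged tokens, so no mixing
        return 0.0
    counts = Counter(tag for tag in lang_tags if tag != "univ")
    return 100.0 * (1 - counts.most_common(1)[0][1] / (n - u))

def bucket_by_cmi(tagged_tweets, n_buckets=10):
    """Split (tweet, lang_tags) pairs into n_buckets equal-width CMI ranges,
    mirroring the 10-set split described above."""
    buckets = [[] for _ in range(n_buckets)]
    for tweet, tags in tagged_tweets:
        idx = min(int(cmi(tags) // (100 / n_buckets)), n_buckets - 1)
        buckets[idx].append(tweet)
    return buckets
```

Computing perplexity within each bucket then yields one (CMI range, perplexity) pair per set, which is the relation the experiment examines.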

Contributors:

M R Abhishek and K Vagdevi

Languages:

Python 100.0%