Talia178 / NLP_TopicModelling_LDA

In this project, I'll visualize text data using WordCloud, employ the LDA model for topic modeling, and compute coherence scores to assess the model's quality and find the optimal number of topics. I'll create an interactive visualization with pyLDAvis, saving it as an HTML link for exploration.

About Dataset

"Friends" is an American television sitcom, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast starring Jennifer Aniston, Courteney Cox, Lisa Kudrow, Matt LeBlanc, Matthew Perry and David Schwimmer, the show revolves around six friends in their 20s and 30s who live in Manhattan, New York City. The series was produced by Bright/Kauffman/Crane Productions, in association with Warner Bros. Television. The original executive producers were Kevin S. Bright, Kauffman, and Crane.

Kaggle link: https://www.kaggle.com/datasets/sujaykapadnis/friends/data?select=friends.csv

friends.csv variables:

  • text: Dialogue as text
  • speaker: Name of the speaker
  • season: Season Number
  • episode: Episode Number
  • scene: Scene Number
  • utterance: Utterance Number
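As a minimal sketch of how this layout could be read, the snippet below parses a few rows with Python's standard `csv` module and counts lines per speaker. The sample rows are hypothetical placeholders, not real dialogue from the dataset, and the helper names `load_rows` and `lines_per_speaker` are illustrative only. It also assumes the four numeric columns are always populated, which may not hold for every row of the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder rows in the friends.csv column layout.
# For the real file, pass open("friends.csv", newline="", encoding="utf-8") instead.
SAMPLE = """text,speaker,season,episode,scene,utterance
"first placeholder line",Chandler Bing,1,1,1,1
"second placeholder line",Ross Geller,1,1,1,2
"third placeholder line",Chandler Bing,1,1,2,1
"""

NUMERIC_COLUMNS = ("season", "episode", "scene", "utterance")


def load_rows(handle):
    """Parse CSV rows into dicts and convert the numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for key in NUMERIC_COLUMNS:
            row[key] = int(row[key])  # assumes no missing values
        rows.append(row)
    return rows


def lines_per_speaker(rows):
    """Count how many utterances each speaker has."""
    return Counter(row["speaker"] for row in rows)


rows = load_rows(io.StringIO(SAMPLE))
counts = lines_per_speaker(rows)
```

Filtering `rows` on `speaker` in the same way would give the lines of a single character, which is one possible starting point for a per-character word cloud.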

About the Topic Modelling project

I am a devoted fan of the 'Friends' sitcom and have rewatched the series numerous times. Among the characters, Chandler Bing stands out as my favorite male character. His witty humor never fails to bring a smile to my face. The recent loss of Matthew Perry, the actor who portrayed him, deeply saddened fans around the world. In tribute to him and the entire cast of the series, I undertook a small project using this captivating dataset.

In this project, I'll visualize the text data using WordCloud, employ the LDA model for topic modeling, and compute coherence scores to assess the model's quality and find the optimal number of topics. I'll also create an interactive visualization with pyLDAvis and save it as an HTML file for exploration.
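To illustrate what a coherence score measures, here is a small pure-Python sketch of UMass coherence, one of several common measures. The measure and library actually used in the notebook are not stated here, so UMass is an assumption, and the documents, topic word lists, and function names below are hypothetical. The idea is that a topic scores higher when its top words tend to occur in the same documents, and the number of topics can be chosen by comparing the average score across candidate models.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    Sums log((D(w_i, w_j) + 1) / D(w_i)) over word pairs, where w_i is
    ranked above w_j in the topic and D counts documents containing
    all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for higher, lower in combinations(top_words, 2):
        denom = doc_freq(higher)
        if denom == 0:
            raise ValueError(f"{higher!r} does not occur in the corpus")
        score += math.log((doc_freq(higher, lower) + 1) / denom)
    return score


def mean_coherence(topics, documents):
    """Average coherence over all topics of one model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


def best_num_topics(candidates, documents):
    """Pick the topic count whose model has the highest mean coherence.

    `candidates` maps a topic count to that model's top-word lists.
    """
    return max(candidates, key=lambda k: mean_coherence(candidates[k], documents))


# Hypothetical tokenized documents and top-word lists, for illustration only.
docs = [
    ["coffee", "couch", "joke"],
    ["coffee", "couch"],
    ["dinosaur", "museum"],
    ["dinosaur", "museum", "joke"],
]
candidates = {
    2: [["coffee", "couch"], ["dinosaur", "museum"]],
    3: [["coffee", "museum"], ["couch", "dinosaur"], ["joke", "coffee"]],
}
best_k = best_num_topics(candidates, docs)
```

In a real run, the top-word lists in `candidates` would come from LDA models trained with different topic counts rather than being written by hand.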

Languages

Jupyter Notebook: 100.0%