yingyangle / whosaidthat

Who Said That? Predict the speaker of a line of dialogue from TV shows.

whosaidthat

Features

| # | Feature | Description |
|---|---|---|
| 0 | utterance length | number of words in the line |
| 1 | average word length | average length of the words in the line |
| 2 | word diversity | type-token ratio for the line |
| 3 | stop words ratio | percentage of words in the line that are stop words |
| 4 | neologisms ratio | percentage of words in the line that are not in our vocabulary |
| 5 | number of numbers | how many numbers the line contains |
| 6 | number of profanity words | how many profanity words the line contains |
| 7 | subjectivity | subjectivity score from TextBlob |
| 8 | polarity | polarity score from TextBlob |
| 9 | question count | number of sentences in the line that are questions |
| 10 | exclamation count | number of sentences in the line that end with an exclamation mark |
| 11 | ellipses count | number of ellipses the line contains |
| 12 to 12+N-1 | top words | for each of the show's N main characters, the number of words in the line that are also among that character's 20 most frequent words |
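
As a rough illustration of how the features in the table can be computed, here is a minimal, stdlib-only Python sketch. It is not the project's code: the tokenizer, the sentence-end heuristic, and the tiny `STOP_WORDS`, `VOCABULARY`, and `PROFANITY` sets are hypothetical stand-ins; ratios are returned as fractions between 0 and 1 rather than percentages; and dropping stop words before picking each character's top 20 words is an assumption. Features 7 and 8 go through a hypothetical `sentiment` hook so the sketch runs without TextBlob installed.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; the project's real
# stop-word list, vocabulary, and profanity list are not shown here.
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "of", "that", "the", "to", "you"}
VOCABULARY = STOP_WORDS | {"doing", "knock", "my", "no", "oh", "penny", "spot", "what"}
PROFANITY = {"damn", "hell"}


def tokenize(line):
    """Lowercased word tokens made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", line.lower())


def top_words(lines_by_char, k=20):
    """Each character's k most frequent tokens.

    Dropping stop words before counting is an assumption of this sketch.
    """
    tops = {}
    for char, lines in lines_by_char.items():
        counts = Counter(t for line in lines for t in tokenize(line)
                         if t not in STOP_WORDS)
        tops[char] = {w for w, _ in counts.most_common(k)}
    return tops


def extract_features(line, tops, sentiment=None):
    """Build the feature vector in the order of the feature table.

    `sentiment` is a hypothetical hook returning (subjectivity, polarity);
    without it, features 7 and 8 default to 0.0.
    """
    tokens = tokenize(line)
    n = len(tokens)

    def per_token(count):
        # Divide by the token count, guarding against empty lines.
        return count / n if n else 0.0

    subjectivity, polarity = sentiment(line) if sentiment else (0.0, 0.0)
    # Runs of terminal punctuation approximate sentence ends (a heuristic).
    ends = re.findall(r"[.?!…]+", line)
    features = [
        n,                                                   # 0 utterance length
        per_token(sum(len(t) for t in tokens)),              # 1 average word length
        per_token(len(set(tokens))),                         # 2 type-token ratio
        per_token(sum(t in STOP_WORDS for t in tokens)),     # 3 stop words ratio
        per_token(sum(t not in VOCABULARY for t in tokens)), # 4 neologisms ratio
        len(re.findall(r"\d+(?:[.,]\d+)*", line)),           # 5 number of numbers
        sum(t in PROFANITY for t in tokens),                 # 6 profanity words
        subjectivity,                                        # 7 subjectivity
        polarity,                                            # 8 polarity
        sum("?" in e for e in ends),                         # 9 question count
        sum(e.endswith("!") for e in ends),                  # 10 exclamation count
        len(re.findall(r"\.\.\.|…", line)),                  # 11 ellipses count
    ]
    # 12 to 12+N-1: overlap with each main character's top words,
    # using a fixed (alphabetical) character order.
    for char in sorted(tops):
        features.append(sum(t in tops[char] for t in tokens))
    return features


if __name__ == "__main__":
    training = {
        "Sheldon": ["Bazinga!", "Penny. Penny. Penny.", "That is my spot."],
        "Penny": ["Oh no...", "What are you doing?"],
    }
    print(extract_features("Penny, that is my spot! Damn... what?", top_words(training)))
```

For the sample line, the last two entries are 1 (overlap with Penny's top words: "what") and 3 (Sheldon's: "penny", "my", "spot"). With TextBlob installed, the hook could be `sentiment=lambda s: (TextBlob(s).sentiment.subjectivity, TextBlob(s).sentiment.polarity)`, since TextBlob's `sentiment` property exposes both scores.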

Characters

Big Bang Theory (45,825 lines, 7 characters)

Amy (3,473), Bernadette (2,687), Howard (5,858), Leonard (9,765), Penny (7,659), Raj (4,680), Sheldon (11,703)

The Simpsons (67,955 lines, 5 characters)

Bart (13,139), Homer (28,447), Lisa (10,945), Marge (13,367), Ned Flanders (2,057)

Desperate Housewives (18,437 lines, 4 characters)

Bree (4,130), Gabrielle (4,564), Lynette (4,618), Susan (5,125)
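
The speaker classes are far from balanced, so an accuracy figure is best read against a majority-class baseline, that is, always predicting the character with the most lines. The short Python calculation below derives that baseline from the counts listed above; it is not a result reported by the project, and `majority_baseline` is a hypothetical helper.

```python
# Per-character line counts, copied from the lists above.
LINE_COUNTS = {
    "Big Bang Theory": {"Amy": 3473, "Bernadette": 2687, "Howard": 5858,
                        "Leonard": 9765, "Penny": 7659, "Raj": 4680,
                        "Sheldon": 11703},
    "The Simpsons": {"Bart": 13139, "Homer": 28447, "Lisa": 10945,
                     "Marge": 13367, "Ned Flanders": 2057},
    "Desperate Housewives": {"Bree": 4130, "Gabrielle": 4564,
                             "Lynette": 4618, "Susan": 5125},
}


def majority_baseline(counts):
    """Return (character, share) for always predicting the most frequent speaker."""
    total = sum(counts.values())
    top = max(counts, key=counts.get)
    return top, counts[top] / total


for show, counts in LINE_COUNTS.items():
    name, share = majority_baseline(counts)
    print(f"{show}: {sum(counts.values()):,} lines, always '{name}' -> {share:.1%}")
```

On these counts the baseline is about 25.5% for Big Bang Theory (Sheldon), 41.9% for The Simpsons (Homer), and 27.8% for Desperate Housewives (Susan). The per-character counts also sum exactly to each show's stated total.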

Languages

Python 76.9%, HTML 23.1%