ryanmcdermott / trump-speeches

1 MB Archive of Donald Trump Speeches

Split by Speech

thedansimonson opened this issue

Thanks for making this available!

While the text itself is nice to have, some more interesting tasks can be done if the data is split into separate speeches, in some form: e.g. looking at how his rhetoric evolved over time, generating narrative schemas from his text, etc.

It's hard to run NLP on it without adding artificial document splits, and a lot more can be done with the text with a few parses slapped on top.
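As a rough illustration of what per-speech splitting could look like, here is a minimal sketch. It assumes each speech in the combined file is introduced by a marker line such as `SPEECH 3`; that marker, the `SPEECH_MARKER` pattern, and the `split_speeches` helper are hypothetical and not taken from the repository, so the pattern would need adjusting to whatever delimiter the data actually uses.

```python
import re

# Hypothetical delimiter: a line like "SPEECH 3" starting each speech.
# This is an assumption for illustration, not the archive's known format.
SPEECH_MARKER = re.compile(r"^SPEECH \d+[ \t]*$", re.MULTILINE)


def split_speeches(text, marker=SPEECH_MARKER):
    """Split one combined transcript into a list of per-speech strings."""
    parts = marker.split(text)
    # Drop empty chunks (e.g. the text before the first marker) and trim whitespace.
    return [part.strip() for part in parts if part.strip()]


sample = "SPEECH 1\nThank you all.\nSPEECH 2\nIt is great to be here.\n"
speeches = split_speeches(sample)
for index, speech in enumerate(speeches, start=1):
    print(index, speech)
```

Each element of the returned list can then be treated as its own document for parsing or for comparisons across speeches.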

Addressing #2 here.

Thank you for pointing that out. I foolishly overlooked this when scraping. It won't be too hard to reconstruct where each speech came from, and you're right: the more interesting things happen when you have more metadata and separate documents. I was training an RNN on the data, so I needed it all in one place and didn't think to keep the speeches separate.

Thanks, looking forward to an update!

Just want to say that I am also very interested in where the speech data comes from, and I would be happy to help expand this.

Addressed in #4.