juandes / infinity-war-spacy

Code that supplements my article: Reliving Avengers: Infinity War with spaCy and Natural Language Processing

Home Page: https://towardsdatascience.com/reliving-avengers-infinity-war-with-spacy-and-natural-language-processing-2abcb48e4ba1

Reliving Avengers: Infinity War with spaCy and Natural Language Processing

Overview

This repo contains the scripts used in my latest experiment, Reliving Avengers: Infinity War with spaCy and Natural Language Processing. The article is available at https://towardsdatascience.com/reliving-avengers-infinity-war-with-spacy-and-natural-language-processing-2abcb48e4ba1.

Using spaCy, an open-source Python NLP library designed to help process and understand large volumes of text, I analyzed the script of the movie to investigate the following:

  • Overall top 10 verbs, nouns, adverbs and adjectives from the film.
  • Top verbs and nouns spoken by a particular character.
  • Top 30 named entities from the film
  • The similarity between the lines spoken by each character pair, e.g., the similarity between Thor's and Thanos' lines.
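The analyses listed above can be sketched with a few lines of spaCy. This is a minimal illustration, not the repo's actual code: the helper names (top_lemmas, top_entities, demo) are hypothetical, the sample sentences are placeholders rather than lines from the script, and it assumes a spaCy model with word vectors such as en_core_web_lg is installed.

```python
from collections import Counter


def top_lemmas(doc, pos, n=10):
    """Return the n most common lemmas in `doc` whose coarse POS tag is `pos`."""
    counts = Counter(
        token.lemma_.lower()
        for token in doc
        if token.pos_ == pos and token.is_alpha and not token.is_stop
    )
    return counts.most_common(n)


def top_entities(doc, n=30):
    """Return the n most common named-entity strings in `doc`."""
    return Counter(ent.text for ent in doc.ents).most_common(n)


def demo():
    """Run the helpers on two placeholder texts (requires spaCy and a model)."""
    import spacy  # imported lazily so the helpers above work without spaCy

    # Assumption: the model was installed beforehand with
    #   python -m spacy download en_core_web_lg
    nlp = spacy.load("en_core_web_lg")

    # Placeholder sentences standing in for each character's joined lines.
    thor = nlp("I will take my hammer and bring the fight to him.")
    thanos = nlp("I will finally rest and watch the sun rise.")

    for pos in ("VERB", "NOUN", "ADV", "ADJ"):
        print(pos, top_lemmas(thor, pos))
    print(top_entities(thor))
    # Doc.similarity compares the averaged word vectors of the two docs.
    print(thor.similarity(thanos))
```

Calling demo() prints the per-tag counts, the entity counts, and a single similarity score. Counting lemmas rather than raw tokens groups inflected forms such as "fights" and "fought" under one entry; whether the original analysis did the same is not stated here.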

Tools used

  • Python
  • spaCy

Repo content

Besides the analysis scripts, the repo contains three versions of the movie script: the full script (raw_script.txt); the script with comments, scene descriptions, and subjects removed (cleaned-script.txt); and the cleaned script with the subjects kept (cleaned-script-subject.txt). The plots directory contains the plots of the top nouns, adverbs, adjectives, verbs, and entities per character.
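For the per-character analyses, the lines in cleaned-script-subject.txt need to be grouped by speaker. The sketch below shows one way to do that. It rests on an assumption that should be checked against the file: that "subject" means the speaking character and that each line has the shape "SPEAKER: dialogue". The function names are hypothetical.

```python
from collections import defaultdict


def group_lines_by_character(lines, sep=":"):
    """Map each speaker to the list of dialogue lines attributed to them.

    Assumes every relevant line looks like "SPEAKER<sep> dialogue".
    """
    by_character = defaultdict(list)
    for raw in lines:
        speaker, found, text = raw.partition(sep)
        if not found or not text.strip():
            continue  # skip lines that do not match the assumed shape
        by_character[speaker.strip().upper()].append(text.strip())
    return dict(by_character)


def load_script(path="cleaned-script-subject.txt"):
    """Read the script file and group its lines by speaker."""
    with open(path, encoding="utf-8") as handle:
        return group_lines_by_character(handle)
```

Joining one character's list with spaces, for example " ".join(load_script()["THOR"]), gives a single text that can be passed to a spaCy pipeline. The "THOR" key is illustrative and depends on how names are written in the file.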

Thanks to Manuel Romero (https://github.com/mrm8488) for writing the Jupyter notebook.

Languages

  • Jupyter Notebook 99.5%
  • Python 0.5%