Movie Classifier
This is a simple implementation of a multi-class, multi-label movie genre classifier that takes a title and a description as input. We used the Keras framework in Python to build and train the model. The model consists of two BiLSTMs that learn representations of the title and the description; these representations are concatenated and passed to a dense layer with sigmoid activation, which outputs a probability for each genre. Both inputs share an embedding layer initialized with pre-trained GloVe word embeddings, which improves the model's metrics. We used binary cross-entropy loss because this is a multi-label problem and we want to optimize each output independently.
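To illustrate why sigmoid outputs pair naturally with binary cross-entropy in the multi-label setting, here is a minimal standard-library sketch. The logits and targets are made-up values, and the helper names `sigmoid` and `binary_cross_entropy` are illustrative only; this is not code from `model.py`.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over labels; each label is scored on its own."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


# Hypothetical raw scores for three genres and their true labels.
logits = [2.0, 1.0, -3.0]
targets = [1, 1, 0]

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(targets, probs)

if __name__ == "__main__":
    print("probabilities:", probs)
    print("sum of probabilities:", sum(probs))
    print("loss:", loss)
```

Unlike a softmax, the sigmoid probabilities are not forced to sum to one, so a movie can score highly for several genres at once, and each label contributes its own term to the loss.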
Installation
Install the dependencies using pip:
pip install . --upgrade
Training
- Download the dataset from here and put the CSV file in the dataset folder.
- Download the pre-trained word embeddings from here and put the txt file in the dataset folder.
- Run:
python model.py --mode train --model_path model --embeddings_size 100 --max_length 128
- You can also run TensorBoard to view graphs of the metrics:
tensorboard --logdir logs --port 8000
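For readers curious how a GloVe txt file can be turned into initial weights for an embedding layer, the following is a rough sketch. It assumes the standard GloVe text format (one word per line followed by its space-separated vector components). The function names, the `word_index` mapping, and the choice to reserve row 0 for padding are assumptions made for illustration, not necessarily what `model.py` does.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines into a {word: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Build a (len(word_index) + 1) x dim matrix; row 0 stays zero for padding.

    Words without a pre-trained vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real GloVe file (values are made up).
    sample = io.StringIO("thief 0.1 0.2 0.3\ndream 0.4 0.5 0.6\n")
    glove = load_glove(sample)
    word_index = {"thief": 1, "dream": 2, "cobb": 3}  # hypothetical vocabulary
    for row in build_embedding_matrix(word_index, glove, dim=3):
        print(row)
```

With a real file, the same loader can be called on an open file handle, and `dim` would match the value passed as `--embeddings_size`.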
Inference
python model.py --mode classify --model_path model --title 'Inception' --description "Dom Cobb is a thief with the rare ability to enter people's dreams and steal their secrets from their subconscious. His skill has made him a hot commodity in the world of corporate espionage but has also cost him everything he loves. Cobb gets a chance at redemption when he is offered a seemingly impossible task: Plant an idea in someone's mind. If he succeeds, it will be the perfect crime, but a dangerous enemy anticipates Cobb's every move."
Output:
[('Action', 0.7449887), ('Thriller', 0.6140004), ('Crime', 0.4795791), ('Comedy', 0.4772842)]
Serving
To serve the model through a Flask web API, run:
python model.py --mode serve --port 9000
Then request a classification from the classify endpoint: http://127.0.0.1:9000/classify?title=Inception&description=Dom%20Cobb%20is%20a%20thief...
Output:
{"genres": [["Action", 0.7449886798858643], ["Thriller", 0.6140003800392151], ["Crime", 0.4795790910720825], ["Comedy", 0.47728419303894043]]}
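The request above can also be made from Python. The sketch below uses only the standard library and assumes the server is running locally on port 9000 and returns JSON in the shape shown in the output. The helper names `build_classify_url`, `parse_genres`, and `classify` are hypothetical and not part of this project.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_classify_url(title, description, host="127.0.0.1", port=9000):
    """Build the /classify URL with percent-encoded query parameters."""
    query = urlencode({"title": title, "description": description}, quote_via=quote)
    return f"http://{host}:{port}/classify?{query}"


def parse_genres(body):
    """Turn the JSON response body into a list of (genre, probability) tuples."""
    payload = json.loads(body)
    return [(genre, float(score)) for genre, score in payload["genres"]]


def classify(title, description, **kwargs):
    """Query a running server and return its genre predictions."""
    with urlopen(build_classify_url(title, description, **kwargs)) as response:
        return parse_genres(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server started with: python model.py --mode serve --port 9000
    for genre, score in classify("Inception", "Dom Cobb is a thief"):
        print(f"{genre}: {score:.3f}")
```

Encoding the query with `urlencode` avoids hand-escaping spaces and punctuation in long descriptions.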
Evaluation
label | precision | recall | F1 | support |
---|---|---|---|---|
Action | 0.577 | 0.420 | 0.486 | 1049 |
Adventure | 0.440 | 0.213 | 0.287 | 522 |
Animation | 0.652 | 0.274 | 0.386 | 548 |
Comedy | 0.595 | 0.633 | 0.613 | 2552 |
Crime | 0.424 | 0.253 | 0.317 | 596 |
Documentary | 0.879 | 0.534 | 0.664 | 1101 |
Drama | 0.627 | 0.712 | 0.667 | 3482 |
Family | 0.522 | 0.287 | 0.370 | 541 |
Fantasy | 0.496 | 0.128 | 0.203 | 493 |
Foreign | 0.091 | 0.003 | 0.006 | 333 |
History | 0.370 | 0.135 | 0.197 | 223 |
Horror | 0.727 | 0.532 | 0.614 | 882 |
Music | 0.606 | 0.355 | 0.448 | 290 |
Mystery | 0.396 | 0.144 | 0.211 | 382 |
Romance | 0.514 | 0.419 | 0.462 | 1037 |
Science Fiction | 0.680 | 0.475 | 0.560 | 612 |
TV Movie | 0.500 | 0.004 | 0.008 | 249 |
Thriller | 0.509 | 0.312 | 0.387 | 1179 |
War | 0.560 | 0.365 | 0.442 | 178 |
Western | 0.692 | 0.512 | 0.589 | 123 |
metric | value |
---|---|
Micro Average Precision | 0.605 |
Micro Average Recall | 0.466 |
Micro Average F1 | 0.526 |
Accuracy | 0.226 |
Macro Average Precision (weighted by support) | 0.585 |
Macro Average Precision (not weighted) | 0.542 |
Macro Average Recall (weighted by support) | 0.466 |
Macro Average Recall (not weighted) | 0.335 |
Macro Average F1 (weighted by support) | 0.501 |
Macro Average F1 (not weighted) | 0.395 |
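The relationship between the per-label table and the averaged metrics can be checked with a short script. The sketch below copies the rounded per-label values from the table above and computes a plain mean and a support-weighted mean; the function names are illustrative. Because the inputs are rounded to three decimals, the results can differ from the reported values in the last digit.

```python
# (label, precision, recall, f1, support), copied from the per-label table.
ROWS = [
    ("Action", 0.577, 0.420, 0.486, 1049),
    ("Adventure", 0.440, 0.213, 0.287, 522),
    ("Animation", 0.652, 0.274, 0.386, 548),
    ("Comedy", 0.595, 0.633, 0.613, 2552),
    ("Crime", 0.424, 0.253, 0.317, 596),
    ("Documentary", 0.879, 0.534, 0.664, 1101),
    ("Drama", 0.627, 0.712, 0.667, 3482),
    ("Family", 0.522, 0.287, 0.370, 541),
    ("Fantasy", 0.496, 0.128, 0.203, 493),
    ("Foreign", 0.091, 0.003, 0.006, 333),
    ("History", 0.370, 0.135, 0.197, 223),
    ("Horror", 0.727, 0.532, 0.614, 882),
    ("Music", 0.606, 0.355, 0.448, 290),
    ("Mystery", 0.396, 0.144, 0.211, 382),
    ("Romance", 0.514, 0.419, 0.462, 1037),
    ("Science Fiction", 0.680, 0.475, 0.560, 612),
    ("TV Movie", 0.500, 0.004, 0.008, 249),
    ("Thriller", 0.509, 0.312, 0.387, 1179),
    ("War", 0.560, 0.365, 0.442, 178),
    ("Western", 0.692, 0.512, 0.589, 123),
]
COLUMNS = {"precision": 1, "recall": 2, "f1": 3}
SUPPORT = 4


def macro_average(rows, column):
    """Plain mean over labels: every genre counts equally."""
    i = COLUMNS[column]
    return sum(row[i] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean over labels weighted by support: frequent genres count more."""
    i = COLUMNS[column]
    total_support = sum(row[SUPPORT] for row in rows)
    return sum(row[i] * row[SUPPORT] for row in rows) / total_support


if __name__ == "__main__":
    for name in COLUMNS:
        print(name, macro_average(ROWS, name), weighted_average(ROWS, name))
```

The plain mean reproduces the rows marked "(not weighted)", and the support-weighted mean reproduces the other Macro Average rows. The support-weighted recall also coincides with the micro-averaged recall (about 0.466), since both reduce to total true positives divided by total support. Rare genres with very low recall, such as Foreign and TV Movie, pull the unweighted averages down.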