sneakyPad / movielens-metadata-fetcher

Enhances movies from the movielens dataset with IMDb metadata

Enhancing movielens data with IMDb metadata

This repository uses IMDbPY (imdbpy.github.io) to fetch metadata for MovieLens movies. The MovieLens dataset contains a CSV file that maps each MovieLens id to an IMDb id. This IMDb id is used to fetch the main attributes with IMDbPY. Since IMDbPY does not fetch all attributes, I use Beautiful Soup to fetch additional metadata, such as:

  • Stars (main actors of the movie)
  • Demographic data (age & gender)
  • Distribution of ratings
  • US Users and Non-US Users
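
The IMDbPY step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names `select_attributes` and `fetch_main_attributes` are hypothetical, the chosen keys are just a small subset of the fetched attributes, and the Beautiful Soup scraping step is not covered. It assumes the IMDbPY package is installed and that IMDb is reachable when `fetch_main_attributes` is called.

```python
def select_attributes(movie, keys):
    """Pick the requested keys from a movie-like mapping, skipping missing ones.

    Hypothetical helper: works with any object that offers .get(key),
    such as a plain dict or an IMDbPY Movie object.
    """
    selected = {}
    for key in keys:
        value = movie.get(key)
        if value is not None:
            selected[key] = value
    return selected


def fetch_main_attributes(imdb_id, keys=("title", "year", "genres", "rating", "votes")):
    """Fetch a few main attributes for one IMDb id (digits only, no "tt" prefix).

    Hypothetical helper; needs network access and the IMDbPY package.
    """
    from imdb import IMDb  # imported lazily so the helper above has no dependency

    ia = IMDb()
    movie = ia.get_movie(imdb_id)
    return select_attributes(movie, keys)


# Offline demonstration with a stand-in record instead of a live request.
stand_in = {"title": "Toy Story", "year": 1995}
print(select_attributes(stand_in, ("title", "year", "rating")))
```

With network access, `fetch_main_attributes("0114709")` would request the first movie from the input excerpt. Keys that IMDb does not provide for a title are simply left out of the result.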

For the movie avatar this can be found here:

The mapping is available in data/input/movielens/small/links.

Excerpt input:

movieId  imdbId   tmdbId
1        0114709  862
2        0113497  8844
3        0113228  15602
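
As a rough sketch of how this mapping might be read, the snippet below parses the three rows from the excerpt and builds the corresponding IMDb title URLs. It assumes the file is comma-separated with a header row, as in the published MovieLens links file. The function names `read_links` and `imdb_title_url` are hypothetical and not taken from this repository. The IMDb id is kept as a string because converting it to an integer would drop the leading zeros.

```python
import csv
import io


def read_links(csv_file):
    """Return a dict mapping MovieLens movieId (int) to IMDb id (str).

    The IMDb id stays a string and is padded to at least 7 digits,
    so leading zeros survive even if a tool stripped them earlier.
    """
    reader = csv.DictReader(csv_file)
    return {int(row["movieId"]): row["imdbId"].zfill(7) for row in reader}


def imdb_title_url(imdb_id):
    """Build the IMDb title page URL for an id without the "tt" prefix."""
    return f"https://www.imdb.com/title/tt{imdb_id}/"


# In-memory stand-in for the links file, using the rows from the excerpt.
sample = io.StringIO(
    "movieId,imdbId,tmdbId\n"
    "1,0114709,862\n"
    "2,0113497,8844\n"
    "3,0113228,15602\n"
)

links = read_links(sample)
for movie_id, imdb_id in links.items():
    print(movie_id, imdb_id, imdb_title_url(imdb_id))
```

To read from disk instead, the same function can be passed a file opened with `open(path, newline="")`.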

Fetched attributes:

original_title, cast, genres, runtimes, countries, country_codes, language_codes, color_info, aspect_ratio, sound_mix, original_air_date, rating, votes, imdbid, plot_outline, languages, title, year, kind, directors, writers, producers, composers, editors, animation_department, casting_department, music_department, writer, director, top_250_rank, plot, set_decorators, script_department, assistant_directors, costume_designers, budget, cumulative_worldwide_gross, stars, cast_id, stars_id

Keywords

  • 🎥 MovieLens
  • 👥 IMDb Metadata

Architecture

  • Python 3.7
  • IMDbPY
  • Beautiful Soup 4 for metadata that is not available through IMDbPY

ToDo

  • Create bin sizes for continuous features

About

License: MIT License


Languages

Python 99.8%, Makefile 0.2%