twoone17 / SteamGame-Recommendation


SteamGame Recommendation

SteamGame Recommendation recommends playable games based on your information.



Business Objective


For movies and music, there are representative recommendation systems:

  • Movies: Netflix
  • Music: Spotify

However, there is no representative recommendation system for games!

Therefore, we are going to make a system to recommend games.



Dataset Description

This dataset is a combination of 'Steam Video Games' and 'Steam Store Games (Clean dataset)'.


Dataset 'Steam Store Games (Clean dataset)'

  • Dataset link
  • This dataset combines data on 27,000 games scraped from the Steam and SteamSpy APIs.
  • License: CC BY 4.0
  • It has 18 attributes.

| Attribute | Type | Explanation | Example |
|---|---|---|---|
| appid | Nominal | Unique identifier for each title | 10, 20, 30, ... |
| name | Nominal | Title of the app (game) | Left 4 Dead, Dota 2, ... |
| release_date | Nominal | Release date in the format YYYY-MM-DD | 2008-11-17, 2009-11-19, ... |
| english | Categorical | Language support: 1 if the game is in English | 0, 1 |
| developer | Categorical | Name(s) of the developer(s); semicolon-delimited if multiple | Valve, Mark Healey, ... |
| publisher | Categorical | Name(s) of the publisher(s); semicolon-delimited if multiple | Valve, Mark Healey, ... |
| platforms | Categorical | Semicolon-delimited list of supported platforms; includes at most windows;mac;linux | windows, windows;mac;linux, ... |
| required_age | Categorical | Minimum required age according to PEGI UK standards; many titles with 0 are unrated or unsupplied | 0, 16, 18, ... |
| categories | Nominal | Semicolon-delimited list of game categories, e.g. single-player;multi-player | Single-player;Multi-player, ... |
| genres | Nominal | Semicolon-delimited list of game genres, e.g. action;adventure | RPG, Strategy, Action;RPG, ... |
| steamspy_tags | Categorical | Semicolon-delimited list of top SteamSpy game tags; similar to genres but community-voted, e.g. action;adventure | Action;FPS;Multiplayer, ... |
| achievements | Discrete | Number of in-game achievements, if any | 0, 147, 54, ... |
| positive_ratings | Discrete | Number of positive ratings, from SteamSpy | 124534, 3318, ... |
| negative_ratings | Discrete | Number of negative ratings, from SteamSpy | 3339, 633, ... |
| average_playtime | Discrete | Average user playtime, from SteamSpy | 17612, 277, 187, ... |
| median_playtime | Discrete | Median user playtime, from SteamSpy | 317, 62, 34, ... |
| owners | Categorical | Estimated number of owners, given as a lower and upper bound (e.g. 20000-50000); both are included so you can take the midpoint or the lower bound | 5000000-10000000, ... |
| price | Continuous | Current full price of the title in GBP (pounds sterling) | 7.19, 3.99, 5.79, ... |



Dataset 'Steam Video Games'

  • Dataset link
  • This dataset is intended for recommending video games and contains 200k Steam user interactions.
  • License: DbCL v1.0
  • It has 4 attributes.

| Attribute | Type | Explanation | Example |
|---|---|---|---|
| user-id | Nominal | User ID | 151603712, 187131847, ... |
| game-title | Nominal | Name of the Steam game | Dota 2, FINAL FANTASY XIII, ... |
| behavior-name | Categorical | Behavior name | purchase, play |
| value | Continuous | Hours played if the behavior is play; 1.0 if the behavior is purchase | 1.0, 9.8, 9.7, ... |



Data Exploration


Steam Store Games (Clean dataset)
  • Information on positive_ratings and negative_ratings


  • Positive Rating Table


  • Positive Rating Ratio


  • Distributions of columns




Dataset 'Steam Video Games'

  • Information on value


  • Top 10 users by value (play time) and what they played
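The "top 10 users by play time" view can be computed directly from the 'Steam Video Games' rows. This is a minimal plain-Python sketch, not the notebook's code: the function name `top_users_by_playtime` and the sample rows are hypothetical, and it assumes each row is a `(user-id, game-title, behavior-name, value)` tuple. Only `play` rows are summed, because `purchase` rows always carry 1.0 rather than hours.

```python
from collections import defaultdict

def top_users_by_playtime(rows, n=10):
    """Return up to n (user_id, total_hours, sorted_game_titles) tuples, most hours first."""
    totals = defaultdict(float)
    games = defaultdict(set)
    for user_id, title, behavior, value in rows:
        if behavior != "play":  # purchase rows hold 1.0, not hours played
            continue
        totals[user_id] += value
        games[user_id].add(title)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(user_id, hours, sorted(games[user_id])) for user_id, hours in ranked]

# Made-up sample rows in the dataset's column order.
rows = [
    (1, "Dota 2", "purchase", 1.0),
    (1, "Dota 2", "play", 10.5),
    (2, "Dota 2", "play", 2.0),
    (2, "FINAL FANTASY XIII", "play", 30.25),
    (3, "Dota 2", "purchase", 1.0),
]
print(top_users_by_playtime(rows))
```

Users who only purchased games (user 3 above) have no play hours and are left out of the ranking.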



Architecture



The filtering method depends on the size of the group:

  • Large: use Collaborative Filtering
  • Small: use Content-Based Filtering
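The rule above can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: the README does not say how group size is measured or where "large" begins, so `choose_method`, `group_size`, and the threshold of 1000 are all hypothetical.

```python
# Hypothetical cut-off: the README does not give the actual boundary between large and small groups.
LARGE_GROUP_MIN_SIZE = 1000

def choose_method(group_size: int, threshold: int = LARGE_GROUP_MIN_SIZE) -> str:
    """Pick a filtering method for a group of the given (hypothetical) size."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    # Large groups have enough user interactions for collaborative filtering;
    # small groups fall back to game attributes (content-based filtering).
    if group_size >= threshold:
        return "collaborative"
    return "content-based"

print(choose_method(5000))  # collaborative
print(choose_method(12))    # content-based
```

The reasoning behind such a split is that collaborative filtering needs many overlapping user interactions to estimate scores, while content-based filtering only needs item attributes.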

Clustering


To address the long-tail problem, we divided and filtered the columns.



This is the result of the clustering.




Collaborative Filtering


There is no rating column in our data, so we derived each user's rating by comparing the user's play hours with the average play hours.



We calculated each user's rating.
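One way to turn play hours into an implicit rating is to divide a user's hours on a game by that game's average hours. This is a minimal sketch, assuming rows shaped like the 'Steam Video Games' columns and assuming "average play hours" means the per-game average over the users who played it. The mapping to a 0 to 5 scale (average hours give 2.5, twice the average or more gives 5) and the function name `implicit_ratings` are hypothetical, not the project's exact formula.

```python
from collections import defaultdict

def implicit_ratings(rows, max_rating=5.0):
    """Map (user, game) play hours to a score in [0, max_rating] relative to the game's average."""
    # Sum play hours per (user, game); purchase rows (value 1.0) are not hours, so skip them.
    hours = defaultdict(float)
    for user_id, title, behavior, value in rows:
        if behavior == "play":
            hours[(user_id, title)] += value

    # Average play hours per game, over the users who played it.
    total = defaultdict(float)
    players = defaultdict(int)
    for (_, title), h in hours.items():
        total[title] += h
        players[title] += 1

    ratings = {}
    for (user_id, title), h in hours.items():
        avg = total[title] / players[title]
        # Hypothetical mapping: average hours -> max_rating / 2, capped at max_rating.
        ratio = h / avg if avg > 0 else 1.0
        ratings[(user_id, title)] = min(max_rating, max_rating * ratio / 2)
    return ratings

rows = [
    (1, "G", "purchase", 1.0),
    (1, "G", "play", 10.0),
    (2, "G", "play", 30.0),
    (3, "H", "play", 4.0),
]
print(implicit_ratings(rows))  # {(1, 'G'): 1.25, (2, 'G'): 3.75, (3, 'H'): 2.5}
```

Normalizing by each game's average keeps long games from dominating: 30 hours in a short game can signal more enthusiasm than 30 hours in an open-ended one.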



We wrote a CF_recommend_Game function.
It takes a user ID as input and uses SVD to estimate a score for each game title.
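The idea behind an SVD-based recommender can be shown with a truncated SVD of a user-by-game rating matrix. This is a minimal NumPy sketch, not the project's CF_recommend_Game: the function name `cf_recommend_games`, its parameters, and the toy matrix are hypothetical. It uses plain truncated SVD on a mean-centered, zero-filled matrix, which may differ from the SVD variant the notebook used.

```python
import numpy as np

def cf_recommend_games(ratings, user_ids, game_titles, user_id, k=2, top_n=5):
    """Rank games the user has not rated by their truncated-SVD reconstructed score.

    ratings: users x games array, 0 where the user has no rating.
    """
    R = np.asarray(ratings, dtype=float)
    mask = R > 0
    # Center each user's known ratings on that user's mean; unknown cells become 0.
    counts = mask.sum(axis=1)
    user_means = np.where(counts > 0, R.sum(axis=1) / np.maximum(counts, 1), 0.0)
    centered = np.where(mask, R - user_means[:, None], 0.0)

    # Keep the k largest singular values and rebuild a low-rank estimate.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    k = min(k, len(s))
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + user_means[:, None]

    row = user_ids.index(user_id)
    scores = [(game_titles[j], float(approx[row, j]))
              for j in range(R.shape[1]) if not mask[row, j]]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:top_n]

ratings = [[5, 4, 0],
           [5, 4, 1],
           [1, 0, 5],
           [0, 1, 5]]
users = ["a", "b", "c", "d"]
titles = ["A", "B", "C"]
print(cf_recommend_games(ratings, users, titles, "a"))
```

Zero-filling treats every unrated game as "average for this user", which biases the factors on sparse data; libraries such as Surprise instead fit latent factors by stochastic gradient descent over observed ratings only.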



This is the result of the CF function.




Content-Based Filtering


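import pandas as pd

df = pd.read_csv('steam.csv')
# Score on a 0-10 scale; algebraically equal to 10 * positive / (positive + negative).
# Games with no ratings at all get NaN (0 / 0).
df['rating'] = ((df['positive_ratings']
               - df['negative_ratings'])
               / (2 * (df['positive_ratings']
                    + df['negative_ratings'])) + 0.5) * 10.0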

We compute this rating so that the recommendation results can be sorted.
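The rating formula simplifies to ten times the share of positive ratings. Writing p for positive_ratings and n for negative_ratings:

```latex
\[
\text{rating}
= 10\left(\frac{p - n}{2(p + n)} + \frac{1}{2}\right)
= 10 \cdot \frac{(p - n) + (p + n)}{2(p + n)}
= 10 \cdot \frac{p}{p + n}
\]
```

For example, p = 90 and n = 10 gives 10 · 90/100 = 9.0. The score is undefined when p + n = 0, and it ignores how many ratings a game has, so a game with 1 positive and 0 negative ratings scores a full 10.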



The categories column holds many values per game.
Because processing all of them takes a long time, we use only the first 5 entries.
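A content-based step using only the first 5 categories could look like the following. This is a minimal sketch, not the project's implementation: it swaps in Jaccard similarity over category sets as the similarity measure, and the names `first_categories`, `jaccard`, `content_recommend`, and the `games` dictionary layout (title mapped to a categories string and the 0-10 rating) are hypothetical. Ties in similarity are broken by the rating, matching the README's use of rating to sort results.

```python
def first_categories(categories: str, limit: int = 5) -> set:
    """Keep only the first `limit` entries of a semicolon-delimited categories string."""
    return {c.strip() for c in categories.split(";")[:limit] if c.strip()}

def jaccard(a: set, b: set) -> float:
    """Share of categories two games have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def content_recommend(games: dict, title: str, top_n: int = 5) -> list:
    """Return titles most similar to `title`; higher rating wins among equally similar games."""
    target = first_categories(games[title][0])
    scored = []
    for other, (cats, rating) in games.items():
        if other == title:
            continue
        scored.append((other, jaccard(target, first_categories(cats)), rating))
    scored.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return [other for other, _, _ in scored[:top_n]]

games = {
    "A": ("Single-player;Multi-player;Steam Achievements", 9.0),
    "B": ("Single-player;Multi-player;Steam Achievements", 7.0),
    "C": ("Single-player;Multi-player;Steam Achievements", 8.5),
    "D": ("Multi-player;Online Multi-Player", 9.5),
}
print(content_recommend(games, "A"))  # ['C', 'B', 'D']
```

Truncating to 5 categories bounds the cost of each comparison, at the price of ignoring any later categories in the list.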



This is the result of the Content-Based filtering.




Languages

Jupyter Notebook 98.7%, Python 1.3%