SteamGame Recommendation

SteamGame Recommendation recommends games you might want to play, based on your information.

Business Objective

For movies and music, there are well-known recommendation systems:

Netflix for movies
Spotify for music

However, there is no representative recommendation system for games!

Therefore, we are going to make a system to recommend games.

Dataset Description

Our dataset is a combination of 'Steam Video Games' and 'Steam Store Games (Clean dataset)'.

Dataset 'Steam Store Games (Clean dataset)'

Dataset link
This dataset combines data on 27,000 games scraped from the Steam and SteamSpy APIs.
License: CC BY 4.0
It has 18 attributes.

Attribute	Type	Explanation	Example
appid	Nominal	Unique identifier for each title	10, 20, 30, ...
name	Nominal	Title of app (game)	Left 4 Dead, Dota 2, ...
release_date	Nominal	Release date in format YYYY-MM-DD	2008-11-17, 2009-11-19, ...
english	Categorical	Language support: 1 if the game is in English	0, 1
developer	Categorical	Name (or names) of developer(s). Semicolon delimited if multiple	Valve, Mark Healey, ...
publisher	Categorical	Name (or names) of publisher(s). Semicolon delimited if multiple	Valve, Mark Healey, ...
platforms	Categorical	Semicolon delimited list of supported platforms. At most includes: windows;mac;linux	windows, windows;mac;linux, ...
required_age	Categorical	Minimum required age according to PEGI UK standards. Many with 0 are unrated or unsupplied.	0, 16, 18, ...
categories	Nominal	Semicolon delimited list of game categories, e.g. single-player;multi-player	Single-player;Multi-player, ...
genres	Nominal	Semicolon delimited list of game genres, e.g. action;adventure	RPG, Strategy, Action;RPG, ...
steamspy_tags	Categorical	Semicolon delimited list of top steamspy game tags, similar to genres but community voted, e.g. action;adventure	Action;FPS;Multiplayer, ...
achievements	Discrete	Number of in-game achievements, if any	0, 147, 54, ...
positive_ratings	Discrete	Number of positive ratings, from SteamSpy	124534, 3318, ...
negative_ratings	Discrete	Number of negative ratings, from SteamSpy	3339, 633, ...
average_playtime	Discrete	Average user playtime, from SteamSpy	17612, 277, 187, ...
median_playtime	Discrete	Median user playtime, from SteamSpy	317, 62, 34, ...
owners	Categorical	Estimated number of owners. Contains lower and upper bound (like 20000-50000). May wish to take mid-point or lower bound. Included both to give options.	5000000-10000000, ...
price	Continuous	Current full price of title in GBP (pounds sterling)	7.19, 3.99, 5.79, ...
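Two of the column formats above need parsing before they can be used numerically: the owners range and the semicolon-delimited lists. The following is a minimal sketch of one way to do that. The helper names owners_midpoint and split_multi are hypothetical and not part of the project; choosing the mid-point rather than the lower bound is only one of the options the table mentions.

```python
def owners_midpoint(owners):
    """Turn an owners range such as '20000-50000' into its mid-point."""
    low, high = (int(part) for part in owners.split("-"))
    return (low + high) / 2


def split_multi(value):
    """Split a semicolon-delimited field into a list, skipping empty items."""
    return [item for item in value.split(";") if item]


print(owners_midpoint("20000-50000"))
print(split_multi("windows;mac;linux"))
```

With pandas, the same helpers could be applied per row, for example with df['owners'].map(owners_midpoint).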

Dataset 'Steam Video Games'

Dataset link
This dataset contains 200k Steam user interactions and is intended for recommending video games.
License: DbCL v1.0
It has 4 attributes.

Attribute	Type	Explanation	Example
user-id	Nominal	User ID	151603712, 187131847, ...
game-title	Nominal	Name of the Steam game	Dota 2, FINAL FANTASY XIII, ...
behavior-name	Categorical	Behavior type (purchase or play)	purchase, play
value	Continuous	Hours if behavior is play, 1.0 if behavior is purchase	1.0, 9.8, 9.7, ...
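Because value means different things for purchase and play rows, it can help to separate the two before doing any arithmetic on it. The sketch below keeps only the play rows and sums hours per user and game. The rows are toy data invented for illustration, and the variable names are assumptions, not the project's own.

```python
import pandas as pd

# Toy rows in the same four-column layout as the table above.
interactions = pd.DataFrame(
    {
        "user-id": [1, 1, 1, 2, 2],
        "game-title": ["Dota 2", "Dota 2", "Portal", "Dota 2", "Dota 2"],
        "behavior-name": ["purchase", "play", "purchase", "purchase", "play"],
        "value": [1.0, 9.8, 1.0, 1.0, 0.5],
    }
)

# Keep only play rows, where value is hours played.
plays = interactions[interactions["behavior-name"] == "play"]

# One row per (user, game) with the total hours played.
hours = (
    plays.groupby(["user-id", "game-title"])["value"]
    .sum()
    .reset_index(name="hours")
)
print(hours)
```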

Data Exploration

Steam Store Games (Clean dataset)

Summary of positive_ratings and negative_ratings

Positive Rating Table

Positive Rating Ratio

Distributions of columns

Dataset 'Steam Video Games'

Summary of value

Top 10 Users of value (Play-time) + What They Played

Architecture

If the group is ....

Large: use Collaborative Filtering
Small: use Content-Based Filtering
- This avoids the long-tail problem.
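The rule above can be sketched as a small dispatch function. The source does not say how large a group must be to count as "Large", so the threshold min_size_for_cf below is a purely hypothetical value, and the function name is also an assumption.

```python
def choose_strategy(group_size, min_size_for_cf=1000):
    """Pick a recommender by group size.

    min_size_for_cf is a hypothetical cut-off; the project does not state one.
    """
    if group_size >= min_size_for_cf:
        return "collaborative"
    return "content-based"


print(choose_strategy(5000))
print(choose_strategy(10))
```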

Clustering

To solve the long-tail problem, we divided and filtered the columns.

This is the result of the clustering.

Collaborative Filtering

There is no rating column in our data, so we derived each user's rating for a game by comparing the user's play hours with the average play hours for that game.
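The source does not give the exact formula, so the following is only a sketch of one possible mapping. It assumes a 0 to 5 scale where playing exactly the average number of hours gives 2.5 and playing twice the average or more gives 5. The data is toy data and the column name hours is an assumption.

```python
import pandas as pd

# Toy play-time data: one row per (user, game).
plays = pd.DataFrame(
    {
        "user-id": [1, 2, 3, 1],
        "game-title": ["Dota 2", "Dota 2", "Dota 2", "Portal"],
        "hours": [10.0, 20.0, 30.0, 4.0],
    }
)

# Average hours per game, aligned to each row.
avg_hours = plays.groupby("game-title")["hours"].transform("mean")

# Ratio of the user's hours to the game's average (assumes the average is > 0).
ratio = plays["hours"] / avg_hours

# Assumed mapping: cap the ratio at 2, then scale to 0-5.
plays["rating"] = (ratio.clip(upper=2.0) / 2.0 * 5.0).round(2)
print(plays)
```

Capping the ratio keeps a few very heavy players from dominating the scale, which is a design choice of this sketch rather than something stated in the project.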

We wrote a CF_recommend_Game function.
It takes a user ID as input and uses SVD to estimate a score for each game title.
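To illustrate the idea, here is a minimal sketch that is not the project's CF_recommend_Game. It uses NumPy's plain truncated SVD on a mean-centered user-by-game matrix, which may differ from the SVD implementation the project relies on. The function name cf_recommend_sketch, its parameters, and the toy ratings are all assumptions.

```python
import numpy as np
import pandas as pd


def cf_recommend_sketch(user_id, ratings, n_factors=2, top_n=3):
    """Rank games the user has not rated by a rank-k SVD reconstruction."""
    # Users as rows, games as columns; missing ratings are NaN.
    matrix = ratings.pivot_table(
        index="user-id", columns="game-title", values="rating"
    )
    # Center on each user's mean so that filled-in zeros mean "average".
    user_means = matrix.mean(axis=1)
    centered = matrix.sub(user_means, axis=0).fillna(0.0)

    # Truncated SVD: keep only the k largest singular values.
    u, s, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)
    k = min(n_factors, len(s))
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]

    estimated = pd.DataFrame(approx, index=matrix.index, columns=matrix.columns)
    estimated = estimated.add(user_means, axis=0)

    # Only recommend games the user has no rating for yet.
    unrated = matrix.loc[user_id].isna()
    scores = estimated.loc[user_id][unrated]
    return scores.sort_values(ascending=False).head(top_n)


# Toy ratings: user 1 has not rated game "C".
toy = pd.DataFrame(
    {
        "user-id": [1, 1, 2, 2, 2, 3, 3],
        "game-title": ["A", "B", "A", "B", "C", "A", "C"],
        "rating": [5.0, 4.0, 5.0, 4.0, 1.0, 1.0, 5.0],
    }
)
result = cf_recommend_sketch(1, toy)
print(result)
```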

This is the result of the CF function.

Content-Based Filtering

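import pandas as pd

df = pd.read_csv('steam.csv')
# Map the positive/negative balance onto a 0-10 scale
# (NaN when a game has no ratings at all).
df['rating'] = ((df['positive_ratings']
               - df['negative_ratings'])
               /(2 * (df['positive_ratings']
                    + df['negative_ratings'])) + 0.5) * 10.0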

We compute this rating so that the recommendation results can be sorted.
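The rating expression simplifies algebraically to 10 × positive / (positive + negative), that is, the share of positive ratings on a 0 to 10 scale. The sketch below restates it as a scalar function and checks the equivalence on one example. The function name rating_score and the choice to return None for unrated games are assumptions, not part of the project.

```python
def rating_score(positive, negative):
    """Same formula as the rating column, for a single game."""
    total = positive + negative
    if total == 0:
        # No ratings at all: the formula would divide by zero.
        return None
    return ((positive - negative) / (2 * total) + 0.5) * 10.0


# 3 positive and 1 negative rating: 75% positive, so 7.5 out of 10.
score = rating_score(3, 1)
assert abs(score - 10.0 * 3 / (3 + 1)) < 1e-9
print(score)
```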

The categories column holds many values per game.
Because processing all of them takes a long time, we use only the first 5 entries.
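A minimal sketch of how the truncated categories and the rating could work together follows. The source does not name the similarity measure, so Jaccard similarity over category sets is an assumption here, as are the function names and the toy games.

```python
def first_categories(raw, limit=5):
    """Keep only the first `limit` semicolon-delimited categories."""
    return set(raw.split(";")[:limit])


def jaccard(a, b):
    """Share of categories two games have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_recommend_sketch(title, games, top_n=2):
    """Rank other games by category similarity, then by rating."""
    target = first_categories(games[title][0])
    scored = []
    for other, (categories, rating) in games.items():
        if other == title:
            continue
        similarity = jaccard(target, first_categories(categories))
        scored.append((similarity, rating, other))
    # Highest similarity first; rating breaks ties.
    scored.sort(reverse=True)
    return [name for _, _, name in scored[:top_n]]


# Toy catalogue: title -> (categories, rating on the 0-10 scale).
games = {
    "Alpha": ("Single-player;Multi-player;Co-op", 8.0),
    "Beta": ("Single-player;Multi-player;Co-op", 9.0),
    "Gamma": ("Single-player;Multi-player;Co-op", 7.0),
    "Delta": ("Single-player", 9.5),
}
print(content_recommend_sketch("Alpha", games))
```

Here Beta and Gamma share all of Alpha's categories, so they outrank Delta despite its higher rating, and Beta comes first because its rating is higher than Gamma's.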

This is the result of the Content-Based filtering.

twoone17 / SteamGame-Recommendation
