WilliamOnVoyage / World-of-Warships-Stats-Analysis

Data acquisition, modeling, and analysis of player performance in the game World of Warships

Home Page: https://williamonvoyage.github.io/World-of-Warships-Stats-Analysis/

World of Warships Stats Analysis and Web Application

System Design

Architecture diagram

Created using Gliffy

Major classes

Class             Description  Functions  Attributes
wows_api          ...          ...        ...
abstract_db       ...          ...        ...
prediction_model  ...          ...        ...
web_connector     ...          ...        ...

API

This Python script requests statistical data from the World of Warships API and stores it in a local database (originally MySQL). The API requires an application_id to authenticate with the API server; register the application_id on Wargaming.net and store it in a local configuration file named "config.json". The IP address of the machine running the script (obtained with the ipgetter package) must also be added to your application in the Wargaming.net Developer Room.

Each type of API request has its own limitations and JSON format (see the Wargaming.net API reference); check the ones relevant to your use case.
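As a rough illustration of the request flow described above, the sketch below builds an account-info request from the wows_api section of config.json and retries on failure. The query parameter names (application_id, account_id), the comma-separated id batching, and the helper names build_account_url and fetch_json are assumptions for illustration, not the project's actual code; check the Wargaming.net API reference for the exact contract.

```python
import json
import time
import urllib.parse
import urllib.request


def build_account_url(api_cfg, account_ids):
    """Return the account-info URL for a batch of ids (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "application_id": api_cfg["application_id"],
        # Assumption: several ids are sent as one comma-separated value.
        "account_id": ",".join(str(i) for i in account_ids),
    })
    return api_cfg["account_url"] + "?" + query


def fetch_json(url, api_cfg):
    """GET the URL and decode the JSON body, retrying URL_REQ_TRYNUM times."""
    last_error = None
    for _ in range(api_cfg["URL_REQ_TRYNUM"]):
        try:
            with urllib.request.urlopen(url, timeout=api_cfg["URL_REQ_TIMEOUT"]) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except OSError as err:  # covers URLError and socket timeouts
            last_error = err
            time.sleep(api_cfg["URL_REQ_DELAY"])
    raise RuntimeError(f"request failed after retries: {last_error}")
```

Keeping URL construction separate from the network call makes the batching logic easy to check without touching the API server.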

Database

MongoDB

Because the API returns JSON, MongoDB (which stores documents as BSON) is a natural fit for storage. A player's newest stats and historical stats differ slightly in shape, so we store them as two different kinds of document.

MySQL [Deprecated]

The script connects to a relational database (MySQL, AWS RDS, etc.) to store the extracted data. The list of player ids is stored in its own table, wows_idlist. This table is essential for efficient API requests because no complete id list is officially provided and account numbers are sparsely distributed over a large range (see the account id ranges below). Some statistics, such as the number of battles, are stored in wows_stats, and you can customize the schema to suit your needs. The players' statistical data can then be retrieved with SQL and analyzed for your own purposes.

We replaced MySQL with MongoDB because of performance limitations.

Newest stats:

{
  "_id": 1008331251,
  "daily_stats": {
    ObjectId('000000201701011008331251'),
    ...
  },
  "account_id": 1008331251,
  "nickname": "zmlzeze",
  "last_battle_time": 1500140223,
  "leveling_tier": 15,
  "created_at": 1435322987,
  "leveling_points": 8612323,
  "updated_at": 1500053592,
  "private": null,
  "hidden_profile": false,
  "logout_at": 1500053581,
  "karma": null,
  "statistics": {
    "distance": 117155,
    "battles": 3143,
    "pvp": {
      ...
    }
  },
  "stats_updated_at": 1500140964
}

Historical stats:

{
  "_id": ObjectId('000000201701011008331251'),
  "capture_points": 399,
  "account_id": 1008331251,
  "max_xp": 4913,
  "wins": 1742,
  "planes_killed": 5550,
  "battles": 2882,
  "damage_dealt": 213130514,
  "battle_type": "pvp",
  "date": "20170101",
  "xp": 3923528,
  "frags": 3612,
  "survived_battles": 1356,
  "dropped_capture_points": 3629
}
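The _id of the historical document looks like the date followed by the account id, left-padded with zeros to 24 characters. The source does not state this rule, so the sketch below is only a reading of the example; the function name daily_stats_id is hypothetical.

```python
def daily_stats_id(account_id, date):
    """Compose the 24-character id string: zero padding + YYYYMMDD + account id."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be in YYYYMMDD form")
    # Digits are valid hexadecimal characters, so the padded string has the
    # 24-hex-character shape that a BSON ObjectId requires.
    return f"{date}{account_id}".zfill(24)


print(daily_stats_id(1008331251, "20170101"))  # 000000201701011008331251
```

With PyMongo installed, the resulting string could be wrapped as bson.ObjectId(...) before being used as a document key. A deterministic key of this kind lets a daily record be looked up directly from the player id and the date.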

The database serves both the modeling and the web application, so its performance is crucial. The NA server has about 1.6 million players, of whom about 30% have played at least 100 battles (we treat these as valid players). Because each player's stats are updated daily, the number of historical records keeps growing over time. We estimate that the newest stats for all 1.6 million players take up to 2 GB, while one year of historical stats for the valid players takes about 50 GB of disk space.
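The figures above imply rough per-document sizes. The back-of-the-envelope calculation below assumes 1 GB = 10^9 bytes and one historical record per valid player per day; both are simplifications rather than measured values.

```python
PLAYERS = 1_600_000          # NA accounts (from the text)
VALID_PERCENT = 30           # share with at least 100 battles
NEWEST_BYTES = 2 * 10**9     # about 2 GB for the newest stats
HISTORY_BYTES = 50 * 10**9   # about 50 GB for one year of history
DAYS = 365

valid_players = PLAYERS * VALID_PERCENT // 100
# Assumption: every valid player produces one record every day.
daily_docs_per_year = valid_players * DAYS

bytes_per_newest_doc = NEWEST_BYTES / PLAYERS
bytes_per_daily_doc = HISTORY_BYTES / daily_docs_per_year

print(valid_players)                # 480000
print(daily_docs_per_year)          # 175200000
print(round(bytes_per_newest_doc))  # 1250
print(round(bytes_per_daily_doc))   # 285
```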

Analysis

Data Preprocessing

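When retrieving players' data from the database, we use a pandas Panel to build a 3D structure indexed by player id, day, and stat (Panel was deprecated in pandas 0.20 and removed in 0.25, so newer environments need a MultiIndex DataFrame or a NumPy array instead):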

ID\day   1          2          3          ...
10001    [t,w,l,d]  [t,w,l,d]  [t,w,l,d]  ...
10002    [t,w,l,d]  [t,w,l,d]  [t,w,l,d]  ...
10003    [t,w,l,d]  [t,w,l,d]  [t,w,l,d]  ...
...      ...        ...        ...        ...

Each [t,w,l,d] cell is the vector of one day's stats: [battles, wins, losses, draws].
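The layout above can be sketched without pandas as a nested list of shape players x days x 4. The record keys (battles, wins, losses, draws), the zero fill for missing days, the function name build_stats_tensor, and the sample records are all assumptions made for illustration.

```python
def build_stats_tensor(records, fields=("battles", "wins", "losses", "draws")):
    """Pivot flat daily records into a players x days x fields nested list."""
    ids = sorted({r["account_id"] for r in records})
    days = sorted({r["date"] for r in records})  # YYYYMMDD strings sort by date
    id_pos = {a: i for i, a in enumerate(ids)}
    day_pos = {d: j for j, d in enumerate(days)}

    # Assumption: a missing (player, day) pair is filled with zeros.
    tensor = [[[0] * len(fields) for _ in days] for _ in ids]
    for r in records:
        tensor[id_pos[r["account_id"]]][day_pos[r["date"]]] = [r[f] for f in fields]
    return ids, days, tensor


# Made-up records, only to show the shape of the result.
records = [
    {"account_id": 10001, "date": "20170101", "battles": 5, "wins": 3, "losses": 2, "draws": 0},
    {"account_id": 10001, "date": "20170102", "battles": 4, "wins": 1, "losses": 3, "draws": 0},
    {"account_id": 10002, "date": "20170102", "battles": 2, "wins": 2, "losses": 0, "draws": 0},
]
ids, days, tensor = build_stats_tensor(records)
print(tensor[1])  # [[0, 0, 0, 0], [2, 2, 0, 0]]
```

The nested list can be passed to numpy.array to obtain a 3D array, which is a convenient input format for a sequence model.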

LSTM Model

We use an LSTM model without attention to predict a player's performance from the previous days' stats. The prediction covers a given time window, and the objective is to minimize the distance between the ground-truth and predicted stats vectors.
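The exact distance is not specified here. One common way to write such an objective, assuming squared Euclidean distance, P players, n observed days, and a prediction window of K days (all symbols below are illustrative and not taken from the project), is:

```latex
\[
x_{i,n} = [t, w, l, d]_{i,n} \in \mathbb{R}^{4}, \qquad
\min_{\theta} \; \sum_{i=1}^{P} \sum_{k=1}^{K}
\left\lVert \hat{x}_{i,n+k}(\theta) - x_{i,n+k} \right\rVert_{2}^{2}
\]
```

Here \hat{x}_{i,n+k}(\theta) stands for the LSTM's prediction for player i on day n+k given the stats of days 1 through n, and \theta for the model parameters.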

Local configuration file format

{
  "wows_api": {
    "application_id": "XXX",
    "player_url": "https://api.worldofwarships.com/wows/account/list/",
    "account_url": "https://api.worldofwarships.com/wows/account/info/",
    "stats_by_date_url": "https://api.worldofwarships.com/wows/account/statsbydate/",
    "DB_TYPE": "mongo",
    "DATE_FORMAT": "%Y-%m-%d",
    "NA_ACCOUNT_LIMIT_LO": 1000000000,
    "NA_ACCOUNT_LIMIT_HI": 2000000000,
    "ID_STEP": 100,
    "SIZE_PER_WRITE": 10000,
    "URL_REQ_DELAY": 0,
    "URL_REQ_TIMEOUT": 45,
    "URL_REQ_TRYNUM": 3
  },
  "mysql": {
    "dbname": "XXX",
    "usr": "XXX",
    "pw": "XXX",
    "hostname": "XX.XX.XX.XX",
    "port": 123
  },
  "mongo": {
    "dbname": "XXX",
    "collection": "XXX",
    "usr": "XXX",
    "pw": "XXX",
    "hostname": "XX.XX.XX.XX",
    "port": 123
  },
  "AWS_RDS": {
    "dbname": "XXX",
    "usr": "XXX",
    "pw": "XXX",
    "hostname": "XX.XX.XX.XX",
    "port": 123
  }
}
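A minimal sketch of reading this file follows. It assumes that the DB_TYPE value names the top-level section holding the active database settings; the helper names load_config and active_db_settings and the sample values are hypothetical.

```python
import json
import os
import tempfile


def load_config(path="config.json"):
    """Read the JSON configuration file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def active_db_settings(config):
    """Return (db_type, settings) for the backend named by wows_api.DB_TYPE."""
    db_type = config["wows_api"]["DB_TYPE"]
    if db_type not in config:
        raise KeyError(f"no '{db_type}' section in the configuration")
    return db_type, config[db_type]


# Demo with a throwaway file holding placeholder values.
sample = {
    "wows_api": {"application_id": "XXX", "DB_TYPE": "mongo"},
    "mongo": {"dbname": "XXX", "collection": "XXX", "hostname": "127.0.0.1", "port": 27017},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    db_type, settings = active_db_settings(load_config(path))

print(db_type, settings["port"])  # mongo 27017
```

Failing early when the named section is missing gives a clearer error than a connection failure later on.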

Account id range:

  • [0, 500000000) : 'RU';
  • [500000000, 1000000000) : 'EU';
  • [1000000000, 2000000000) : 'NA';
  • [2000000000, 3000000000) : 'ASIA';
  • [3000000000, ) : 'KR';
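The ranges above translate directly into a lookup over the lower bounds. This is a small sketch; the function name region_of is hypothetical.

```python
from bisect import bisect_right

# Lower bounds of each half-open range, in ascending order.
_LOWER_BOUNDS = [0, 500_000_000, 1_000_000_000, 2_000_000_000, 3_000_000_000]
_REGIONS = ["RU", "EU", "NA", "ASIA", "KR"]


def region_of(account_id):
    """Map an account id to its server region using the ranges above."""
    if account_id < 0:
        raise ValueError("account id must be non-negative")
    return _REGIONS[bisect_right(_LOWER_BOUNDS, account_id) - 1]


print(region_of(1008331251))   # NA
print(region_of(499_999_999))  # RU
print(region_of(500_000_000))  # EU
```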

Web Application

The web application is built with the Flask framework: an HTML/JavaScript front end backed by a Python back end.

HTML/JavaScript front-end

Python back-end


More projects on my private repository summary

License: GNU General Public License v2.0

