statsbomb / statsbombpy

Easily stream StatsBomb data into Python

Missing information in entities, inconsistent querying, and too many warnings

TomDecroos opened this issue

Great to see a single API for both the free and the premium statsbomb data! I think this has a lot of potential to become a widely used library. I have three suggestions below.

I'd love to use statsbombpy as the default way to download statsbomb data in my notebooks at https://github.com/ML-KULeuven/socceraction. The way I do it now is to just download the entire statsbomb open data git repository as one massive zip. This will obviously have some scalability issues in the future as you guys keep on adding more data.

Updating my notebooks to work with statsbombpy would allow me to (1) fix the scalability issue and (2) make all code and models in my socceraction repo easily applicable to both the free and premium statsbomb data with zero configuration required.

However, while attempting to update my notebooks to use statsbombpy, I encountered some issues.

Missing information (mostly identifiers)

Certain crucial properties are missing from the entities, for example:

  • home_team_id and away_team_id in matches
  • team_id in lineups
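Until the entities expose these identifiers, one workaround is to pull them out of the raw match JSON directly. The following is a minimal sketch, assuming the raw records nest the ids under `home_team` and `away_team` as in the open-data files. `flatten_match` is a hypothetical helper, not part of statsbombpy, and the sample record uses made-up values.

```python
def flatten_match(raw_match):
    """Copy the nested team identifiers of a raw match record to top-level keys.

    Assumes the open-data layout, where ids live under "home_team" and "away_team".
    """
    home = raw_match.get("home_team", {})
    away = raw_match.get("away_team", {})
    return {
        "match_id": raw_match.get("match_id"),
        "home_team_id": home.get("home_team_id"),
        "home_team_name": home.get("home_team_name"),
        "away_team_id": away.get("away_team_id"),
        "away_team_name": away.get("away_team_name"),
    }


# Made-up record for illustration only; real files carry many more fields.
sample = {
    "match_id": 1,
    "home_team": {"home_team_id": 10, "home_team_name": "Home FC"},
    "away_team": {"away_team_id": 20, "away_team_name": "Away FC"},
}
flat = flatten_match(sample)
```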

Inconsistent parameters when querying

To query matches from a season, you use the following.
sb.matches(competition_id=9, season_id=42)

However, to query the events of that season, you use a different (and in my opinion less practical) set of parameters:

bundesliga = {
    "country": "Germany",
    "division": "1. Bundesliga",
    "season": "2019/2020",
    "gender": "male"
}
events = sb.competition_events(competition=bundesliga)
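For comparison, an id-based wrapper for events could look like the sketch below. It is only an illustration: `events_by_season` is a hypothetical name, and the two fetchers are passed in as arguments so the example runs without network access. In practice they would presumably be `sb.matches` and a per-match events call, the latter being an assumption about the library rather than something shown above.

```python
def events_by_season(competition_id, season_id, fetch_matches, fetch_events):
    """Return {match_id: events} for one season, using ids only."""
    matches = fetch_matches(competition_id=competition_id, season_id=season_id)
    # Works for a DataFrame or a plain dict that has a "match_id" column.
    return {
        match_id: fetch_events(match_id=match_id)
        for match_id in matches["match_id"]
    }


# Stand-in fetchers with made-up data, so the sketch is runnable offline.
def fake_matches(competition_id, season_id):
    return {"match_id": [101, 102]}


def fake_events(match_id):
    return [{"match_id": match_id, "type": "Pass"}]


season_events = events_by_season(9, 42, fake_matches, fake_events)
```

The caller only needs the same two ids that `sb.matches` already takes, which is the consistency being asked for.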

Too many warnings

When given a dataframe with 100 matches and querying the lineups of those matches, the following code will print the message "credentials were not supplied. open data access only" 100 times in the console, which can be fairly annoying.

lineups = []
for match_id in matches.match_id:
    lineup = sb.lineups(match_id)
    lineups.append(lineup)

Please enable some way to turn off this message, or convert it to a real Python warning so that it only gets printed once. :-)
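A rough sketch of the second option, using the standard `warnings` module. `NoCredentialsWarning`, `warn_open_data_only`, `lineups_stub` and `count_warnings` are hypothetical names standing in for the library's internals, not its actual code. With Python's default filter, a warning raised from the same code location is shown only once.

```python
import warnings


class NoCredentialsWarning(UserWarning):
    """Hypothetical category, so users can filter this message specifically."""


def warn_open_data_only():
    # Emitted from a single location, so the "default" filter shows it once.
    warnings.warn(
        "credentials were not supplied. open data access only",
        NoCredentialsWarning,
    )


def lineups_stub(match_id):
    """Stand-in for a data call that checks credentials before fetching."""
    warn_open_data_only()
    return {"match_id": match_id}


def count_warnings(n_calls):
    """Call the stub n_calls times and count how many warnings get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("default")
        for match_id in range(n_calls):
            lineups_stub(match_id)
    return len(caught)
```

With a dedicated category, users who want no message at all could call `warnings.simplefilter("ignore", NoCredentialsWarning)` once at the top of a notebook.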

In summary: please make all information that is present in the raw JSON files also available in the entities (or give the option to download the raw JSON without first converting it to the entities in entities.py), make all querying id-based as it is much cleaner, and don't print too much to the console.

I hope you implement (some of) these suggestions!

Hi Tom. Thanks a lot for your suggestions. I agree with some of them and will address them when I work with the package again.