Whoscored: downloading games problem
Gibranium opened this issue · comments
I tried a couple of custom leagues, specifically BEL-Jupiler Pro League and USA-Major League Soccer.
The Jupiler Pro League works fine, but Major League Soccer fails:
import soccerdata as sd
ws = sd.WhoScored("USA-Major League Soccer", "2021", headless=False, no_cache=False)
leagues = ws.read_leagues()
df = ws.read_events()
[07/06/24 12:39:29] INFO Retrieving calendar for USA-Major League Soccer 2020 (Major League Soccer Playoff) whoscored.py:363
INFO [1/2] Retrieving fixtures for USA-Major League Soccer 2020 (Major League Soccer Playoff) whoscored.py:391
INFO [2/2] Retrieving fixtures for USA-Major League Soccer 2020 (Major League Soccer Playoff) whoscored.py:391
INFO Retrieving calendar for USA-Major League Soccer 2020 (Major League Soccer) whoscored.py:363
INFO [1/2] Retrieving fixtures for USA-Major League Soccer 2020 (Major League Soccer) whoscored.py:391
INFO [2/2] Retrieving fixtures for USA-Major League Soccer 2020 (Major League Soccer) whoscored.py:391
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/anaconda3/envs/Soccerdata/lib/python3.11/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key)
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
File index.pyx:153, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:182, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'game_id'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[6], line 1
----> 1 df = ws.read_events()
File ~/anaconda3/envs/Soccerdata/lib/python3.11/site-packages/soccerdata/whoscored.py:690, in WhoScored.read_events(self, match_id, force_cache, live, output_fmt, retry_missing, on_error)
688 team_names = {}
689 for i, (_, game) in enumerate(iterator.iterrows()):
--> 690 url = urlmask.format(game["game_id"])
691 # get league and season
692 logger.info(
693 "[%s/%s] Retrieving game with id=%s",
694 i + 1,
695 len(iterator),
696 game["game_id"],
697 )
File ~/anaconda3/envs/Soccerdata/lib/python3.11/site-packages/pandas/core/series.py:1111, in Series.__getitem__(self, key)
1108 return self._values[key]
1110 elif key_is_scalar:
-> 1111 return self._get_value(key)
1113 # Convert generator to list before going through hashable part
1114 # (We will iterate through the generator there to check for slices)
1115 if is_iterator(key):
File ~/anaconda3/envs/Soccerdata/lib/python3.11/site-packages/pandas/core/series.py:1227, in Series._get_value(self, label, takeable)
1224 return self._values[label]
1226 # Similar to Index.get_value, but we do not fall back to positional
-> 1227 loc = self.index.get_loc(label)
1229 if is_integer(loc):
1230 return self._values[loc]
File ~/anaconda3/envs/Soccerdata/lib/python3.11/site-packages/pandas/core/indexes/base.py:3809, in Index.get_loc(self, key)
3804 if isinstance(casted_key, slice) or (
3805 isinstance(casted_key, abc.Iterable)
3806 and any(isinstance(x, slice) for x in casted_key)
3807 ):
3808 raise InvalidIndexError(key)
-> 3809 raise KeyError(key) from err
3810 except TypeError:
3811 # If we have a listlike key, _check_indexing_error will raise
3812 # InvalidIndexError. Otherwise we fall through and re-raise
3813 # the TypeError.
3814 self._check_indexing_error(key)
KeyError: 'game_id'
I haven't tried all the leagues, so I can't say which ones have this problem.
Can you first check whether it works with
ws = sd.WhoScored("USA-Major League Soccer", "2021", no_cache=True)
df = ws.read_events()
Well, it worked, but it's downloading both the 2020 and 2021 Major League Soccer seasons and filing them under the 2020 season.
Can I also ask why removing headless and changing the cache setting made it run?
Setting no_cache=True works because you probably had some old files with a different structure in your cache. Removing headless didn't do anything: headless=False is the default.
I don't know why it downloads both seasons, but the MLS isn't supported anyway.
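If you want to check a cache for outdated files before falling back to no_cache=True, something along these lines might help. It is only a rough sketch and not part of soccerdata: the function name find_stale_cache_files, the flat directory of .json files, and the idea that each file holds a record (or list of records) keyed by game_id are all assumptions made for illustration. The real cache layout may differ, so adapt the path and format to what you actually find on disk.

```python
import json
import tempfile
from pathlib import Path


def find_stale_cache_files(cache_dir, required_key):
    """Return cached JSON files whose records lack `required_key`.

    Hypothetical helper: assumes a flat directory of .json files,
    each holding one record (dict) or a list of records.
    """
    stale = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        try:
            records = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or corrupt files are reported as stale too.
            stale.append(path)
            continue
        if isinstance(records, dict):
            records = [records]
        has_key = isinstance(records, list) and all(
            isinstance(record, dict) and required_key in record
            for record in records
        )
        if not has_key:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names and contents.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "new.json").write_text('[{"game_id": 1}]', encoding="utf-8")
        Path(tmp, "old.json").write_text('[{"id": 1}]', encoding="utf-8")
        for path in find_stale_cache_files(tmp, "game_id"):
            print("stale:", path.name)
```

Files it reports can be deleted or moved aside so that only those are downloaded again, instead of bypassing the whole cache.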