sselph / scraper

A scraper for EmulationStation written in Go using hashing

thegamesdb API has changed?

pgiblock opened this issue

Not sure, but I am a first-time user of this scraper. I ran into the issue "It appears that thegamesdb.net isn't up". Looking at the code, the scraper attempts to GET http://thegamesdb.net/api/GetGame.php?id=1; after following the 302, a 404 is returned. Judging from https://api.thegamesdb.net, it appears the API has changed. It looks like one now needs to hit https://api.thegamesdb.net/Games/ByGameID?id=1&apikey=<API_KEY>.
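
For illustration, a minimal Go sketch of the old and new requests (the API key value is a placeholder, not a real credential):

package main

import (
    "fmt"
    "net/http"
    "net/url"
)

func main() {
    // Old endpoint: a plain GET with no key. This now 302-redirects and ends in a 404.
    oldURL := "http://thegamesdb.net/api/GetGame.php?id=1"

    // New endpoint: the same lookup, but an apikey query parameter is required.
    // "YOUR_API_KEY" is a placeholder; real keys are issued by thegamesdb.net.
    q := url.Values{}
    q.Set("id", "1")
    q.Set("apikey", "YOUR_API_KEY")
    newURL := "https://api.thegamesdb.net/Games/ByGameID?" + q.Encode()

    for _, u := range []string{oldURL, newURL} {
        resp, err := http.Get(u)
        if err != nil {
            fmt.Println(u, "error:", err)
            continue
        }
        resp.Body.Close()
        fmt.Println(u, "->", resp.Status)
    }
}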

Is this a recent change on gdb's side? Are there any plans to support the new API? I'm going to modify the code locally, hardcode an API key temporarily, and report back. Hopefully the endpoint paths (and the addition of an API key) are all that changed, and the scraper's parser can remain as-is.

Actually I may have to remove support for this service. The API key they mention is issued to the developer and is limited to something like 1000 queries per month. I think they designed the API quota for someone running a web server that mirrors the data, not a scraper like mine.

Ideally they would reconsider and allow each user to generate their own API key with an individual quota. A shared quota for an app like mine makes no sense.

I'm working on it now... If it is minor, then expect a pull request later today.

Edit: Blarg... just read your recent comment. This stinks, as I feel their metadata is superior. Guess I'll try the 'ss' source and see if that gives me the data I want. Either that, or leverage one of the mirrors they are trying to protect against ;-)

Ah, never mind, looks like I misread. The new documentation is not very good. The limit seems like it might be per IP, so that would be roughly per user. I would just need to batch the API calls some.

At the moment there is supposedly a legacy subdomain you can add to the URL to get it working again until the code has been migrated.

Yeah. Batching sounds ideal to get the query count down. I haven't dug into the guts of the scraper enough to know how painful of a refactor that would be.

Good news: It seems that simply replacing 'thegamesdb.net' with 'legacy.thegamesdb.net' is a usable stop-gap solution.
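
As a sketch, assuming the base URL lives in a single constant (the identifier name is illustrative, not the scraper's actual one):

package scraper

// Stop-gap: point the existing XML API calls at the legacy mirror.
// gdbAPIBase is an illustrative name; the real constant in the code may differ.
const gdbAPIBase = "http://legacy.thegamesdb.net/api/" // was "http://thegamesdb.net/api/"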

Nice.

Yeah, the code was my first Go code, so it wasn't great to start with and over the years it has grown even less elegant. It does something roughly like the following, so it isn't laid out for batch processing against a single database:

for each rom found:
  for each DB:
    if result:
      break
    else:
      continue

It was designed more to try multiple databases in turn, to fill in gaps where a ROM was missing from one of them. A refactor would probably need to do something like:

for each DB:
  for each batch of unscraped roms:
    get results(batch)

Yeah, that makes sense, where the unscraped ROMs are initially the full set. Then each DB iteration sees only the set of ROMs left unresolved by the previous one. gdb might have some limit on the number of ids allowed in a single query, so some chunking might be in order as well.
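
A rough Go sketch of that shape, purely illustrative: the ROM and DB types and the batch size are assumptions here, not the scraper's real definitions.

package scraper

// ROM and DB are illustrative stand-ins for the scraper's real types.
type ROM struct{ ID, Name string }

type DB interface {
    // GetBatch resolves a batch of ROMs and returns the ones it could not match.
    GetBatch(batch []ROM) (unmatched []ROM, err error)
}

// chunk splits roms into slices of at most n, in case the API caps
// the number of ids allowed per query.
func chunk(roms []ROM, n int) [][]ROM {
    var out [][]ROM
    for len(roms) > n {
        out = append(out, roms[:n])
        roms = roms[n:]
    }
    if len(roms) > 0 {
        out = append(out, roms)
    }
    return out
}

// scrape tries each DB in order; each pass sees only the ROMs that the
// previous DBs failed to resolve, and queries them in batches.
func scrape(dbs []DB, roms []ROM, batchSize int) []ROM {
    unresolved := roms
    for _, db := range dbs {
        var next []ROM
        for _, b := range chunk(unresolved, batchSize) {
            missed, err := db.GetBatch(b)
            if err != nil {
                // On error, carry the whole batch over to the next DB.
                next = append(next, b...)
                continue
            }
            next = append(next, missed...)
        }
        unresolved = next
        if len(unresolved) == 0 {
            break
        }
    }
    return unresolved
}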

Hi there,

I'm currently maintaining TheGamesDB's new site and API and would like to give you a quick update in that regard. The new API (and site) is a complete overhaul that keeps nothing but the database from the old site, so it won't be a simple URL change; the new API now returns JSON with changed field names and data layout.
If you have any questions, feel free to tag me here or on the forum.

Regards
Zer0xFF

Thanks. Once I get an API key I'll start working on it more seriously, but if you have documentation of the response formats I can go ahead and have most of it ready. I'll start looking at refactoring the code to make batching a little easier, since the new API seems to encourage that.
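
In the meantime, here's a hypothetical sketch of what decoding the new JSON might look like; every field name below is a guess until the response formats are documented.

package scraper

import "encoding/json"

// gamesResponse is a guess at the new JSON envelope; the actual field
// names and layout are undocumented at this point.
type gamesResponse struct {
    Code   int    `json:"code"`
    Status string `json:"status"`
    Data   struct {
        Count int `json:"count"`
        Games []struct {
            ID    int    `json:"id"`
            Title string `json:"game_title"`
        } `json:"games"`
    } `json:"data"`
}

func parseGames(body []byte) (*gamesResponse, error) {
    var r gamesResponse
    if err := json.Unmarshal(body, &r); err != nil {
        return nil, err
    }
    return &r, nil
}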

I'm afraid that's not available yet, as there are still a few more things to implement, and they take priority over documentation.

We hope that keys will be reissued by next weekend.

Hi, since the API change it finds very few game images per system: e.g. for NES it finds 200 out of 400 ROMs, and for Game Boy it finds 100 out of 250. Is this going to be fixed?

After updating the scraper, the XML files have the same address, thegamesdb.net, instead of legacy.thegamesdb.net. Is this normal?

@symbios24 the legacy subdomain is the old site with only the domain changed, so the results returned shouldn't be any different.

I changed any references I was able to find, but there is the possibility I missed some. Which endpoint is still returning thegamesdb.net?

So far I tried the Game Boy/NES/Atari 2600 games and they have thegamesdb.net in the XML.

Also, Atari 5200 is still returning thegamesdb.net; I assume all the Atari systems do the same.

If you can change the scraper for PBP/PSX files to download images based on the name of the game and not on the filename's extension, that would be great.

It will require that this project (scraper) request an API key; see this post.

Then you can use the new API, e.g. https://api.thegamesdb.net/#/Games/GamesByGameName
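
A name lookup might look like the following sketch, assuming the request path mirrors the ByGameID form shown earlier; the key is again a placeholder.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
)

func main() {
    // "YOUR_API_KEY" is a placeholder; a real key must be requested from thegamesdb.net.
    q := url.Values{}
    q.Set("name", "Super Mario Bros.")
    q.Set("apikey", "YOUR_API_KEY")

    resp, err := http.Get("https://api.thegamesdb.net/Games/ByGameName?" + q.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status)
    fmt.Println(string(body))
}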