sselph / scraper

A scraper for EmulationStation written in Go using hashing

Getting .hack//link in all GBA games.

DelScipio opened this issue

Same here, not just for GBA games, but for seemingly random games in most systems.

The scraper has become kind of unusable at that point, you have to check each gamelist after scraping...

I assume it's a problem in the ScreenScraper source. ".hack//LINK" is probably the first entry in the whole database and it picks that by default.

Same for some N64 games.

Same here for a lot of PSX games.

Running on latest (as of 2/7/18), I've got like a couple hundred NES and SNES games that all get marked as the PSP game ".hack//LINK" as well.

Like, not only is it matching the wrong system, but the file name isn't even close to that name (for example, "Lethal Weapon.zip" gets scraped like this).

It would be nice if the scraper could at least verify the data it's putting in is for the correct console. :)

And if the scraper can't find a match, to just leave that ROM alone, and don't scrape for it?
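
The two requests above (check the console, and leave the ROM alone when nothing matches) could be sketched roughly like this in Go. This is only an illustration, not the scraper's actual code: the `Result` type, its `Platform` field, and `acceptResult` are hypothetical names, and it assumes the source reports a platform string that can be compared with the system being scraped.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical, simplified view of one hit returned by a source.
// The real scraper's types are not shown in this thread.
type Result struct {
	Name     string
	Platform string // platform reported by the source (assumed to be available)
}

// acceptResult reports whether a hit should be written to gamelist.xml.
// A missing hit, or a hit for a different platform, is rejected so the
// ROM is left alone instead of being filled with unrelated data.
func acceptResult(r *Result, wantPlatform string) bool {
	if r == nil || strings.TrimSpace(r.Platform) == "" {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(r.Platform), strings.TrimSpace(wantPlatform))
}

func main() {
	hit := &Result{Name: ".hack//LINK", Platform: "psp"}
	fmt.Println(acceptResult(hit, "nes")) // false: PSP data for an NES ROM
	fmt.Println(acceptResult(hit, "PSP")) // true: comparison ignores case
	fmt.Println(acceptResult(nil, "nes")) // false: no match, skip the ROM
}
```

With a check like this, "Lethal Weapon.zip" in an NES folder would simply stay unscraped rather than being labelled as a PSP game.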

I'm running version 4.3 of RetroPie, with the scraper built from sources last night (2/7/18) -- with the following options:
Thumbnails Only: Disabled
Arcade Source: ArcadeItalia
Console Source: ScreenScraper
ROM Names: theGamesDB
Gamelist: Overwrite
Use rom folder: Enabled
Download Videos: Enabled
Download Marquees: Disabled
Max: 400x400

Looking at the XML that gets generated -- I wonder if this may be related to the "Use rom folder" option being enabled, check this out:

<game id="65505" source="screenscraper.fr">
    <path>./10 Yard Fight.zip</path>
    <name>.hack-Link</name>
    <desc>The first game in the .hack series for PSP (and the planned final game for the franchise), .hack//LINK logs player into a new version of its virtual landscape called The World R:X (the &amp;quot;R&amp;quot; stands for &amp;quot;Revision&amp;quot;)...[/truncated]</desc>
    <image>./images/10 Yard Fight-image.jpg</image>
    <rating>0.85</rating>
    <releasedate>20100304T000000</releasedate>
    <developer>Bandai Namco</developer>
    <publisher>CyberConnect2</publisher>
    <genre>Role playing games</genre>
</game>

Notice that the path starts with "./" (because it's in the same folder as gamelist.xml) -- I wonder if it's passing that into the search engine by accident? -- I'm going to try with that option disabled when I get home.

EDIT: Of course that didn't work. sigh -- It would have been too easy. >.<

For my part I wonder if it's not rather a bug in the screenscraper API.

I would file a bug there, but I'd rather be sure it's their API and not this scraper only.

Can someone try to scrape the same failing files with Universal XML Scraper and/or Skrape? (I'm on Mac so I can't use them.)

I'm going to try a different source next. -- If that doesn't work, I'll give one of those tools a try.

Changing sources worked, but no video feeds, and it leaves a lot to be desired. -- I'll try one of those tools today.

So I guess it's partially a problem with the ScreenScraper source? -- Not sure on whose end though.
However, the tool itself should at least be validating that the game it's scraping is for the correct system... (Ever try to plug a PSP game into an NES? Doesn't work that well in practice.)

Thanks! I'll file a bug at Screenscraper.

Looks like UXS works and doesn't have this issue (even when using Screenscraper) -- I think it's a problem with this scraper when using that source.

Hi, I can't test right now, but could someone test UXS in "filename" search mode (not "CRC+Filename")? Maybe the API returns something wrong for a bad filename, in which case SSelph's scraper and UXS should both show the same behavior.

But I don't think anything has changed in the API for a long time ;) (the only change could be in the new API V2, and I'm not sure, but I don't think anything has changed there for several months either ;) )

From what I gather on the forum, it looks like the V2 has indeed been released, but it's not very clear.

They're investigating on their side too: https://screenscraper.fr/forumsujet.php?frub=12&fsuj=550

Hi SSelph ;) The V2 isn't "officially" released ;) but some people already use it, as it is already pretty stable and won't change much ;) (there was a minor correction to the JSON version recently, but otherwise no changes for a while ;) )

I ran some tests against the API (V2 and V1) with the game "Lethal Weapon.zip" and the API returns the correct result in both cases...

I also checked the game ".hack//link" on PSP... It seems this game has no SHA1 referenced on ScreenScraper... Maybe when a SHA1 isn't found, the API returns this game (that is, no SHA1 match, so it takes the first game that has no SHA1)?

(And yes @cosmo0 ;) it's me ;) )

> I also checked the game ".hack//link" on PSP... It seems this game has no SHA1 referenced on ScreenScraper... Maybe when a SHA1 isn't found, the API returns this game (that is, no SHA1 match, so it takes the first game that has no SHA1)?

It's possible but it's far from the only game without SHA1 hash, so that would be surprising.

> It's possible but it's far from the only game without SHA1 hash, so that would be surprising.

.hack//link is however probably the first (or last, depending on how the sorting works) game without an SHA1 hash in alphabetical order.
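
To make that hypothesis concrete, here is a toy model in Go of how a lookup could end up returning the first entry without a hash. It is purely illustrative and says nothing about how ScreenScraper is really implemented: `catalogEntry`, `naiveLookup`, `guardedLookup`, and the sample data (including the placeholder hash) are all made up for this sketch.

```go
package main

import "fmt"

// catalogEntry is a made-up stand-in for one database row.
type catalogEntry struct {
	Name string
	SHA1 string // empty when no hash has been recorded for the game
}

// naiveLookup returns the first entry whose SHA1 equals the query.
// It does not guard against an empty query, so an empty query "matches"
// the first entry that has no hash recorded.
func naiveLookup(db []catalogEntry, query string) (catalogEntry, bool) {
	for _, e := range db {
		if e.SHA1 == query {
			return e, true
		}
	}
	return catalogEntry{}, false
}

// guardedLookup refuses to search with an empty query.
func guardedLookup(db []catalogEntry, query string) (catalogEntry, bool) {
	if query == "" {
		return catalogEntry{}, false
	}
	return naiveLookup(db, query)
}

func main() {
	db := []catalogEntry{
		{Name: ".hack//LINK", SHA1: ""},         // no hash recorded
		{Name: "10-Yard Fight", SHA1: "abc123"}, // placeholder hash, not a real one
	}
	e, ok := naiveLookup(db, "")
	fmt.Println(e.Name, ok) // .hack//LINK true
	_, ok = guardedLookup(db, "")
	fmt.Println(ok) // false
}
```

If something like this were happening, every ROM whose hash is missing or unusable would resolve to the same hashless entry, which would fit the symptoms reported in this thread.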

Sorry for taking so long to take a look. I added an update to double check the results from ScreenScraper to make sure they are sane. In testing I also noticed that an empty file will result in a match from SS so I'm filtering that client side so that we don't match it.
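
For readers wondering what such client-side checks might look like, here is a minimal Go sketch. It is not the code that went into the scraper: `hashROM`, `saneMatch`, and `errEmptyROM` are hypothetical names, and comparing the hash echoed back by the source with the hash that was sent is only one assumed way of deciding that a result is sane.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// errEmptyROM is returned for empty input so the caller can skip the lookup.
var errEmptyROM = errors.New("empty ROM data, skipping lookup")

// hashROM returns the lowercase hex SHA1 of data.
// Empty data is rejected up front instead of being sent to the source.
func hashROM(data []byte) (string, error) {
	if len(data) == 0 {
		return "", errEmptyROM
	}
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:]), nil
}

// saneMatch reports whether the hash echoed back by the source equals the
// hash that was sent. That the source echoes a hash at all is an assumption.
func saneMatch(sentSHA1, returnedSHA1 string) bool {
	if sentSHA1 == "" || returnedSHA1 == "" {
		return false
	}
	return strings.EqualFold(sentSHA1, returnedSHA1)
}

func main() {
	if _, err := hashROM(nil); err != nil {
		fmt.Println("skip:", err)
	}
	h, _ := hashROM([]byte("rom bytes"))
	fmt.Println(saneMatch(h, ""))                 // false: nothing echoed back
	fmt.Println(saneMatch(h, strings.ToUpper(h))) // true: same hash, different case
}
```

Rejecting the empty case before any request is made keeps the filter independent of whatever the source returns for it.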

Thanks! It seems to have fixed the issue. I ran a scrape again, and I haven't gotten any ".hack//Link" entries.

This is fixed in 1.4.6 for the name and description, but it still downloads a generic game image that has box art for ".hack//link" as the game art. It seems like someone's practical joke, but this generic image is actually used for every game that has no entry if you are using the "-add_not_found" switch, so that's still not desirable. It would be best if it could just use a generic "No image available" image or something similar for games that aren't found.