manami-project / anime-offline-database

Updated every week: A JSON based anime dataset containing the most important meta data as well as cross references to various anime sites such as MAL, ANIDB, ANILIST, KITSU and more...

How to report duplicates in Manami entries?

jilljenn opened this issue:

Hi,

As the "merge locks" are no longer defined and Manami IDs change over time, what is the best way to report duplicates to you? What do you use internally as a unique identifier, e.g. "Berserk (2016)"?

Suggestions:

  • Sending you sets of titles that actually belong to the same work
  • Sending you sets of sources that point to the same work

I have a dozen of them.

Hello @jilljenn,

I'm not sure what you mean by "Manami IDs". The entries don't have their own IDs. Every (!) URL in the sources property is a unique identifier. So you can retrieve an entry by using the ID of a meta data provider. Example: if you are looking for a specific anime and you know its ID on MAL, you can look for the entry whose sources array contains the MAL URL with that ID. All other URLs in sources are then cross references. This is the intended usage.
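The lookup described above could be sketched roughly as follows. This is only an illustration, not the project's own code: the in-memory list `entries`, the helper names `build_source_index` and `find_by_mal_id`, and the assumption that each entry is a dict with `title` and `sources` keys are all hypothetical. The sample URLs are the ones quoted in an example in this thread.

```python
from typing import Optional

# Hypothetical in-memory entries; only "title" and "sources" are assumed here.
entries = [
    {
        "title": "Yoru wa Mijikashi Aruke yo Otome",
        "sources": ["https://anidb.net/anime/12647"],
    },
    {
        "title": "Yoru wa Mijikashi Arukeyo Otome",
        "sources": [
            "https://anilist.co/anime/97917",
            "https://kitsu.io/anime/13138",
            "https://myanimelist.net/anime/34537",
            "https://notify.moe/anime/-ttMhKmiR",
        ],
    },
]


def build_source_index(entries: list) -> dict:
    """Map every source URL to the entry that contains it."""
    index = {}
    for entry in entries:
        for url in entry["sources"]:
            index[url] = entry
    return index


def find_by_mal_id(index: dict, mal_id: int) -> Optional[dict]:
    """Look up an entry by its MAL ID; returns None if no entry has that URL."""
    return index.get(f"https://myanimelist.net/anime/{mal_id}")


index = build_source_index(entries)
entry = find_by_mal_id(index, 34537)

# Every other URL in the same sources array is a cross reference.
cross_references = [
    url for url in entry["sources"] if not url.startswith("https://myanimelist.net/")
]
print(entry["title"])
print(cross_references)
```

Indexing by URL rather than by list position keeps the lookup independent of how the entries happen to be ordered.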

About duplicates: it depends on how you define a duplicate. If you mean an entry having two sources from the same meta data provider, you should report the duplicate to the respective meta data provider.
If you mean a number of entries which should be merged because they describe the same anime, then from my perspective this is not a duplicate. They are simply not merged yet. The only downside is that cross referencing does not work for them. However, since the intended usage which I described above always starts with a lookup by the ID of a meta data provider, such unmerged entries are not really an issue.
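The first case, an entry with two sources from the same meta data provider, could be detected with something like the sketch below. It assumes (hypothetically) that an entry is a dict with a `sources` list and that the hostname of a URL identifies the provider. The function name and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def providers_with_multiple_sources(entry: dict) -> list:
    """Return the hostnames that occur more than once in an entry's sources."""
    counts = Counter(urlparse(url).hostname for url in entry["sources"])
    return sorted(host for host, count in counts.items() if count > 1)


# Hypothetical entry; hosts and IDs are invented for this example.
sample_entry = {
    "title": "Example",
    "sources": [
        "https://example-provider.test/anime/1",
        "https://example-provider.test/anime/2",
        "https://other-provider.test/anime/9",
    ],
}

print(providers_with_multiple_sources(sample_entry))
```

An empty result means every provider appears at most once in the entry.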

If you have a totally different use-case and you use the entries as-is working with titles or the animeSeason without filtering for a specific meta data provider, then I can understand that you probably run into a problem with duplicates.

First, please check your entries against the past release notes. There I listed the sources which I split manually, along with the reason for each split.

If you post them here I would prefer them as a set of sources. However I would want to check them before simply applying them.
Let me think about this.

Thanks, I understand. Indeed, by "Manami ID" I was referring to the position in the list.

Just to give you an example:

  • The 23701st entry is Yoru wa Mijikashi Aruke yo Otome ['https://anidb.net/anime/12647']
  • The 23702nd entry is Yoru wa Mijikashi Arukeyo Otome ['https://anilist.co/anime/97917',
    'https://kitsu.io/anime/13138',
    'https://myanimelist.net/anime/34537',
    'https://notify.moe/anime/-ttMhKmiR']

They describe the same anime.

In my case, I am using Manami to prevent duplicates in my database. But sometimes I have to search by title, and I can find several entries in Manami for the same title.
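A title search that surfaces such entries, and reports each group as a set of sources, might look like the sketch below. The normalization rule (case-fold and drop everything that is not a letter or digit) is only an assumption chosen for this example; it is not how the project merges entries, and it will miss titles that differ by more than spacing or punctuation. All names and the extra sample entry are hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Case-fold the title and drop everything that is not a letter or digit."""
    return re.sub(r"[\W_]+", "", title.casefold())


def merge_candidates(entries: list) -> list:
    """Group entries by normalized title; return one sorted source list per group."""
    groups = defaultdict(list)
    for entry in entries:
        key = normalize_title(entry["title"])
        if key:  # skip titles that normalize to an empty string
            groups[key].append(entry)
    return [
        sorted(url for entry in group for url in entry["sources"])
        for group in groups.values()
        if len(group) > 1
    ]


entries = [
    {
        "title": "Yoru wa Mijikashi Aruke yo Otome",
        "sources": ["https://anidb.net/anime/12647"],
    },
    {
        "title": "Yoru wa Mijikashi Arukeyo Otome",
        "sources": [
            "https://anilist.co/anime/97917",
            "https://kitsu.io/anime/13138",
            "https://myanimelist.net/anime/34537",
            "https://notify.moe/anime/-ttMhKmiR",
        ],
    },
    # Unrelated entry with a made-up source URL.
    {"title": "Berserk (2016)", "sources": ["https://example.test/anime/1"]},
]

candidates = merge_candidates(entries)
for sources in candidates:
    print(sources)
```

Each printed list is one candidate set of sources that may describe the same anime and would still need a manual check.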

Okay I see. You shouldn't determine anything based on the position in the file. All entries are simply sorted by title, type and then episodes.
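To illustrate why a position is not a stable identifier, here is a small sketch. The sort key only mirrors the ordering named above (title, type, episodes); the exact comparison rules used by the project are not known here, and the field names `type` and `episodes`, their values, and all URLs are assumptions made for this example.

```python
def sort_key(entry: dict) -> tuple:
    # Assumed ordering: title first, then type, then episodes.
    return (entry["title"], entry["type"], entry["episodes"])


def position_of(entries: list, source_url: str):
    """Return the index of the entry containing source_url after sorting."""
    ordered = sorted(entries, key=sort_key)
    for position, entry in enumerate(ordered):
        if source_url in entry["sources"]:
            return position
    return None


# Hypothetical entries with made-up source URLs.
entries = [
    {"title": "Berserk", "type": "TV", "episodes": 25,
     "sources": ["https://example.test/anime/1"]},
    {"title": "Berserk (2016)", "type": "TV", "episodes": 12,
     "sources": ["https://example.test/anime/2"]},
]
target = "https://example.test/anime/2"

before = position_of(entries, target)

# A new entry that sorts earlier shifts every position after it.
new_entry = {"title": "Akira", "type": "MOVIE", "episodes": 1,
             "sources": ["https://example.test/anime/3"]}
after = position_of(entries + [new_entry], target)

print(before, after)
```

The source URL finds the same entry in both runs, while its position changes as soon as another entry sorts in front of it.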

Your example is the second case I described in my first comment. There are probably thousands of entries like that. I wrote an analyzer tool with which I check those and can merge them manually.
Currently I haven't planned any workflow to work with suggestions for merges.
So I need to think about that for a little bit.

What about anime with multiple seasons that are separate on some sources and merged on others?
Note: this can also apply to specials, OVAs, ONAs, or movies, which can be merged with the anime (or anime season) they're from.
Note 2: some double episodes could be counted as separate episodes on one source and as a single episode on another.

Hi @sebbu2,

> What about anime with multiple seasons that are separate on some sources and merged on others?

Regarding the splits, I can say that it varies for everything merged automatically. For the entries I checked manually, I used to make separate entries in case of a season split (see previous release notes).

> Note: this can also apply to specials, OVAs, ONAs, or movies, which can be merged with the anime (or anime season) they're from.

Possible, yes. But it's less likely than one part of a split season being merged into the entries which don't split the season.

Just sticking my nose into this..
I for one would like entries to be merged where the only effective difference is the language (English vs. Romanisation) or the flavor of Romanisation (AniDB Romanisation: https://wiki.anidb.net/AniDB_Definition:Romanisation).

As for merging entries into other entries by combining OVAs, specials, movies, seasons etc.: that would make this project useless to me, so if that were done, I would request that it go into an additional file.

I'd also suggest looking into using TheTVDB (which does this kind of thing already) as well as ScudLee's AnimeList project which maps anidb entries to TheTVDB entries. https://github.com/ScudLee/anime-lists

BTW, if you have something that already does the merging (based on language/Romanisation, but is manual), is that something you can share?

Ah, shame on me.
It looks like there are already merged entries where the difference may just be the Romanisation, though there are also some entries that could still be merged. Is the merging currently done manually, or automatically when there are only simple Romanisation or language differences and the rest of the data matches (i.e. type, season, number of episodes etc.)?

@jilljenn Currently I can't think of a way that wouldn't result in more work on my end.
Maybe I will figure something out later. For now I'm closing this issue.