funmaker / Hybooru

Hydrus-based booru-styled imageboard in React

Home Page:https://booru.funmaker.moe/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unable to initialize DB: duplicate key value violates unique constraint "tag_postids_tagid_idx"

Xunnamius opened this issue · comments

Hi there. First: thank you for your work on this project.

Unfortunately, I've been encountering an error when rebuilding the database this past week. Log output is below. Might be a consequence of excessive tag sibling/parent loops.

I addressed it by adding ON CONFLICT DO NOTHING to applyTagSiblings, which matches all of the other INSERT INTO ... SELECT ... FROM ... statements in tag-related functions in that file.

In a couple days, if this turns out to be a sufficient fix, I can make a PR if you'd like.

Log output
Posts (153553/153553)           [####################] Done in 3.24s
Posts (3299/3299)               [####################] Done in 45ms
Posts (2470/2470)               [####################] Done in 61ms
Posts (628/628)                 [####################] Done in 25ms
Urls (601716/601716)            [####################] Done in 2.02s
Notes                           [####################] Done in 0ms
Tags (39961003/39961003)        [####################] Done in 2:04
Mappings (2031885282/2031885282)[####################] Done in 53:36
Mappings (1205433/1205433)      [####################] Done in 19.84s
Mappings (45176/45176)          [####################] Done in 253ms
Tag parents (29235/29235)       [####################] Done in 156ms
Tag siblings (302637/302637)    [####################] Done in 1.59s
Importing options...            [####################] Done in 12ms
Resolving file relations...     [####################] Done in 103ms
Normalizing tags (6/6)          [####################] Done in 46:12:14
Warning! Detected 71 loops in tag siblings!
Warning! Detected 46 loops in tag parents!
Resolving tag parents...        [####################] Done in 36:40:41
Indexing (24/24)                [####################] Done in 3:25:06
Resolving tag siblings...       [--------------------] Unable to initialize DB!
error: duplicate key value violates unique constraint "tag_postids_tagid_idx"
    at Parser.parseErrorMessage (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:369:69)
    at Parser.handlePacket (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:188:21)
    at Parser.parse (/repos/hydribooru/node_modules/pg-protocol/src/parser.ts:103:30)
    at Socket.<anonymous> (/repos/hydribooru/node_modules/pg-protocol/src/index.ts:7:48)
    at Socket.emit (node:events:514:28)
    at addChunk (node:internal/streams/readable:376:12)
    at readableAddChunk (node:internal/streams/readable:349:9)
    at Socket.Readable.push (node:internal/streams/readable:286:10)
    at Pipe.onStreamRead (node:internal/stream_base_commons:190:23) {
  length: 215,
  severity: 'ERROR',
  code: '23505',
  detail: 'Key (tagid)=(X) already exists.',
  hint: undefined,
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: 'public',
  table: 'tag_postids',
  column: undefined,
  dataType: undefined,
  constraint: 'tag_postids_tagid_idx',
  file: 'nbtinsert.c',
  line: '663',
  routine: '_bt_check_unique'
}

Update: said fix solved the issue and the database now rebuilds successfully. Maybe I try my hand at speeding up the rebuild process next :)

Log output

HyBooru started on port localhost:8080
Environment: production.
Press Ctrl-C to terminate.

Database update detected, rebuilding...

Rebuilding Database!

Posts (153553/153553) [####################] Done in 3.97s
Posts (3299/3299) [####################] Done in 56ms
Posts (2470/2470) [####################] Done in 72ms
Posts (628/628) [####################] Done in 38ms
Urls (601716/601716) [####################] Done in 2.55s
Notes [####################] Done in 0ms
Tags (39961003/39961003) [####################] Done in 2:26
Mappings (2031885282/2031885282)[####################] Done in 1:03:51
Mappings (1205433/1205433) [####################] Done in 27.63s
Mappings (45176/45176) [####################] Done in 311ms
Tag parents (29235/29235) [####################] Done in 166ms
Tag siblings (302637/302637) [####################] Done in 1.53s
Importing options... [####################] Done in 21ms
Resolving file relations... [####################] Done in 206ms
Normalizing tags (6/6) [####################] Done in 52:16:48
Warning! Detected 71 loops in tag siblings!
Warning! Detected 46 loops in tag parents!
Resolving tag parents... [####################] Done in 35:06:26
Indexing (24/24) [####################] Done in 3:08:08
Resolving tag siblings... [####################] Done in 1.24s
Counting tags usage... [####################] Done in 28.26s
Calculating statistics... [####################] Done in 9:18
Finalizing... [####################] Done in 357ms

Database rebuild completed in 91:57:05

Hi, thanks for your investigation.

However I can't accept this fix for now. This shouldn't happen and this fix only works around some other underlying issue. What applyTagSiblings does is it copies post lookup tables from better/preferred siblings to other siblings so they can be used for searching. The other siblings shouldn't have any lookup tables by this point because normalizeTagRelations is supposed to move all mappings to better siblings, leaving worse ones "empty". Maybe there is a possibility that a tag could have two better siblings, which I thought isn't allowed by hydrus but all kinds of shit happens there so it's better to improve hybooru tolerance to hydrus fuckery.

I will have to investigate this issue. It is possible that we might need to drop double-better siblings situations, or arguably, better, join lookup tables together on conflict. There is also possibility that the loop resolution in during tag normalization has some bug in it.

Could you try uploading your database(without images) for sake of testing? It could greatly help me debug this issue.

Ah, gotcha. I'm not at liberty to upload the hydrus db (it's also ~100gb). But if Hybooru has a verbose debug mode or crash dump / log-generating option, I'm more than willing to rebuild the database on another drive and send you the outputs.

Does that 100GB includes images or not?

As of hybooru, there is really nothing to log here, it just runs an SQL query and something goes wrong in the database. You can try to make a minimal reproduction for this bug or upload just the part of your database that is causing that issue, ie without caches, images and stuff. Otherwise, there is little I can do without any database to debug on.

If you need a place to upload to I can give you an account on my server so you can just scp/rsync/ftp into it.

That 100GB (~89GB) database does not include images or thumbnails, it's just that massive. It ballooned to that size after downloading the PTR. I'll see if I can reproduce it just rebuilding off the PTR and, if so, I'll upload the result.

Update: I've been assuming you want the client.mappings.db (~60GB), client.master.db (~20GB), client.caches.db (~6GB), and client.db. If you only want client.db, that's only 2GB. Or if I'm misunderstanding entirely, please let me know :)

I would need at least mappings and client.db, but all would be preferable. Otherwise, I will have to fake the rest of the database or something.

You can try with PTR but AFAIK there are other people using hybooru with large databases synced with PTR without this particular issue.

Should be fixed as of v1.10.0
Let me know if you are still experiencing this issue.