Upgrading does not work

Question

winks opened this issue 2 years ago

Florian Anderiasch commented 2 years ago

I tried to upgrade from a dist version from ~2022-04-26 to 2.0.0-5, and some db migrations seem to be missing or not well documented.

Unfortunately so far the only message I got was:

Exception while running rules: more than one row (DB::Error)
  from ???
  from ???
  from ???
  from ???
  from ???
  from ???
  from ???
  from ???
  from src/env/__libc_start_main.c:94:2 in 'libc_start_main_stage2'

but maybe the db changes are small enough to just manually do the migration, I'll look further into the code.

Todd Sundsted · Answer 1 · Mon Dec 05 2022 21:49:15 GMT+0800 (China Standard Time)

@winks you might be running into #45 (comment)

if so, did the migration and server startup continue after the error (they should have)? after the first error report i changed the code to log the error and continue. the error comes from a migration that rebuilds the timeline and notifications (rather than one that changes the database schema). how many of these errors did you get?

Florian Anderiasch · Answer 2 · Tue Dec 06 2022 01:19:54 GMT+0800 (China Standard Time)

I didn't count, but 20ish for sure.

Meanwhile I rebuilt without --no-debug (crystal build src/ktistec/server.cr --static), and after running that build twice (the third or fourth try overall) the problem seems to have gone away. So maybe it was indeed fixing itself, just slowly.

Sorry for the noise, I'll keep an eye on it - but right now it looks ok.

Todd Sundsted · Answer 3 · Tue Dec 06 2022 01:35:08 GMT+0800 (China Standard Time)

thanks! did you get the same number of errors each time, or did it vary?

Florian Anderiasch · Answer 4 · Tue Dec 06 2022 01:46:23 GMT+0800 (China Standard Time)

That's lost to the terminal history, sorry. I manually scrolled back and saw that it was the same error, then rolled back to the old version - didn't really count or save it, but maybe I can replay from backup and count later.

Florian Anderiasch · Answer 5 · Wed Dec 07 2022 20:43:36 GMT+0800 (China Standard Time)

I reran the upgrade with the old db dump earlier:

$ grep Batch asdf | wc -l
36
$ grep Exception asdf | sort | uniq -c
   2003 Exception while running rules: more than one row (DB::Error)

I was patient and at some point the log had:

ktistecdev_1  | Batch 36 complete
ktistecdev_1  | update-timeline-and-notifications: applied in 246.2214s
ktistecdev_1  | add-indexes-on-actor-iri-and-target-iri-to-activities: applied in 0.9925s
ktistecdev_1  | [development] Ktistec is ready to lead at http://0.0.0.0:3000

and it went just fine. So I guess the main problem is the stack traces: there are so many of them that a user who doesn't read the log in detail assumes the upgrade failed, like I did.
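
For anyone reading a similar upgrade log, a short script can show whether the migration actually finished despite the exceptions. The following is a minimal Python sketch, not part of Ktistec. It assumes the log lines look like the ones quoted above, and the function name summarize_log is made up for illustration.

```python
import re


def summarize_log(lines):
    """Count batch completions and exceptions, and note whether the server came up."""
    batches = 0
    exceptions = 0
    ready = False
    for line in lines:
        if re.search(r"Batch \d+ complete", line):
            batches += 1
        elif "Exception while running rules" in line:
            exceptions += 1
        elif "is ready to lead" in line:
            ready = True
    return {"batches": batches, "exceptions": exceptions, "ready": ready}


# A few lines copied from the log excerpts in this thread.
sample = [
    "Exception while running rules: more than one row (DB::Error)",
    "ktistecdev_1  | Batch 36 complete",
    "ktistecdev_1  | update-timeline-and-notifications: applied in 246.2214s",
    "ktistecdev_1  | [development] Ktistec is ready to lead at http://0.0.0.0:3000",
]
print(summarize_log(sample))
```

To run it on a saved log such as the asdf file above, pass open("asdf") instead of sample. If ready is true, the exceptions were logged but the migration still ran to completion.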

Todd Sundsted · Answer 6 · Fri Dec 16 2022 21:07:43 GMT+0800 (China Standard Time)

@winks see this issue before you upgrade: #55

you might also have duplicate rows in your actors and objects tables. in fact, given this error, i suspect you do. recent changes add a uniqueness constraint in the database to prevent this, but you will need to explicitly delete the duplicates before you can apply it.
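
To check whether duplicates exist before upgrading, a query that groups rows by their identifier is usually enough. The sketch below uses Python's sqlite3 module. It assumes the database is SQLite and that the actors and objects tables each have an iri column that should be unique. Neither the column name nor the helper find_duplicates comes from this thread, so check them against your own schema and follow #55 for the actual cleanup. The code only reports duplicates and deletes nothing.

```python
import sqlite3


def find_duplicates(conn, table, column="iri"):
    """Return (value, count) pairs for values of `column` that occur more than once."""
    # Table and column names cannot be bound as parameters, so pass trusted names only.
    query = (
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1 ORDER BY COUNT(*) DESC'
    )
    return conn.execute(query).fetchall()


# Demo on an in-memory database with a made-up, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (id INTEGER PRIMARY KEY, iri TEXT)")
conn.executemany(
    "INSERT INTO actors (iri) VALUES (?)",
    [
        ("https://a.example/actors/1",),
        ("https://a.example/actors/1",),
        ("https://a.example/actors/2",),
    ],
)
print(find_duplicates(conn, "actors"))
```

Against a real database, work on a copy of the file and call the function once for actors and once for objects. An empty list means no duplicates were found under these assumptions.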

JayVii · Answer 7 · Sun Dec 18 2022 18:54:09 GMT+0800 (China Standard Time)

FYI, I had the same issue with 2.0.0-6 (not with 2.0.0-5 for some reason, though). The info from #55 (comment) cleared the issue and the docker-image started successfully afterwards.

Might be a good idea to have a periodic clean-up process in ktistec? Or is the risk of data-loss too high in that case?

Todd Sundsted · Answer 8 · Tue Dec 20 2022 07:01:53 GMT+0800 (China Standard Time)

that could always be managed with a backup. aside from cases like the one that led to the duplicates, which are going to cause problems (like they did), i'm trying to make the database resilient to garbage, if for no other reason than the fact that my database, which is about two years old now, probably is full of it!