benbjohnson / litestream

Streaming replication for SQLite.

Home Page: https://litestream.io

Crash with invalid memory address or nil pointer dereference when restoring

jonicohn opened this issue

Hi @benbjohnson,

first of all: Thank you for this project. This helps a lot for my databases!

I'm using it in different projects and databases and it works quite well.

But now I have an issue when restoring a database after it had been working for months. The database file has 10 tables and the largest one has 11072 rows with 35 columns. The total size of the database file is ~1 MB. The restore crashes with the following error message:

./litestream restore -config litestream.yml /tmp/litestream/experiment.db

/tmp/litestream/experiment.db(s3): restoring snapshot 4b0d294741b7c9ad/00000000 to /tmp/litestream/experiment.db.tmp
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xed93f9]

goroutine 1 [running]:
github.com/benbjohnson/litestream/s3.(*ReplicaClient).SnapshotReader(0xc00005df00, {0x178bd50, 0xc000120000}, {0xc000515bac, 0x10}, 0x0?)
        /home/runner/work/litestream/litestream/s3/replica_client.go:295 +0x3b9
github.com/benbjohnson/litestream.(*Replica).restoreSnapshot(0xc00037e0f0, {0x178bd50, 0xc000120000}, {0xc000515bac, 0x10}, 0x7fffffff?, {0xc000759200, 0x21})
        /home/runner/work/litestream/litestream/replica.go:1308 +0x18d
github.com/benbjohnson/litestream.(*Replica).Restore(0xc00037e0f0, {0x178bd50, 0xc000120000}, {{0x7ffd1a68bf29, 0x1d}, {0x0, 0x0}, {0xc000515bac, 0x10}, 0x7fffffff, ...})
        /home/runner/work/litestream/litestream/replica.go:1074 +0xae5
main.(*RestoreCommand).Run(0x20a8d18, {0x178bd50, 0xc000120000}, {0xc00012e020, 0x3, 0x3})
        /home/runner/work/litestream/litestream/cmd/litestream/restore.go:90 +0x971
main.(*Main).Run(0xc0000061a0?, {0x178bd50, 0xc000120000}, {0xc00012e010, 0x4, 0x4})
        /home/runner/work/litestream/litestream/cmd/litestream/main.go:123 +0x165
main.main()
        /home/runner/work/litestream/litestream/cmd/litestream/main.go:43 +0x7c

If I comment out lines 294 and 295 (the two metrics calls before the return) and recompile the code:

	out, err := c.s3.GetObjectWithContext(ctx, &s3.GetObjectInput{
		Bucket: aws.String(c.Bucket),
		Key:    aws.String(key),
	})
	if isNotExists(err) {
		return nil, os.ErrNotExist
	} else if err != nil {
		return nil, err
	}
	internal.OperationTotalCounterVec.WithLabelValues(ReplicaClientType, "GET").Inc()
	internal.OperationBytesCounterVec.WithLabelValues(ReplicaClientType, "GET").Add(float64(*out.ContentLength))
	return out.Body, nil
}

the restore works again. It also works if I instead remove 52 rows from my table or remove one column.

If I download the snapshot file manually and decompress it, that works too.

I don't really know Go and I'm not sure what these two lines are needed for. If I understand correctly, they are used for some metrics, but I'm not sure whether it is safe to remove them. In any case, it would be better to fix the root cause.

Does anybody know why this happens?

Thank you!

Hi!

Does this only happen with a specific release or a specific database, and is it 100% reproducible?

Did you look up those lines on GitHub for the same release you're running? They have changed a little since the last release.

Thanks!

Ah, I see what the issue is now that I'm at my laptop. The code does expect that a successful request to the bucket returns the content length of the object (the snapshot in this case), which is then recorded in metrics. For some reason the AWS SDK is not returning the object's length here, so out.ContentLength is nil and dereferencing it panics.

Is that a standard AWS S3 bucket you are using, or some third-party provider? And which provider, if you can say?
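
For anyone following along, the failure mode can be sketched in isolation. This is only an illustration, not the change from #557: getObjectOutput and contentLength are hypothetical names, and the struct merely mimics the SDK response type, which exposes the length as a pointer that is nil when the server does not report it.

```go
package main

import "fmt"

// getObjectOutput is a hypothetical stand-in for the SDK's GetObject
// response. Like the SDK type, it holds the length as a pointer, which
// is nil when the server did not report a content length.
type getObjectOutput struct {
	ContentLength *int64
}

// contentLength (hypothetical helper) returns the reported object size,
// or 0 when no size is available, instead of dereferencing a nil pointer.
func contentLength(out *getObjectOutput) int64 {
	if out == nil || out.ContentLength == nil {
		return 0
	}
	return *out.ContentLength
}

func main() {
	n := int64(1048576)
	withLength := &getObjectOutput{ContentLength: &n}
	withoutLength := &getObjectOutput{} // e.g. an S3-compatible store that omits the length

	fmt.Println(contentLength(withLength))    // 1048576
	fmt.Println(contentLength(withoutLength)) // 0
}
```

With a guard like this the metrics counter would simply record 0 bytes for such responses rather than crashing the restore. I believe aws-sdk-go v1 also ships a helper, aws.Int64Value, that returns 0 for a nil *int64 in the same way.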

Thanks.

Opened #557, which fixes this and a related issue that we hit earlier.

Sorry for the late follow-up:

In this case it is an S3-like API from Palantir Foundry.

Ah, yeah, so their implementation doesn't seem to send that information. Regardless, a missing length should not cause a crash, so the merged fix should help.

If you're able to build Litestream from source, try the latest code from main; it should work.

Already built it today and it works. Thank you!