klauspost / reedsolomon

Reed-Solomon Erasure Coding in Go

simple-encode/decoder not working?

spikebike opened this issue · comments

I thought we discussed this before, but I looked through closed issues and didn't find it.

I ran the following against the current git version:

$ dd if=/dev/urandom of=2GB count=32768 bs=65536
32768+0 records in
32768+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 49.967 s, 43.0 MB/s
$ go run ../simple-encoder.go -data 8 -par 3 ./2GB
Opening ./2GB
File split into 11 data+parity shards with 268435456 bytes/shard.
Writing to 2GB.0
Writing to 2GB.1
Writing to 2GB.2
Writing to 2GB.3
Writing to 2GB.4
Writing to 2GB.5
Writing to 2GB.6
Writing to 2GB.7
Writing to 2GB.8
Writing to 2GB.9
Writing to 2GB.10
$ dd if=/dev/urandom of=2GB.2 count=256 conv=notrunc
256+0 records in
256+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.0150979 s, 8.7 MB/s
$ dd if=/dev/urandom of=2GB.4 count=256 conv=notrunc
256+0 records in
256+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.00384383 s, 34.1 MB/s
$ du 2GB*
2097156	2GB
262148	2GB.0
262148	2GB.1
262148	2GB.10
262148	2GB.2
262148	2GB.3
262148	2GB.4
262148	2GB.5
262148	2GB.6
262148	2GB.7
262148	2GB.8
$ go run ../simple-decoder.go -data 8 -par 3 ./2GB
Opening ./2GB.0
Opening ./2GB.1
Opening ./2GB.2
Opening ./2GB.3
Opening ./2GB.4
Opening ./2GB.5
Opening ./2GB.6
Opening ./2GB.7
Opening ./2GB.8
Opening ./2GB.9
Opening ./2GB.10
Verification failed. Reconstructing data
Verification failed after reconstruction, data likely corrupted.
exit status 1

I believe we previously discussed some wrapper format to make sure the pieces were ordered, to detect wrong lengths, and so on. But in this case there's plenty of redundancy left, the files are in the right order, and they are all the original size.

I probably did something stupid, but I was surprised it didn't work.

Ah, it looks like it works well with missing files, but not with corrupted files. So a file format based on this Reed-Solomon code should add a sha256sum (or similar) to each shard, and the decoder should discard any shard with an invalid checksum.
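A rough sketch of that idea, using only the standard library (the function names here are hypothetical, not part of this package): store a sha256 per shard at encode time, and at decode time nil out any shard whose checksum no longer matches, so the reconstructor treats it as missing rather than trusting it:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// checksums computes the sha256 of each shard; these would be stored
// alongside (or prepended to) each piece at encode time.
func checksums(shards [][]byte) [][sha256.Size]byte {
	sums := make([][sha256.Size]byte, len(shards))
	for i, s := range shards {
		sums[i] = sha256.Sum256(s)
	}
	return sums
}

// dropCorrupt nils out any shard whose checksum no longer matches its
// stored sum, returning how many were dropped. The nil shards can then
// be handed to the reconstructor as "missing".
func dropCorrupt(shards [][]byte, sums [][sha256.Size]byte) int {
	dropped := 0
	for i, s := range shards {
		if s == nil {
			continue // already missing
		}
		sum := sha256.Sum256(s)
		if !bytes.Equal(sum[:], sums[i][:]) {
			shards[i] = nil
			dropped++
		}
	}
	return dropped
}

func main() {
	shards := [][]byte{[]byte("data0"), []byte("data1"), []byte("parity")}
	sums := checksums(shards)
	shards[1][0] = 'X' // simulate on-disk corruption of one piece
	fmt.Println(dropCorrupt(shards, sums)) // 1; shards[1] is now nil
}
```

This turns a silent corruption into an erasure, which is exactly what the reconstruction step knows how to handle.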

Think it's worth a pull request to add a sha256sum to each piece and then reject corrupt pieces?

@spikebike Feel free to send in a PR