klauspost / reedsolomon

Reed-Solomon Erasure Coding in Go

Can the number of parity shards be greater than the number of data shards?

urwork opened this issue

The README says:

A good starting point is above 5 and below 257 data shards (the maximum supported number), and the number of parity shards to be 2 or above, and below the number of data shards.

Why should the number of parity shards be below the number of data shards?
Can the number of parity shards be greater than the number of data shards?

I see that Sia (sia.tech) can split data into 30 segments and recover from any 10 of the 30. Can reedsolomon do this?

https://sia.tech/technology

The Sia software divides files into 30 segments before uploading, each targeted for distribution to hosts across the world. This distribution assures that no one host represents a single point of failure and reinforces overall network uptime and redundancy.

File segments are created using a technology called Reed-Solomon erasure coding, commonly used in CDs and DVDs. Erasure coding allows Sia to divide files in a redundant manner, where any 10 of 30 segments can fully recover a user's files.

This means that if 20 out of 30 hosts go offline, a Sia user is still able to download her files.

Why should the number of parity shards be below the number of data shards?

The recommendation was likely made with performance in mind: encoding work is roughly proportional to the number of data shards multiplied by the number of parity shards, so a large parity count gets expensive. Also, most people who use it for error recovery want a storage overhead of less than 100%.
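
As a rough illustration of that trade-off, the sketch below assumes plain matrix-based encoding, where every parity byte combines one byte from each data shard. Both function names are hypothetical, and nothing here measures the library's real speed; it only does the arithmetic for two shard layouts.

```go
package main

import "fmt"

// overheadPercent is the extra storage that parity adds, relative to the data.
func overheadPercent(dataShards, parityShards int) int {
	return 100 * parityShards / dataShards
}

// mulsPerColumn counts the field multiplications needed to encode one byte
// column (one byte from every data shard), assuming plain matrix encoding.
func mulsPerColumn(dataShards, parityShards int) int {
	return dataShards * parityShards
}

func main() {
	for _, c := range [][2]int{{10, 4}, {10, 20}} {
		fmt.Printf("%d+%d: overhead %d%%, %d multiplications per byte column\n",
			c[0], c[1], overheadPercent(c[0], c[1]), mulsPerColumn(c[0], c[1]))
	}
}
```

Under these assumptions a 10+20 layout costs five times the encoding work of 10+4 and stores 200% extra data instead of 40%.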

Can the number of parity shards be greater than the number of data shards?

The only restriction is that the total number of data and parity shards must not exceed 256, because reedsolomon works over the Galois field GF(2^8), which has only 256 elements.
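
The limit can be expressed as a small check. This is a minimal sketch: `checkShardCounts` and `maxShards` are hypothetical names rather than part of reedsolomon, and the value 256 is taken from the statement above, not copied from the library source.

```go
package main

import (
	"errors"
	"fmt"
)

// maxShards is the number of elements in GF(2^8). It is an assumption based
// on the limit described in this thread.
const maxShards = 1 << 8

// checkShardCounts is a hypothetical helper, not a reedsolomon function.
// It places no constraint on parity relative to data, only on the total.
func checkShardCounts(dataShards, parityShards int) error {
	if dataShards < 1 || parityShards < 0 {
		return errors.New("need at least one data shard and a non-negative parity count")
	}
	if total := dataShards + parityShards; total > maxShards {
		return fmt.Errorf("%d shards in total exceeds the limit of %d", total, maxShards)
	}
	return nil
}

func main() {
	fmt.Println(checkShardCounts(10, 20))   // more parity than data is accepted
	fmt.Println(checkShardCounts(200, 100)) // 300 shards in total is rejected
}
```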

I see that Sia (sia.tech) can split data into 30 segments and recover from any 10 of the 30. Can reedsolomon do this?

You can emulate the behaviour of the Sia software by picking a data shard count of 10 and a parity shard count of 20, and distributing each shard to a unique host. A parity shard count of 20 means you can lose up to 20 of the 30 shards, which is the same as being able to recover from any 10 of them.

Note that Sia could be using a more complex scheme to achieve the same behaviour, but the above arrangement is the most straightforward way of doing so.
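
To see why any 10 of the 30 shards are enough, here is a self-contained toy sketch of the 10 data + 20 parity arrangement described above. It is not the reedsolomon implementation, and it is far slower. It treats each byte column as a polynomial of degree below 10 over GF(2^8) and uses Lagrange interpolation, which is one textbook way to build a Reed-Solomon code. All names (`interpolate`, `shardAt`, `encode`, `roundTrip`) and the sample data are made up for the illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// Log/antilog tables for GF(2^8), built with the reducing polynomial 0x11d.
var expT [510]byte
var logT [256]byte

func init() {
	x := 1
	for i := 0; i < 255; i++ {
		expT[i], expT[i+255] = byte(x), byte(x)
		logT[x] = byte(i)
		x <<= 1
		if x&0x100 != 0 {
			x ^= 0x11d
		}
	}
}

func mul(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return expT[int(logT[a])+int(logT[b])]
}

func inv(a byte) byte { return expT[255-int(logT[a])] } // a must be nonzero

// interpolate returns P(at) for the polynomial P of degree < len(xs) through
// the points (xs[i], ys[i]). Addition and subtraction in GF(2^8) are both XOR.
func interpolate(xs, ys []byte, at byte) byte {
	var sum byte
	for i := range xs {
		num, den := byte(1), byte(1)
		for j := range xs {
			if j != i {
				num, den = mul(num, at^xs[j]), mul(den, xs[i]^xs[j])
			}
		}
		sum ^= mul(ys[i], mul(num, inv(den)))
	}
	return sum
}

// shardAt computes the shard at point `at` from the shards at points xs.
func shardAt(shards [][]byte, xs []byte, at byte) []byte {
	out, ys := make([]byte, len(shards[xs[0]])), make([]byte, len(xs))
	for col := range out {
		for i, x := range xs {
			ys[i] = shards[x][col]
		}
		out[col] = interpolate(xs, ys, at)
	}
	return out
}

// encode appends parity shards, so that shard i holds P(i) for each byte column.
func encode(data [][]byte, parity int) [][]byte {
	xs := make([]byte, len(data))
	for i := range xs {
		xs[i] = byte(i)
	}
	shards := append([][]byte{}, data...)
	for p := 0; p < parity; p++ {
		shards = append(shards, shardAt(data, xs, byte(len(data)+p)))
	}
	return shards
}

// roundTrip encodes 10+20 shards, keeps only the 10 listed in have, and
// reports whether the 10 data shards are rebuilt exactly.
func roundTrip(have []byte) bool {
	data := make([][]byte, 10)
	for i := range data {
		data[i] = []byte{byte(i), byte(3 * i), byte(7 * i), byte(255 - i)}
	}
	shards := encode(data, 20)
	kept := make([][]byte, len(shards)) // entries not in have stay nil (lost)
	for _, i := range have {
		kept[i] = shards[i]
	}
	for d := range data {
		if !bytes.Equal(shardAt(kept, have, byte(d)), data[d]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(roundTrip([]byte{1, 4, 9, 12, 15, 18, 21, 24, 27, 29}))
	fmt.Println(roundTrip([]byte{20, 21, 22, 23, 24, 25, 26, 27, 28, 29})) // parity only
}
```

Both calls should report a successful recovery, including the second one, where every surviving shard is a parity shard. With klauspost/reedsolomon itself, the equivalent steps would go through `reedsolomon.New(10, 20)` followed by the encoder's `Encode` and `Reconstruct` methods.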