Strict DFS traversal
guanzo opened this issue · comments
The Trustless Gateway spec is being productionized by Saturn, Daghaus and possibly others. It's important for DAG traversal clients to be able to consume CARs with blocks in DFS order as this is optimal for streaming and incremental verification. While DFS isn't required by the spec, it seems to be the preferred traversal order as it's the only explicitly mentioned order, the other being "unknown".
This is the section that performs BFS, there may be more though.
I believe that section is doing DFS. See the bottom most part of it. Just collecting links but then serially walking those links and resolving what's below them.
https://github.com/web3-storage/freeway I believe is making use of this same code to produce DFS order for us. I think originally they had some BFS code internally that they had to unwind but from what I understand the unixfs internals aren't the problem area.
OK I really hope I'm not misunderstanding something.
Just collecting links
The issue is, blockstore.get
is called for each link, before resolving each links children.
Incremental verification fails on this CID that's fetched from lassie: bafybeigrbpmdsqaift2qwzy32bjyywkx6nzmn66pjeoaie6egpbktykc6e
If I log every blockstore.get
call, js-ipfs-unixfs will print
get CID(bafybeigrbpmdsqaift2qwzy32bjyywkx6nzmn66pjeoaie6egpbktykc6e)
get CID(bafybeiea2qi6j3jgqo5fu3lsski2iluovy6t7q5izs4vorjkfnkjfio5sa)
get CID(bafybeifsqgsqwjceoa6k5aszqqnwolcq65exbbpmem5wieynrid62tg3lm)
get CID(bafkreigops5bbgf35ah22lurd45kp7yzlpditmsfro3syxwwgovjr54fqi)
get CID(bafkreiez3s2vsggbjnzue3mczfrcmywpoy3xwnvwfesew34m4tzqzo2fs4)
...
If I use dagula, it prints the correct DFS order
get CID(bafybeigrbpmdsqaift2qwzy32bjyywkx6nzmn66pjeoaie6egpbktykc6e)
get CID(bafybeiea2qi6j3jgqo5fu3lsski2iluovy6t7q5izs4vorjkfnkjfio5sa)
get CID(bafkreigops5bbgf35ah22lurd45kp7yzlpditmsfro3syxwwgovjr54fqi)
get CID(bafkreiez3s2vsggbjnzue3mczfrcmywpoy3xwnvwfesew34m4tzqzo2fs4)
get CID(bafkreig2mplsficluccozctgebykmjcrbvb6ida5zccz2ww3zoionoeniy)
...
The DAG looks like
$ ipfs dag get bafybeigrbpmdsqaift2qwzy32bjyywkx6nzmn66pjeoaie6egpbktykc6e | jq
{
"Data": {
"/": {
"bytes": "CAIYmMXIGiCAgOAVIJjF6AQ"
}
},
"Links": [
{
"Hash": {
"/": "bafybeiea2qi6j3jgqo5fu3lsski2iluovy6t7q5izs4vorjkfnkjfio5sa"
},
"Name": "",
"Tsize": 45621766
},
{
"Hash": {
"/": "bafybeifsqgsqwjceoa6k5aszqqnwolcq65exbbpmem5wieynrid62tg3lm"
},
"Name": "",
"Tsize": 10103360
}
]
}
You can see that js-ipfs-unixfs traverses ...3lm
before ...5sa
's children.
Does calling blockstore.get count as a traversal?
EDIT: Actually it's not totally BFS, it's just the first layer of children being visited before any of the sub children
mmm, you're probably right about that! not sure about the async iteration going on there but probably doing all those blockstore fetches in parallel first; maybe worth finding out how freeway is using this code to do it? or perhaps it's just using js-unixfs for this side of it instead
Or ... perhaps we haven't noticed traversal order problems with freeway because large file fetching is more rare, maybe I need to go back and look at the error logs. You have to have a pretty big file to end up with multiple layers.
Yep I've been using this lib for a while without issue. It was only until I implemented incremental verification and tried rendering a 50MB image that I ran into this problem.
I made a hasty fix here due to time constraints: filecoin-saturn@ee5a574
The traversal is DFS, that is, the leaf node data is emitted depth-first, but internally the exporter applies an optimisation to load all the children in the DAG as soon as they are encountered.
The links are processed in-order and as soon as they are ready rather than waiting for the last to load before the first is processed.
This speeds up the case for when the blockstore has to go to the network or it uses some other slow retrieval method, as it has a headstart for when you do actually need a sibling block, but is why it's called in an order that looks like it's doing BFS.
A quick fix might be to expose a config option for the it-parallel
concurrency
parameter and pass it in here - if you set it to 1
then the blocks should be loaded from the blockstore in an order consistent with DFS.
@rvagg freeway does not use the unixfs exporter to create CARs with DFS block ordering.
Ok that makes sense.
For context, Saturn retrieval clients expect CAR file blocks to be in DFS order, and it's implemented by having the traversal client (in this case js-ipfs-unixfs) ask the blockstore for blocks in the expected order. There isn't really an easy workaround in userland since the blockstore lacks any traversal context, so a workaround would be appreciated 🙏 .