filecoin-project / lassie

A minimal universal retrieval client library for IPFS and Filecoin


Duplicates output not producing enough duplicate blocks

rvagg opened this issue

Noticed while working on #444. I'm not sure whether the same effect applies to the daemon, which IIRC emits duplicates by default.

$ fixtureplate generate 'dir(5*file:1MiB{zero})'
A directory containing:
  → 5 files of 1.0 MiB containing just zeros
Wrote to bafybeidyazlr557ariavujsypg3h4nv3dqtgc3dfvfhbdfd567vmihur3e.car
$ fixtureplate explain bafybeidyazlr557ariavujsypg3h4nv3dqtgc3dfvfhbdfd567vmihur3e.car | wc
     32     252    3487

(The explain output includes a header line, so the expected block count is 31.)

$ go run ./cmd/lassie/ fetch --dups bafybeidyazlr557ariavujsypg3h4nv3dqtgc3dfvfhbdfd567vmihur3e 
$ car ls bafybeidyazlr557ariavujsypg3h4nv3dqtgc3dfvfhbdfd567vmihur3e.car | wc
      9       9     540

There are duplicates in there, just not enough. In this case several files share the same duplicate middle blocks, but only the first file gets its duplicates.

$ fixtureplate generate 'dir(file:100MiB{zero})'
A directory containing:
  → A file of 100 MiB containing just zeros
Wrote to bafybeicvhzbrqzskficgeombk3j6usi6c5zazrxekgnplg2lfmxdapbgiu.car

This file should yield 415 blocks, but we only get a fraction of them. Here the file is split into multiple tiers of intermediate nodes, and I suspect we only get the duplicates for the first tier?

I believe our duplicate-adder algorithm only adds duplicate blocks where they appear in sequence. In both of these cases (and in what we'll commonly see in the wild), the duplicates are scattered across the DAG. They only appear in sequence in our tests because we build them from files containing zeros, and those files aren't big enough to produce sub-shards. At least, I think so.
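To make the suspected failure mode concrete, here is a small Go sketch. It is not lassie's code: the `DAG` type and the `walkAll` and `walkSequentialDups` functions are hypothetical, CIDs are plain strings, and the "sequential-only" walk is a model of the suspicion above rather than the actual implementation. It contrasts a full depth-first walk that emits a block every time a link reaches it with a walk that prunes already-seen subtrees and only re-emits a block when it directly follows an identical one.

```go
package main

import "fmt"

// DAG maps a (hypothetical, simplified) CID string to the CIDs it links to,
// in link order. Leaves have no entry.
type DAG map[string][]string

// walkAll is a depth-first, pre-order walk that emits a block every time it
// is reached, repeats included. This is the order a duplicates-enabled
// output is assumed to follow.
func walkAll(d DAG, root string) []string {
	out := []string{root}
	for _, child := range d[root] {
		out = append(out, walkAll(d, child)...)
	}
	return out
}

// walkSequentialDups is a hypothetical model of the suspected bug: it never
// descends into a subtree it has already seen, and it only re-emits an
// already-seen block when that block immediately follows an identical one.
func walkSequentialDups(d DAG, root string) []string {
	seen := map[string]bool{}
	var out []string
	var visit func(c string)
	visit = func(c string) {
		if seen[c] {
			// Only a run of identical blocks gets its duplicate written;
			// a repeat anywhere else is silently dropped.
			if len(out) > 0 && out[len(out)-1] == c {
				out = append(out, c)
			}
			return
		}
		seen[c] = true
		out = append(out, c)
		for _, child := range d[c] {
			visit(child)
		}
	}
	visit(root)
	return out
}

func main() {
	// A directory holding three identical files, each made of two identical
	// zero-filled chunks, so every file shares the same CID.
	d := DAG{
		"dir":  {"file", "file", "file"},
		"file": {"zero", "zero"},
	}
	fmt.Println(walkAll(d, "dir"))            // [dir file zero zero file zero zero file zero zero]
	fmt.Println(walkSequentialDups(d, "dir")) // [dir file zero zero]
}
```

In this model the sequential-only walk reproduces the "only the first file gets its duplicates" symptom: the zero chunks are adjacent inside the first file, but the repeated `file` links are separated by other blocks, so they and everything beneath them are dropped. Matching the expected count would presumably require emitting blocks in full traversal order, as `walkAll` does.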

Addressed in #444; had to wire up a few more bits to get it working properly.