About the inversionTree root
DurantVivado opened this issue
Dear developers,
I think it would be better to make the root a pointer. That is:
from (inversion_tree.go:18)
root inversionNode
to
root *inversionNode
from (inversion_tree.go:32)
root: inversionNode{
	matrix:   identity,
	children: make([]*inversionNode, dataShards+parityShards),
},
to
root: &inversionNode{
	matrix:   identity,
	children: make([]*inversionNode, dataShards+parityShards),
},
That improves performance considerably in my tests.
Thanks in advance for any reply. I sincerely appreciate it.
@DurantVivado Could you share your tests?
I don't understand how this could give any significant change.
Thanks a lot for replying! 😂 I truly appreciate and respect it.
In my own testbed:
Test File: erasure_encode_read_test.go
Test Benchmark: BenchmarkEncodeDecode20x4x24x4096x20Mx2fault
- Before Change:
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkEncodeDecode20x4x24x4096x20Mx2fault$ github.com/DurantVivado/Grasure -v -cpuprofile profiles/cpu.profile -memprofile profiles/mem.profile -blockprofile profiles/blk.profile -benchtime 10x
goos: linux
goarch: amd64
pkg: github.com/DurantVivado/Grasure
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault-2 10 585994617 ns/op 35.79 MB/s 52339386 B/op 61678 allocs/op
PASS
ok github.com/DurantVivado/Grasure 11.376s
- After Change:
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkEncodeDecode20x4x24x4096x20Mx2fault$ github.com/DurantVivado/Grasure -v -cpuprofile profiles/cpu.profile -memprofile profiles/mem.profile -blockprofile profiles/blk.profile -benchtime 10x
goos: linux
goarch: amd64
pkg: github.com/DurantVivado/Grasure
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault-2 10 281809520 ns/op 74.42 MB/s 52341989 B/op 61643 allocs/op
PASS
ok github.com/DurantVivado/Grasure 3.363s
The improvement is quite pronounced.
@DurantVivado Are you sure this isn't just randomness in the tests? You are reading and writing files, which brings a lot of randomness into the benchmark.
I tried running the benchmark, but it fails. Maybe because I use Windows.
BenchmarkReconstructData50x20x1M should test the inversion tree cache, and I see no measurable change, even when changing shards to 1K. In fact, disabling the inversion cache has very little effect on performance.
So I think your benchmark is showing something else.
inversionTree is only used as a pointer, and whenever root is accessed it is via pointer methods. So I don't see how it would make any difference whether it is a value or a pointer.
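To illustrate the point, here is a minimal sketch. It is not the library's actual code: `node`, `treeValueRoot`, `treePtrRoot`, `set`, and `has` are hypothetical names chosen for the example. It assumes only what is stated above, namely that the tree is always held by pointer and its methods use pointer receivers. Under that assumption, `t.root` is modified in place in both layouts and the root node is never copied.

```go
package main

import "fmt"

// node is a hypothetical stand-in for inversionNode.
type node struct {
	children []*node
}

// treeValueRoot stores the root node by value (the current layout).
type treeValueRoot struct {
	root node
}

// treePtrRoot stores the root node by pointer (the proposed layout).
type treePtrRoot struct {
	root *node
}

// Both constructors return a pointer to the tree, mirroring the statement
// that inversionTree is only ever used as a pointer.
func newValueRoot(n int) *treeValueRoot {
	return &treeValueRoot{root: node{children: make([]*node, n)}}
}

func newPtrRoot(n int) *treePtrRoot {
	return &treePtrRoot{root: &node{children: make([]*node, n)}}
}

// The methods use pointer receivers, so t.root is accessed in place
// and no copy of the root node is made in either layout.
func (t *treeValueRoot) set(i int)      { t.root.children[i] = &node{} }
func (t *treeValueRoot) has(i int) bool { return t.root.children[i] != nil }

func (t *treePtrRoot) set(i int)      { t.root.children[i] = &node{} }
func (t *treePtrRoot) has(i int) bool { return t.root.children[i] != nil }

func main() {
	v, p := newValueRoot(4), newPtrRoot(4)
	v.set(2)
	p.set(2)
	// Both layouts observe the update made through the pointer receiver.
	fmt.Println(v.has(2), p.has(2))
}
```

The only structural difference is that the pointer layout adds one extra allocation and one extra indirection to reach the root.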
I guess I am missing something. The files I generate are different.
And for the BenchmarkReconstructData10x2x10000 test, it does not actually seem to improve much (about 10%).
- Before
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReconstructData10x2x10000$ github.com/DurantVivado/reedsolomon -v -benchtime=10x
goos: linux
goarch: amd64
pkg: github.com/DurantVivado/reedsolomon
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkReconstructData10x2x10000
BenchmarkReconstructData10x2x10000-2 10 8348 ns/op 14373.84 MB/s 1882 B/op 43 allocs/op
PASS
ok github.com/DurantVivado/reedsolomon 0.012s
- After
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReconstructData10x2x10000$ github.com/DurantVivado/reedsolomon -v -benchtime=10x
goos: linux
goarch: amd64
pkg: github.com/DurantVivado/reedsolomon
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkReconstructData10x2x10000
BenchmarkReconstructData10x2x10000-2 10 7590 ns/op 15810.90 MB/s 1882 B/op 43 allocs/op
PASS
ok github.com/DurantVivado/reedsolomon 0.015s
I guess I am a rookie in Go and need more practice. 😂
Anyway, thanks a lot for your assistance!
With -benchtime=10x you are basically getting random numbers. Running with the default, 1s:
Before, 5 independent runs:
BenchmarkReconstructData10x2x10000-32 373849 3058 ns/op 39240.21 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 354892 3067 ns/op 39131.69 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 382095 3064 ns/op 39159.23 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 383332 3054 ns/op 39294.97 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 376168 3059 ns/op 39223.83 MB/s 456 B/op 6 allocs/op
With pointers, 5 runs:
BenchmarkReconstructData10x2x10000-32 375007 3058 ns/op 39239.20 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 376861 3049 ns/op 39354.31 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 374106 3058 ns/op 39240.03 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 375428 3068 ns/op 39112.56 MB/s 456 B/op 6 allocs/op
BenchmarkReconstructData10x2x10000-32 378624 3062 ns/op 39193.64 MB/s 456 B/op 6 allocs/op
The differences seem to be within run-to-run variance to me.
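As a rough check of that reading, the sketch below compares the mean ns/op of the two sets of five runs with the spread inside each set. The numbers are copied from the output above. `meanAndRange` is a hypothetical helper written for this example, and the max-minus-min range is only a crude stand-in for a proper variance estimate.

```go
package main

import "fmt"

// meanAndRange returns the arithmetic mean of xs and the spread (max - min).
// It assumes xs is non-empty.
func meanAndRange(xs []float64) (mean, spread float64) {
	lo, hi := xs[0], xs[0]
	sum := 0.0
	for _, x := range xs {
		sum += x
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return sum / float64(len(xs)), hi - lo
}

func main() {
	// ns/op values from the five runs of each variant listed above.
	valueRoot := []float64{3058, 3067, 3064, 3054, 3059}
	pointerRoot := []float64{3058, 3049, 3058, 3068, 3062}

	mv, rv := meanAndRange(valueRoot)
	mp, rp := meanAndRange(pointerRoot)

	fmt.Printf("value root:   mean %.1f ns/op, range %.0f ns\n", mv, rv)
	fmt.Printf("pointer root: mean %.1f ns/op, range %.0f ns\n", mp, rp)
	fmt.Printf("difference of means: %.1f ns/op\n", mv-mp)
}
```

The means come out at 3060.4 and 3059.0 ns/op, a gap of about 1.4 ns, while the runs within each set vary by 13 and 19 ns. For a more rigorous comparison, output collected with `go test -bench ... -count=N` can be fed to the benchstat tool from golang.org/x/perf.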