About the inversionTree root

DurantVivado opened this issue 2 years ago

Dear developers,
I think it would be better to make the root a pointer. That is:
from (inversion_tree.go:18)
root inversionNode
to
root *inversionNode

from (inversion_tree.go:32)
root: inversionNode{ matrix: identity, children: make([]*inversionNode, dataShards+parityShards), },
to
root: &inversionNode{ matrix: identity, children: make([]*inversionNode, dataShards+parityShards), },

That improves performance quite a lot in my tests.
Thanks in advance for any reply. I sincerely appreciate it.

Klaus Post · Answer 1 · Fri Jan 07 2022 16:59:10 GMT+0800 (China Standard Time)

@DurantVivado Could you share your tests?

I don't understand how this could give any significant change.

DurantThorvalds · Answer 2 · Fri Jan 07 2022 17:11:17 GMT+0800 (China Standard Time)

Thanks a lot for replying! 😂 I truly appreciate it.

On my own testbed:
Test File: erasure_encode_read_test.go
Test Benchmark: BenchmarkEncodeDecode20x4x24x4096x20Mx2fault

Before Change:

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkEncodeDecode20x4x24x4096x20Mx2fault$ github.com/DurantVivado/Grasure -v -cpuprofile profiles/cpu.profile -memprofile profiles/mem.profile -blockprofile profiles/blk.profile -benchtime 10x

goos: linux
goarch: amd64
pkg: github.com/DurantVivado/Grasure
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault-2   	      10	 585994617 ns/op	  35.79 MB/s	52339386 B/op	   61678 allocs/op
PASS
ok  	github.com/DurantVivado/Grasure	11.376s

After Change:

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkEncodeDecode20x4x24x4096x20Mx2fault$ github.com/DurantVivado/Grasure -v -cpuprofile profiles/cpu.profile -memprofile profiles/mem.profile -blockprofile profiles/blk.profile -benchtime 10x

goos: linux
goarch: amd64
pkg: github.com/DurantVivado/Grasure
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault
BenchmarkEncodeDecode20x4x24x4096x20Mx2fault-2   	      10	 281809520 ns/op	  74.42 MB/s	52341989 B/op	   61643 allocs/op
PASS
ok  	github.com/DurantVivado/Grasure	3.363s

The improvement is considerable: roughly 586 ms/op before versus 282 ms/op after.

Klaus Post · Answer 3 · Fri Jan 07 2022 17:43:54 GMT+0800 (China Standard Time)

@DurantVivado Are you sure this isn't just randomness in the tests? You are reading and writing files, which introduces a lot of randomness into the benchmark.

I tried running the benchmark, but it fails. Maybe because I use Windows.

BenchmarkReconstructData50x20x1M should test the inversion tree cache, and I see no measurable change, even when changing the shard size to 1K. In fact, disabling the inversion cache has very little effect on performance.

So I think your benchmark is showing something else.

inversionTree is only used as a pointer, and whenever root is accessed it is via pointer methods. So I don't see how it would make any difference whether it is a value or a pointer.
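
The point above can be illustrated with a minimal sketch. It is not the library's code: `node`, `treeValue`, `treePtr`, `touch`, and the constructor names are hypothetical stand-ins for the two layouts proposed in the issue. It shows that when the outer struct is held by pointer, a pointer-receiver method on a value field works on the field in place, just as it does on a pointer field.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for inversionNode.
type node struct {
	children []*node
	hits     int
}

// touch has a pointer receiver, so it mutates the node in place.
func (n *node) touch() { n.hits++ }

// treeValue stores the root by value; treePtr stores a pointer to it.
type treeValue struct{ root node }
type treePtr struct{ root *node }

func newTreeValue(n int) *treeValue {
	return &treeValue{root: node{children: make([]*node, n)}}
}

func newTreePtr(n int) *treePtr {
	return &treePtr{root: &node{children: make([]*node, n)}}
}

func main() {
	tv := newTreeValue(4)
	tp := newTreePtr(4)

	// tv is a pointer, so tv.root is addressable and Go passes &tv.root
	// to the pointer-receiver method; the node is not copied.
	tv.root.touch()
	tv.root.touch()
	tp.root.touch()
	tp.root.touch()

	fmt.Println(tv.root.hits, tp.root.hits) // prints "2 2"
}
```

In both layouts the method receives an address. The pointer variant only adds one small allocation at construction time and one extra indirection per access.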

DurantThorvalds · Answer 4 · Fri Jan 07 2022 17:58:14 GMT+0800 (China Standard Time)

I guess I missed something. The files I generate are different on each run.
As for BenchmarkReconstructData10x2x10000, it does not actually seem to improve much (about 10%).

Before

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReconstructData10x2x10000$ github.com/DurantVivado/reedsolomon -v -benchtime=10x

goos: linux
goarch: amd64
pkg: github.com/DurantVivado/reedsolomon
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkReconstructData10x2x10000
BenchmarkReconstructData10x2x10000-2          10              8348 ns/op        14373.84 MB/s       1882 B/op         43 allocs/op
PASS
ok      github.com/DurantVivado/reedsolomon     0.012s

After

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReconstructData10x2x10000$ github.com/DurantVivado/reedsolomon -v -benchtime=10x

goos: linux
goarch: amd64
pkg: github.com/DurantVivado/reedsolomon
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkReconstructData10x2x10000
BenchmarkReconstructData10x2x10000-2          10              7590 ns/op        15810.90 MB/s       1882 B/op         43 allocs/op
PASS
ok      github.com/DurantVivado/reedsolomon     0.015s

I guess I am a rookie in Go and need more practice. 😂
Anyway, thanks a lot for your assistance!

Klaus Post · Answer 5 · Fri Jan 07 2022 18:29:08 GMT+0800 (China Standard Time)

With -benchtime=10x you are basically getting random numbers. Running with the default of 1s:

Before, 5 independent runs:
BenchmarkReconstructData10x2x10000-32    	  373849	      3058 ns/op	39240.21 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  354892	      3067 ns/op	39131.69 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  382095	      3064 ns/op	39159.23 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  383332	      3054 ns/op	39294.97 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  376168	      3059 ns/op	39223.83 MB/s	     456 B/op	       6 allocs/op

With pointers, 5 runs:
BenchmarkReconstructData10x2x10000-32    	  375007	      3058 ns/op	39239.20 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  376861	      3049 ns/op	39354.31 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  374106	      3058 ns/op	39240.03 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  375428	      3068 ns/op	39112.56 MB/s	     456 B/op	       6 allocs/op
BenchmarkReconstructData10x2x10000-32    	  378624	      3062 ns/op	39193.64 MB/s	     456 B/op	       6 allocs/op

The differences seem to be within run-to-run variance to me.
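
As a rough check of that conclusion, the ns/op figures from the ten runs above can be summarized with a few lines of Go. This is only a sketch: `summarize` is a hypothetical helper, and comparing means against the observed range is an informal stand-in for a proper statistical test.

```go
package main

import "fmt"

// summarize returns the mean, lowest and highest value of a set of samples.
func summarize(samples []float64) (mean, lo, hi float64) {
	lo, hi = samples[0], samples[0]
	sum := 0.0
	for _, s := range samples {
		sum += s
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	return sum / float64(len(samples)), lo, hi
}

func main() {
	// ns/op values copied from the five runs of each variant above.
	before := []float64{3058, 3067, 3064, 3054, 3059}
	after := []float64{3058, 3049, 3058, 3068, 3062}

	bMean, bLo, bHi := summarize(before)
	aMean, aLo, aHi := summarize(after)

	fmt.Printf("value root:   mean %.1f ns/op, range %.0f-%.0f\n", bMean, bLo, bHi)
	fmt.Printf("pointer root: mean %.1f ns/op, range %.0f-%.0f\n", aMean, aLo, aHi)
	fmt.Printf("difference of means: %.1f ns/op\n", bMean-aMean)
}
```

The means differ by about 1.4 ns/op, while each variant on its own varies by 13 to 19 ns/op between runs, so the two sets overlap almost completely. For a more rigorous comparison, running each variant with `go test -count` and feeding both outputs to the `benchstat` tool from golang.org/x/perf is the usual approach.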