Update benchmarks for the new binary search
TheIronBorn opened this issue:
Now that the new binary search is in stable, the benchmarks ought to be updated.
Here's mine for ref:
- 2.3 GHz Intel Core i5 (Sandy Bridge) with rustc 1.28.0-nightly (952f344cd 2018-05-18)
$ RUSTFLAGS='-C target-cpu=native -C codegen-units=1 -C lto=thin' cargo +nightly bench --features nightly
name sorted ns/iter this ns/iter diff ns/iter diff % speedup
-construction::u8::l1 27,505 31,583 4,078 14.83% x 0.87
-construction::u8::l1_dup 16,400 20,138 3,738 22.79% x 0.81
-construction::u8::l2 186,847 229,325 42,478 22.73% x 0.81
-construction::u8::l2_dup 144,427 182,914 38,487 26.65% x 0.79
-construction::u32::l1 29,591 33,731 4,140 13.99% x 0.88
-construction::u32::l1_dup 21,174 25,295 4,121 19.46% x 0.84
-construction::u32::l2 326,904 367,618 40,714 12.45% x 0.89
-construction::u32::l2_dup 242,108 282,522 40,414 16.69% x 0.86
-construction::usize::l1 30,384 34,256 3,872 12.74% x 0.89
-construction::usize::l1_dup 21,423 25,316 3,893 18.17% x 0.85
-construction::usize::l2 333,066 373,590 40,524 12.17% x 0.89
-construction::usize::l2_dup 243,388 283,978 40,590 16.68% x 0.86
+search::u8::l1 46 37 -9 -19.57% x 1.24
-search::u8::l1_dup 31 37 6 19.35% x 0.84
-search::u8::l2 44 58 14 31.82% x 0.76
-search::u8::l2_dup 31 56 25 80.65% x 0.55
-search::u8::l3 29 170 141 486.21% x 0.17
-search::u8::l3_dup 30 127 97 323.33% x 0.24
+search::u32::l1 66 37 -29 -43.94% x 1.78
+search::u32::l1_dup 41 37 -4 -9.76% x 1.11
+search::u32::l2 85 64 -21 -24.71% x 1.33
-search::u32::l2_dup 62 64 2 3.23% x 0.97
-search::u32::l3 180 380 200 111.11% x 0.47
-search::u32::l3_dup 156 381 225 144.23% x 0.41
+search::usize::l1 66 37 -29 -43.94% x 1.78
+search::usize::l1_dup 41 37 -4 -9.76% x 1.11
+search::usize::l2 87 67 -20 -22.99% x 1.30
-search::usize::l2_dup 62 77 15 24.19% x 0.81
-search::usize::l3 247 522 275 111.34% x 0.47
-search::usize::l3_dup 203 614 411 202.46% x 0.33
name btree ns/iter this ns/iter diff ns/iter diff % speedup
+construction::u8::l1 47,078 31,583 -15,495 -32.91% x 1.49
+construction::u8::l1_dup 31,396 20,138 -11,258 -35.86% x 1.56
+construction::u8::l2 433,740 229,325 -204,415 -47.13% x 1.89
+construction::u8::l2_dup 314,341 182,914 -131,427 -41.81% x 1.72
+construction::u32::l1 66,399 33,731 -32,668 -49.20% x 1.97
+construction::u32::l1_dup 38,460 25,295 -13,165 -34.23% x 1.52
+construction::u32::l2 884,324 367,618 -516,706 -58.43% x 2.41
+construction::u32::l2_dup 537,884 282,522 -255,362 -47.48% x 1.90
+construction::usize::l1 66,202 34,256 -31,946 -48.26% x 1.93
+construction::usize::l1_dup 39,174 25,316 -13,858 -35.38% x 1.55
+construction::usize::l2 898,311 373,590 -524,721 -58.41% x 2.40
+construction::usize::l2_dup 551,082 283,978 -267,104 -48.47% x 1.94
+search::u8::l1 48 37 -11 -22.92% x 1.30
-search::u8::l1_dup 35 37 2 5.71% x 0.95
-search::u8::l2 46 58 12 26.09% x 0.79
-search::u8::l2_dup 35 56 21 60.00% x 0.62
-search::u8::l3 36 170 134 372.22% x 0.21
-search::u8::l3_dup 34 127 93 273.53% x 0.27
+search::u32::l1 66 37 -29 -43.94% x 1.78
+search::u32::l1_dup 42 37 -5 -11.90% x 1.14
+search::u32::l2 91 64 -27 -29.67% x 1.42
-search::u32::l2_dup 60 64 4 6.67% x 0.94
-search::u32::l3 351 380 29 8.26% x 0.92
-search::u32::l3_dup 195 381 186 95.38% x 0.51
+search::usize::l1 66 37 -29 -43.94% x 1.78
+search::usize::l1_dup 42 37 -5 -11.90% x 1.14
+search::usize::l2 96 67 -29 -30.21% x 1.43
-search::usize::l2_dup 61 77 16 26.23% x 0.79
-search::usize::l3 441 522 81 18.37% x 0.84
-search::usize::l3_dup 241 614 373 154.77% x 0.39
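For anyone reading the tables above: the diff and speedup columns are consistent with simple arithmetic on the two ns/iter columns, where a speedup below 1 means the second column ("this") is slower. A minimal sketch of that arithmetic follows; the `Row` type and its method names are hypothetical and are not part of cargo benchcmp or this crate.

```rust
/// One comparison row, in ns/iter. Hypothetical type for illustration only.
struct Row {
    old_ns: u64, // first column (e.g. sorted or btree)
    new_ns: u64, // second column (e.g. this)
}

impl Row {
    /// Absolute difference; positive means the second column is slower.
    fn diff_ns(&self) -> i64 {
        self.new_ns as i64 - self.old_ns as i64
    }

    /// Difference as a percentage of the first column.
    fn diff_pct(&self) -> f64 {
        self.diff_ns() as f64 / self.old_ns as f64 * 100.0
    }

    /// Speedup factor; values above 1 mean the second column is faster.
    fn speedup(&self) -> f64 {
        self.old_ns as f64 / self.new_ns as f64
    }
}

fn main() {
    // construction::u8::l1 from the sorted-vs-this table.
    let row = Row { old_ns: 27_505, new_ns: 31_583 };
    println!(
        "{} {:.2}% x {:.2}",
        row.diff_ns(),
        row.diff_pct(),
        row.speedup()
    );
}
```

For the row used here this reproduces the table entry (4,078, 14.83%, x 0.87).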
Oh, that's very interesting. I was running mine on an AMD 2600X and got very different numbers (see the updated README). I didn't set lto or codegen-units=1, though. Did you find them to matter a lot in this case? (I don't think they should...)
Seems like it makes a big difference. (Not a great example, because sorted may have changed as well.)
$ RUSTFLAGS='-C target-cpu=native' cargo +nightly bench --features nightly this > only_target_cpu
Compiling ordsearch v0.2.2
Finished release [optimized] target(s) in 12.13s
Running target/release/deps/ordsearch-fa7482cf08da0422
$ cargo benchcmp bench.dat only_target_cpu
name bench.dat ns/iter only_target_cpu ns/iter diff ns/iter diff % speedup
-b::this::construction::u32::l1 33,731 48,522 14,791 43.85% x 0.70
-b::this::construction::u32::l1_dup 25,295 42,251 16,956 67.03% x 0.60
-b::this::construction::u32::l2 367,618 588,335 220,717 60.04% x 0.62
-b::this::construction::u32::l2_dup 282,522 421,783 139,261 49.29% x 0.67
-b::this::construction::u8::l1 31,583 46,853 15,270 48.35% x 0.67
-b::this::construction::u8::l1_dup 20,138 30,245 10,107 50.19% x 0.67
-b::this::construction::u8::l2 229,325 365,651 136,326 59.45% x 0.63
-b::this::construction::u8::l2_dup 182,914 270,626 87,712 47.95% x 0.68
-b::this::construction::usize::l1 34,256 42,591 8,335 24.33% x 0.80
-b::this::construction::usize::l1_dup 25,316 32,873 7,557 29.85% x 0.77
-b::this::construction::usize::l2 373,590 461,806 88,216 23.61% x 0.81
-b::this::construction::usize::l2_dup 283,978 410,118 126,140 44.42% x 0.69
-b::this::search::u32::l1 37 77 40 108.11% x 0.48
-b::this::search::u32::l1_dup 37 55 18 48.65% x 0.67
-b::this::search::u32::l2 64 100 36 56.25% x 0.64
-b::this::search::u32::l2_dup 64 87 23 35.94% x 0.74
-b::this::search::u32::l3 380 527 147 38.68% x 0.72
-b::this::search::u32::l3_dup 381 504 123 32.28% x 0.76
-b::this::search::u8::l1 37 60 23 62.16% x 0.62
-b::this::search::u8::l1_dup 37 46 9 24.32% x 0.80
-b::this::search::u8::l2 58 77 19 32.76% x 0.75
+b::this::search::u8::l2_dup 56 50 -6 -10.71% x 1.12
-b::this::search::u8::l3 170 176 6 3.53% x 0.97
+b::this::search::u8::l3_dup 127 69 -58 -45.67% x 1.84
-b::this::search::usize::l1 37 77 40 108.11% x 0.48
-b::this::search::usize::l1_dup 37 63 26 70.27% x 0.59
-b::this::search::usize::l2 67 109 42 62.69% x 0.61
-b::this::search::usize::l2_dup 77 113 36 46.75% x 0.68
-b::this::search::usize::l3 522 1,053 531 101.72% x 0.50
-b::this::search::usize::l3_dup 614 709 95 15.47% x 0.87
Proper comparison:
$ RUSTFLAGS='-C target-cpu=native' cargo +nightly bench --features nightly > only_target_cpu
name sorted_only_target ns/iter this_only_target ns/iter diff ns/iter diff % speedup
-construction::u32::l1 35,543 42,240 6,697 18.84% x 0.84
-construction::u32::l1_dup 27,242 32,395 5,153 18.92% x 0.84
-construction::u32::l2 400,881 464,190 63,309 15.79% x 0.86
-construction::u32::l2_dup 316,300 367,845 51,545 16.30% x 0.86
-construction::u8::l1 32,853 39,299 6,446 19.62% x 0.84
-construction::u8::l1_dup 20,571 29,485 8,914 43.33% x 0.70
-construction::u8::l2 235,016 299,221 64,205 27.32% x 0.79
-construction::u8::l2_dup 211,679 271,658 59,979 28.33% x 0.78
-construction::usize::l1 36,596 41,660 5,064 13.84% x 0.88
-construction::usize::l1_dup 30,698 32,598 1,900 6.19% x 0.94
-construction::usize::l2 407,180 480,711 73,531 18.06% x 0.85
-construction::usize::l2_dup 312,361 371,245 58,884 18.85% x 0.84
-search::u32::l1 76 77 1 1.32% x 0.99
+search::u32::l1_dup 63 55 -8 -12.70% x 1.15
-search::u32::l2 97 100 3 3.09% x 0.97
+search::u32::l2_dup 104 87 -17 -16.35% x 1.20
-search::u32::l3 242 525 283 116.94% x 0.46
-search::u32::l3_dup 254 505 251 98.82% x 0.50
+search::u8::l1 70 60 -10 -14.29% x 1.17
-search::u8::l1_dup 45 46 1 2.22% x 0.98
+search::u8::l2 79 77 -2 -2.53% x 1.03
+search::u8::l2_dup 52 51 -1 -1.92% x 1.02
-search::u8::l3 81 155 74 91.36% x 0.52
-search::u8::l3_dup 59 70 11 18.64% x 0.84
-search::usize::l1 74 77 3 4.05% x 0.96
+search::usize::l1_dup 62 55 -7 -11.29% x 1.13
-search::usize::l2 98 103 5 5.10% x 0.95
+search::usize::l2_dup 98 89 -9 -9.18% x 1.10
-search::usize::l3 348 591 243 69.83% x 0.59
-search::usize::l3_dup 344 635 291 84.59% x 0.54
Taking the speedup for each benchmark and computing lto_codegen / only_target:
name only_target lto_codegen speedup relative to sorted
+construction::u8::l1 0.7 0.87 x 1.24
+construction::u8::l1_dup 0.6 0.81 x 1.35
+construction::u8::l2 0.62 0.81 x 1.31
+construction::u8::l2_dup 0.67 0.79 x 1.18
+construction::u32::l1 0.67 0.88 x 1.31
+construction::u32::l1_dup 0.67 0.84 x 1.25
+construction::u32::l2 0.63 0.89 x 1.41
+construction::u32::l2_dup 0.68 0.86 x 1.26
+construction::usize::l1 0.8 0.89 x 1.11
+construction::usize::l1_dup 0.77 0.85 x 1.10
+construction::usize::l2 0.81 0.89 x 1.10
+construction::usize::l2_dup 0.69 0.86 x 1.25
+search::u8::l1 0.48 1.24 x 2.58
+search::u8::l1_dup 0.67 0.84 x 1.25
+search::u8::l2 0.64 0.76 x 1.19
-search::u8::l2_dup 0.74 0.55 x 0.74
-search::u8::l3 0.72 0.17 x 0.24
-search::u8::l3_dup 0.76 0.24 x 0.32
+search::u32::l1 0.62 1.78 x 2.87
+search::u32::l1_dup 0.8 1.11 x 1.39
+search::u32::l2 0.75 1.33 x 1.77
-search::u32::l2_dup 1.12 0.97 x 0.87
-search::u32::l3 0.97 0.47 x 0.48
-search::u32::l3_dup 1.84 0.41 x 0.22
+search::usize::l1 0.48 1.78 x 3.71
+search::usize::l1_dup 0.59 1.11 x 1.88
+search::usize::l2 0.61 1.3 x 2.13
+search::usize::l2_dup 0.68 0.81 x 1.19
-search::usize::l3 0.5 0.47 x 0.94
-search::usize::l3_dup 0.87 0.33 x 0.38
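The last column of the table above is just the ratio of the two speedup factors in each row. A minimal sketch of that step, assuming the two factors have already been read off by hand; the function name `relative_speedup` is made up for illustration.

```rust
/// Ratio of the speedup measured with LTO and codegen-units=1 to the
/// speedup measured with only target-cpu=native. Values above 1 mean
/// the extra flags improved the relative result.
/// Hypothetical helper for illustration only.
fn relative_speedup(lto_codegen: f64, only_target: f64) -> f64 {
    lto_codegen / only_target
}

fn main() {
    // (only_target, lto_codegen) pairs copied from the first two table rows.
    let rows = [(0.70, 0.87), (0.60, 0.81)];
    for &(only_target, lto_codegen) in rows.iter() {
        println!("x {:.2}", relative_speedup(lto_codegen, only_target));
    }
}
```

Because the inputs are already rounded to two decimals, the resulting ratios are only approximate.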
Seems like lto and codegen-units=1 make a difference.
codegen-units=1 alone doesn't seem to make up for it:
name sorted_target_codegen ns/iter this_target_codegen ns/iter diff ns/iter diff % speedup
+construction::u32::l1 42,023 41,649 -374 -0.89% x 1.01
-construction::u32::l1_dup 27,217 31,767 4,550 16.72% x 0.86
-construction::u32::l2 435,104 457,158 22,054 5.07% x 0.95
-construction::u32::l2_dup 311,043 371,604 60,561 19.47% x 0.84
-construction::u8::l1 32,772 38,262 5,490 16.75% x 0.86
-construction::u8::l1_dup 20,170 25,119 4,949 24.54% x 0.80
-construction::u8::l2 239,203 293,715 54,512 22.79% x 0.81
-construction::u8::l2_dup 198,701 255,285 56,584 28.48% x 0.78
-construction::usize::l1 35,766 41,766 6,000 16.78% x 0.86
-construction::usize::l1_dup 26,466 32,727 6,261 23.66% x 0.81
-construction::usize::l2 407,139 537,220 130,081 31.95% x 0.76
-construction::usize::l2_dup 317,323 377,680 60,357 19.02% x 0.84
-search::u32::l1 73 77 4 5.48% x 0.95
+search::u32::l1_dup 61 56 -5 -8.20% x 1.09
-search::u32::l2 95 99 4 4.21% x 0.96
+search::u32::l2_dup 98 87 -11 -11.22% x 1.13
-search::u32::l3 213 472 259 121.60% x 0.45
-search::u32::l3_dup 214 453 239 111.68% x 0.47
-search::u8::l1 66 76 10 15.15% x 0.87
-search::u8::l1_dup 45 46 1 2.22% x 0.98
-search::u8::l2 73 95 22 30.14% x 0.77
+search::u8::l2_dup 53 51 -2 -3.77% x 1.04
-search::u8::l3 61 168 107 175.41% x 0.36
-search::u8::l3_dup 59 73 14 23.73% x 0.81
-search::usize::l1 75 94 19 25.33% x 0.80
+search::usize::l1_dup 61 58 -3 -4.92% x 1.05
-search::usize::l2 98 106 8 8.16% x 0.92
-search::usize::l2_dup 96 108 12 12.50% x 0.89
-search::usize::l3 324 642 318 98.15% x 0.50
-search::usize::l3_dup 299 629 330 110.37% x 0.48
Indeed, seems like lto=thin is what makes the difference:
name sorted_target_thinlto ns/iter this_target_thinlto ns/iter diff ns/iter diff % speedup
-construction::u32::l1 36,769 57,720 20,951 56.98% x 0.64
-construction::u32::l1_dup 26,887 40,277 13,390 49.80% x 0.67
-construction::u32::l2 491,083 562,150 71,067 14.47% x 0.87
-construction::u32::l2_dup 363,382 574,514 211,132 58.10% x 0.63
-construction::u8::l1 40,745 42,283 1,538 3.77% x 0.96
-construction::u8::l1_dup 22,958 29,890 6,932 30.19% x 0.77
-construction::u8::l2 279,075 351,648 72,573 26.00% x 0.79
-construction::u8::l2_dup 208,916 302,384 93,468 44.74% x 0.69
+construction::usize::l1 44,409 43,570 -839 -1.89% x 1.02
-construction::usize::l1_dup 32,624 34,544 1,920 5.89% x 0.94
-construction::usize::l2 413,165 694,884 281,719 68.19% x 0.59
-construction::usize::l2_dup 364,902 468,816 103,914 28.48% x 0.78
-search::u32::l1 86 103 17 19.77% x 0.83
+search::u32::l1_dup 73 66 -7 -9.59% x 1.11
+search::u32::l2 113 103 -10 -8.85% x 1.10
-search::u32::l2_dup 98 99 1 1.02% x 0.99
-search::u32::l3 299 540 241 80.60% x 0.55
+search::u32::l3_dup 721 546 -175 -24.27% x 1.32
+search::u8::l1 87 62 -25 -28.74% x 1.40
+search::u8::l1_dup 60 57 -3 -5.00% x 1.05
+search::u8::l2 88 81 -7 -7.95% x 1.09
+search::u8::l2_dup 55 55 0 0.00% x 1.00
-search::u8::l3 67 196 129 192.54% x 0.34
-search::u8::l3_dup 62 80 18 29.03% x 0.78
+search::usize::l1 90 82 -8 -8.89% x 1.10
+search::usize::l1_dup 75 56 -19 -25.33% x 1.34
+search::usize::l2 109 106 -3 -2.75% x 1.03
+search::usize::l2_dup 103 90 -13 -12.62% x 1.14
-search::usize::l3 355 663 308 86.76% x 0.54
-search::usize::l3_dup 361 906 545 150.97% x 0.40