jonhoo / ordsearch

A Rust data structure for efficient lower-bound lookups

Update benchmarks for the new binary search

TheIronBorn opened this issue

Now that the new binary search is in stable, the benchmarks ought to be updated.

Here are mine for reference:

  • 2.3 GHz Intel Core i5 - Sandy Bridge with rustc 1.28.0-nightly (952f344cd 2018-05-18)
  • $ RUSTFLAGS='-C target-cpu=native -C codegen-units=1 -C lto=thin' cargo +nightly bench --features nightly
 name                         sorted ns/iter  this ns/iter  diff ns/iter   diff %  speedup 
-construction::u8::l1         27,505          31,583               4,078   14.83%   x 0.87 
-construction::u8::l1_dup     16,400          20,138               3,738   22.79%   x 0.81 
-construction::u8::l2         186,847         229,325             42,478   22.73%   x 0.81 
-construction::u8::l2_dup     144,427         182,914             38,487   26.65%   x 0.79 
-construction::u32::l1        29,591          33,731               4,140   13.99%   x 0.88 
-construction::u32::l1_dup    21,174          25,295               4,121   19.46%   x 0.84 
-construction::u32::l2        326,904         367,618             40,714   12.45%   x 0.89 
-construction::u32::l2_dup    242,108         282,522             40,414   16.69%   x 0.86 
-construction::usize::l1      30,384          34,256               3,872   12.74%   x 0.89 
-construction::usize::l1_dup  21,423          25,316               3,893   18.17%   x 0.85 
-construction::usize::l2      333,066         373,590             40,524   12.17%   x 0.89 
-construction::usize::l2_dup  243,388         283,978             40,590   16.68%   x 0.86 
+search::u8::l1               46              37                      -9  -19.57%   x 1.24 
-search::u8::l1_dup           31              37                       6   19.35%   x 0.84 
-search::u8::l2               44              58                      14   31.82%   x 0.76 
-search::u8::l2_dup           31              56                      25   80.65%   x 0.55 
-search::u8::l3               29              170                    141  486.21%   x 0.17 
-search::u8::l3_dup           30              127                     97  323.33%   x 0.24 
+search::u32::l1              66              37                     -29  -43.94%   x 1.78 
+search::u32::l1_dup          41              37                      -4   -9.76%   x 1.11 
+search::u32::l2              85              64                     -21  -24.71%   x 1.33 
-search::u32::l2_dup          62              64                       2    3.23%   x 0.97 
-search::u32::l3              180             380                    200  111.11%   x 0.47 
-search::u32::l3_dup          156             381                    225  144.23%   x 0.41 
+search::usize::l1            66              37                     -29  -43.94%   x 1.78 
+search::usize::l1_dup        41              37                      -4   -9.76%   x 1.11 
+search::usize::l2            87              67                     -20  -22.99%   x 1.30 
-search::usize::l2_dup        62              77                      15   24.19%   x 0.81 
-search::usize::l3            247             522                    275  111.34%   x 0.47 
-search::usize::l3_dup        203             614                    411  202.46%   x 0.33 
 name                         btree ns/iter  this ns/iter  diff ns/iter   diff %  speedup 
+construction::u8::l1         47,078         31,583             -15,495  -32.91%   x 1.49 
+construction::u8::l1_dup     31,396         20,138             -11,258  -35.86%   x 1.56 
+construction::u8::l2         433,740        229,325           -204,415  -47.13%   x 1.89 
+construction::u8::l2_dup     314,341        182,914           -131,427  -41.81%   x 1.72 
+construction::u32::l1        66,399         33,731             -32,668  -49.20%   x 1.97 
+construction::u32::l1_dup    38,460         25,295             -13,165  -34.23%   x 1.52 
+construction::u32::l2        884,324        367,618           -516,706  -58.43%   x 2.41 
+construction::u32::l2_dup    537,884        282,522           -255,362  -47.48%   x 1.90 
+construction::usize::l1      66,202         34,256             -31,946  -48.26%   x 1.93 
+construction::usize::l1_dup  39,174         25,316             -13,858  -35.38%   x 1.55 
+construction::usize::l2      898,311        373,590           -524,721  -58.41%   x 2.40 
+construction::usize::l2_dup  551,082        283,978           -267,104  -48.47%   x 1.94 
+search::u8::l1               48             37                     -11  -22.92%   x 1.30 
-search::u8::l1_dup           35             37                       2    5.71%   x 0.95 
-search::u8::l2               46             58                      12   26.09%   x 0.79 
-search::u8::l2_dup           35             56                      21   60.00%   x 0.62 
-search::u8::l3               36             170                    134  372.22%   x 0.21 
-search::u8::l3_dup           34             127                     93  273.53%   x 0.27 
+search::u32::l1              66             37                     -29  -43.94%   x 1.78 
+search::u32::l1_dup          42             37                      -5  -11.90%   x 1.14 
+search::u32::l2              91             64                     -27  -29.67%   x 1.42 
-search::u32::l2_dup          60             64                       4    6.67%   x 0.94 
-search::u32::l3              351            380                     29    8.26%   x 0.92 
-search::u32::l3_dup          195            381                    186   95.38%   x 0.51 
+search::usize::l1            66             37                     -29  -43.94%   x 1.78 
+search::usize::l1_dup        42             37                      -5  -11.90%   x 1.14 
+search::usize::l2            96             67                     -29  -30.21%   x 1.43 
-search::usize::l2_dup        61             77                      16   26.23%   x 0.79 
-search::usize::l3            441            522                     81   18.37%   x 0.84 
-search::usize::l3_dup        241            614                    373  154.77%   x 0.39 
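
For anyone reading the tables above, the derived columns appear to follow directly from the two timing columns: `diff` is new minus old, `diff %` is that difference relative to the old time, and `speedup` is old divided by new. This is inferred from the numbers in the tables, not taken from the benchcmp source, and the `Row` and `compare` names are made up for illustration. A minimal sketch:

```rust
/// Derived columns for one benchmark row.
/// (Illustrative names; not benchcmp's own types.)
#[derive(Debug)]
struct Row {
    diff_ns: i64,
    diff_pct: f64,
    speedup: f64,
}

/// Compare an old timing (left column) against a new timing (right column).
fn compare(old_ns: u64, new_ns: u64) -> Row {
    let diff_ns = new_ns as i64 - old_ns as i64;
    Row {
        diff_ns,
        // Positive means the right-hand column is slower.
        diff_pct: diff_ns as f64 / old_ns as f64 * 100.0,
        // Below 1.0 means the right-hand column is slower.
        speedup: old_ns as f64 / new_ns as f64,
    }
}

fn main() {
    // Two rows from the sorted-vs-this table:
    // construction::u8::l1 (27,505 -> 31,583) and search::u8::l1 (46 -> 37).
    for &(name, old_ns, new_ns) in [
        ("construction::u8::l1", 27_505u64, 31_583u64),
        ("search::u8::l1", 46, 37),
    ]
    .iter()
    {
        let r = compare(old_ns, new_ns);
        println!(
            "{:<22} {:>6} {:>8.2}%  x {:.2}",
            name, r.diff_ns, r.diff_pct, r.speedup
        );
    }
}
```

So a leading `-` with a speedup below 1 means the right-hand column ("this") took longer than the left-hand one.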

Oh, that's very interesting. I was running mine on an AMD 2600X and got very different numbers (see the updated README). I didn't set lto or codegen-units=1, though. Did you find that they matter a lot in this case? (I don't think they should.)

Seems like it makes a big difference. (Not a great example, because sorted may have changed as well.)

$ RUSTFLAGS='-C target-cpu=native' cargo +nightly bench --features nightly this > only_target_cpu
   Compiling ordsearch v0.2.2
    Finished release [optimized] target(s) in 12.13s
     Running target/release/deps/ordsearch-fa7482cf08da0422
$ cargo benchcmp bench.dat only_target_cpu
 name                                  bench.dat ns/iter  only_target_cpu ns/iter  diff ns/iter   diff %  speedup 
-b::this::construction::u32::l1        33,731             48,522                         14,791   43.85%   x 0.70 
-b::this::construction::u32::l1_dup    25,295             42,251                         16,956   67.03%   x 0.60 
-b::this::construction::u32::l2        367,618            588,335                       220,717   60.04%   x 0.62 
-b::this::construction::u32::l2_dup    282,522            421,783                       139,261   49.29%   x 0.67 
-b::this::construction::u8::l1         31,583             46,853                         15,270   48.35%   x 0.67 
-b::this::construction::u8::l1_dup     20,138             30,245                         10,107   50.19%   x 0.67 
-b::this::construction::u8::l2         229,325            365,651                       136,326   59.45%   x 0.63 
-b::this::construction::u8::l2_dup     182,914            270,626                        87,712   47.95%   x 0.68 
-b::this::construction::usize::l1      34,256             42,591                          8,335   24.33%   x 0.80 
-b::this::construction::usize::l1_dup  25,316             32,873                          7,557   29.85%   x 0.77 
-b::this::construction::usize::l2      373,590            461,806                        88,216   23.61%   x 0.81 
-b::this::construction::usize::l2_dup  283,978            410,118                       126,140   44.42%   x 0.69 
-b::this::search::u32::l1              37                 77                                 40  108.11%   x 0.48 
-b::this::search::u32::l1_dup          37                 55                                 18   48.65%   x 0.67 
-b::this::search::u32::l2              64                 100                                36   56.25%   x 0.64 
-b::this::search::u32::l2_dup          64                 87                                 23   35.94%   x 0.74 
-b::this::search::u32::l3              380                527                               147   38.68%   x 0.72 
-b::this::search::u32::l3_dup          381                504                               123   32.28%   x 0.76 
-b::this::search::u8::l1               37                 60                                 23   62.16%   x 0.62 
-b::this::search::u8::l1_dup           37                 46                                  9   24.32%   x 0.80 
-b::this::search::u8::l2               58                 77                                 19   32.76%   x 0.75 
+b::this::search::u8::l2_dup           56                 50                                 -6  -10.71%   x 1.12 
-b::this::search::u8::l3               170                176                                 6    3.53%   x 0.97 
+b::this::search::u8::l3_dup           127                69                                -58  -45.67%   x 1.84 
-b::this::search::usize::l1            37                 77                                 40  108.11%   x 0.48 
-b::this::search::usize::l1_dup        37                 63                                 26   70.27%   x 0.59 
-b::this::search::usize::l2            67                 109                                42   62.69%   x 0.61 
-b::this::search::usize::l2_dup        77                 113                                36   46.75%   x 0.68 
-b::this::search::usize::l3            522                1,053                             531  101.72%   x 0.50 
-b::this::search::usize::l3_dup        614                709                                95   15.47%   x 0.87 

Proper comparison:

$ RUSTFLAGS='-C target-cpu=native' cargo +nightly bench --features nightly > only_target_cpu
 name                         sorted_only_target ns/iter  this_only_target ns/iter  diff ns/iter   diff %  speedup 
-construction::u32::l1        35,543                      42,240                           6,697   18.84%   x 0.84 
-construction::u32::l1_dup    27,242                      32,395                           5,153   18.92%   x 0.84 
-construction::u32::l2        400,881                     464,190                         63,309   15.79%   x 0.86 
-construction::u32::l2_dup    316,300                     367,845                         51,545   16.30%   x 0.86 
-construction::u8::l1         32,853                      39,299                           6,446   19.62%   x 0.84 
-construction::u8::l1_dup     20,571                      29,485                           8,914   43.33%   x 0.70 
-construction::u8::l2         235,016                     299,221                         64,205   27.32%   x 0.79 
-construction::u8::l2_dup     211,679                     271,658                         59,979   28.33%   x 0.78 
-construction::usize::l1      36,596                      41,660                           5,064   13.84%   x 0.88 
-construction::usize::l1_dup  30,698                      32,598                           1,900    6.19%   x 0.94 
-construction::usize::l2      407,180                     480,711                         73,531   18.06%   x 0.85 
-construction::usize::l2_dup  312,361                     371,245                         58,884   18.85%   x 0.84 
-search::u32::l1              76                          77                                   1    1.32%   x 0.99 
+search::u32::l1_dup          63                          55                                  -8  -12.70%   x 1.15 
-search::u32::l2              97                          100                                  3    3.09%   x 0.97 
+search::u32::l2_dup          104                         87                                 -17  -16.35%   x 1.20 
-search::u32::l3              242                         525                                283  116.94%   x 0.46 
-search::u32::l3_dup          254                         505                                251   98.82%   x 0.50 
+search::u8::l1               70                          60                                 -10  -14.29%   x 1.17 
-search::u8::l1_dup           45                          46                                   1    2.22%   x 0.98 
+search::u8::l2               79                          77                                  -2   -2.53%   x 1.03 
+search::u8::l2_dup           52                          51                                  -1   -1.92%   x 1.02 
-search::u8::l3               81                          155                                 74   91.36%   x 0.52 
-search::u8::l3_dup           59                          70                                  11   18.64%   x 0.84 
-search::usize::l1            74                          77                                   3    4.05%   x 0.96 
+search::usize::l1_dup        62                          55                                  -7  -11.29%   x 1.13 
-search::usize::l2            98                          103                                  5    5.10%   x 0.95 
+search::usize::l2_dup        98                          89                                  -9   -9.18%   x 1.10 
-search::usize::l3            348                         591                                243   69.83%   x 0.59 
-search::usize::l3_dup        344                         635                                291   84.59%   x 0.54 

Taking the speedup for each benchmark and computing lto_codegen / only_target:

 name                         only_target  lto_codegen    speedup relative to sorted
+construction::u8::l1         0.7          0.87                               x 1.24
+construction::u8::l1_dup     0.6          0.81                               x 1.35
+construction::u8::l2         0.62         0.81                               x 1.31
+construction::u8::l2_dup     0.67         0.79                               x 1.18
+construction::u32::l1        0.67         0.88                               x 1.31
+construction::u32::l1_dup    0.67         0.84                               x 1.25
+construction::u32::l2        0.63         0.89                               x 1.41
+construction::u32::l2_dup    0.68         0.86                               x 1.26
+construction::usize::l1      0.8          0.89                               x 1.11
+construction::usize::l1_dup  0.77         0.85                               x 1.10
+construction::usize::l2      0.81         0.89                               x 1.10
+construction::usize::l2_dup  0.69         0.86                               x 1.25
+search::u8::l1               0.48         1.24                               x 2.58
+search::u8::l1_dup           0.67         0.84                               x 1.25
+search::u8::l2               0.64         0.76                               x 1.19
-search::u8::l2_dup           0.74         0.55                               x 0.74
-search::u8::l3               0.72         0.17                               x 0.24
-search::u8::l3_dup           0.76         0.24                               x 0.32
+search::u32::l1              0.62         1.78                               x 2.87
+search::u32::l1_dup          0.8          1.11                               x 1.39
+search::u32::l2              0.75         1.33                               x 1.77
-search::u32::l2_dup          1.12         0.97                               x 0.87
-search::u32::l3              0.97         0.47                               x 0.48
-search::u32::l3_dup          1.84         0.41                               x 0.22
+search::usize::l1            0.48         1.78                               x 3.71
+search::usize::l1_dup        0.59         1.11                               x 1.88
+search::usize::l2            0.61         1.3                                x 2.13
+search::usize::l2_dup        0.68         0.81                               x 1.19
-search::usize::l3            0.5          0.47                               x 0.94
-search::usize::l3_dup        0.87         0.33                               x 0.38
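
The last column in the table above is the lto_codegen value divided by the only_target value for the same row. As a quick check of that arithmetic, here is a small sketch using three rows copied from the table; the helper name `relative_speedup` is hypothetical.

```rust
/// Ratio of two speedup factors, as in the last column of the table.
/// (Hypothetical helper name, used only for this illustration.)
fn relative_speedup(lto_codegen: f64, only_target: f64) -> f64 {
    lto_codegen / only_target
}

fn main() {
    // (name, only_target, lto_codegen) copied from three rows of the table.
    let rows = [
        ("construction::usize::l1", 0.80, 0.89),
        ("search::usize::l1", 0.48, 1.78),
        ("search::usize::l3", 0.50, 0.47),
    ];
    for &(name, only_target, lto_codegen) in rows.iter() {
        println!(
            "{:<26} x {:.2}",
            name,
            relative_speedup(lto_codegen, only_target)
        );
    }
}
```

Rounded to two decimals, these reproduce the x 1.11, x 3.71 and x 0.94 entries for those rows.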

Seems like lto and codegen-units=1 make a difference.

codegen-units=1 alone doesn't seem to make up for it:

 name                         sorted_target_codegen ns/iter  this_target_codegen ns/iter  diff ns/iter   diff %  speedup 
+construction::u32::l1        42,023                         41,649                               -374   -0.89%   x 1.01 
-construction::u32::l1_dup    27,217                         31,767                              4,550   16.72%   x 0.86 
-construction::u32::l2        435,104                        457,158                            22,054    5.07%   x 0.95 
-construction::u32::l2_dup    311,043                        371,604                            60,561   19.47%   x 0.84 
-construction::u8::l1         32,772                         38,262                              5,490   16.75%   x 0.86 
-construction::u8::l1_dup     20,170                         25,119                              4,949   24.54%   x 0.80 
-construction::u8::l2         239,203                        293,715                            54,512   22.79%   x 0.81 
-construction::u8::l2_dup     198,701                        255,285                            56,584   28.48%   x 0.78 
-construction::usize::l1      35,766                         41,766                              6,000   16.78%   x 0.86 
-construction::usize::l1_dup  26,466                         32,727                              6,261   23.66%   x 0.81 
-construction::usize::l2      407,139                        537,220                           130,081   31.95%   x 0.76 
-construction::usize::l2_dup  317,323                        377,680                            60,357   19.02%   x 0.84 
-search::u32::l1              73                             77                                      4    5.48%   x 0.95 
+search::u32::l1_dup          61                             56                                     -5   -8.20%   x 1.09 
-search::u32::l2              95                             99                                      4    4.21%   x 0.96 
+search::u32::l2_dup          98                             87                                    -11  -11.22%   x 1.13 
-search::u32::l3              213                            472                                   259  121.60%   x 0.45 
-search::u32::l3_dup          214                            453                                   239  111.68%   x 0.47 
-search::u8::l1               66                             76                                     10   15.15%   x 0.87 
-search::u8::l1_dup           45                             46                                      1    2.22%   x 0.98 
-search::u8::l2               73                             95                                     22   30.14%   x 0.77 
+search::u8::l2_dup           53                             51                                     -2   -3.77%   x 1.04 
-search::u8::l3               61                             168                                   107  175.41%   x 0.36 
-search::u8::l3_dup           59                             73                                     14   23.73%   x 0.81 
-search::usize::l1            75                             94                                     19   25.33%   x 0.80 
+search::usize::l1_dup        61                             58                                     -3   -4.92%   x 1.05 
-search::usize::l2            98                             106                                     8    8.16%   x 0.92 
-search::usize::l2_dup        96                             108                                    12   12.50%   x 0.89 
-search::usize::l3            324                            642                                   318   98.15%   x 0.50 
-search::usize::l3_dup        299                            629                                   330  110.37%   x 0.48 

Indeed, seems like lto=thin is what makes the difference:

 name                         sorted_target_thinlto ns/iter  this_target_thinlto ns/iter  diff ns/iter   diff %  speedup 
-construction::u32::l1        36,769                         57,720                             20,951   56.98%   x 0.64 
-construction::u32::l1_dup    26,887                         40,277                             13,390   49.80%   x 0.67 
-construction::u32::l2        491,083                        562,150                            71,067   14.47%   x 0.87 
-construction::u32::l2_dup    363,382                        574,514                           211,132   58.10%   x 0.63 
-construction::u8::l1         40,745                         42,283                              1,538    3.77%   x 0.96 
-construction::u8::l1_dup     22,958                         29,890                              6,932   30.19%   x 0.77 
-construction::u8::l2         279,075                        351,648                            72,573   26.00%   x 0.79 
-construction::u8::l2_dup     208,916                        302,384                            93,468   44.74%   x 0.69 
+construction::usize::l1      44,409                         43,570                               -839   -1.89%   x 1.02 
-construction::usize::l1_dup  32,624                         34,544                              1,920    5.89%   x 0.94 
-construction::usize::l2      413,165                        694,884                           281,719   68.19%   x 0.59 
-construction::usize::l2_dup  364,902                        468,816                           103,914   28.48%   x 0.78 
-search::u32::l1              86                             103                                    17   19.77%   x 0.83 
+search::u32::l1_dup          73                             66                                     -7   -9.59%   x 1.11 
+search::u32::l2              113                            103                                   -10   -8.85%   x 1.10 
-search::u32::l2_dup          98                             99                                      1    1.02%   x 0.99 
-search::u32::l3              299                            540                                   241   80.60%   x 0.55 
+search::u32::l3_dup          721                            546                                  -175  -24.27%   x 1.32 
+search::u8::l1               87                             62                                    -25  -28.74%   x 1.40 
+search::u8::l1_dup           60                             57                                     -3   -5.00%   x 1.05 
+search::u8::l2               88                             81                                     -7   -7.95%   x 1.09 
+search::u8::l2_dup           55                             55                                      0    0.00%   x 1.00 
-search::u8::l3               67                             196                                   129  192.54%   x 0.34 
-search::u8::l3_dup           62                             80                                     18   29.03%   x 0.78 
+search::usize::l1            90                             82                                     -8   -8.89%   x 1.10 
+search::usize::l1_dup        75                             56                                    -19  -25.33%   x 1.34 
+search::usize::l2            109                            106                                    -3   -2.75%   x 1.03 
+search::usize::l2_dup        103                            90                                    -13  -12.62%   x 1.14 
-search::usize::l3            355                            663                                   308   86.76%   x 0.54 
-search::usize::l3_dup        361                            906                                   545  150.97%   x 0.40