learnedsystems / SOSD

A Benchmark for Learned Indexes

Benchmark's memory requirement

alihadian opened this issue

On a system with 16GB of memory, the benchmark crashes while building the benchmark:

[100%] Built target benchmark
Generating lookups for osm_cellids_200M_uint64
Generating lookups for osm_cellids_400M_uint64
Generating lookups for osm_cellids_600M_uint64
read 600000000 values from ../data/osm_cellids_600M_uint64 in 2095 ms (286.396 M values/s)
terminate called after throwing an instance of 'std::bad_alloc'

Before upgrading the memory, I'd like to know the peak memory usage of the benchmark. Do you have a rough estimate?

Apparently a bit more than 16GB is enough to build osm_cellids_600M_uint64, but the benchmark execution itself could then take even more memory, right?
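
In the absence of an estimate, the peak can be measured directly. Below is a minimal sketch, assuming a Unix-like system where Python's `resource` module is available; the helper name `peak_rss` is hypothetical, and the tiny child process only stands in for the real benchmark command.

```python
import resource
import subprocess
import sys


def peak_rss(cmd):
    """Run cmd to completion and return the peak resident set size
    over all child processes waited for so far.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in workload: a child process that allocates roughly 50 MB.
peak = peak_rss([sys.executable, "-c", "data = bytearray(50_000_000)"])
print(f"peak RSS of children: {peak} (KiB on Linux, bytes on macOS)")
```

Replacing the stand-in command with the actual benchmark invocation would give the peak for a given dataset, provided the run completes.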

Yeah, I think the larger datasets simply require more than 16GB to generate. You have two options.

  1. Skip the larger datasets and only run the smaller (200M keys) datasets. To do this, remove the corresponding lines from the prepare.sh script.
  2. Allow swapping. 32GB in total should be sufficient. However, this might blow up the time it takes to generate...

Hope this helps :)

Thanks for your suggestions, @alexandervanrenen
Unfortunately, the benchmark doesn't even run on the 200M-key datasets. This is surprising, as the previous version (https://github.com/learnedsystems/SOSD/tree/mlforsys19) could easily run datasets of this size on a system with 16GB of memory (~15GB free memory).

Here is the sample output when I comment out the 400M, 600M, and 800M datasets and only try to run the 200M-record ones (GCC 10):

Executing benchmark and saving results...
Executing workload osm_cellids_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/osm_cellids_200M_uint64 in 3616 ms (55.3097 M values/s)
data is unique
read 10000000 values from ./data/osm_cellids_200M_uint64_equality_lookups_10M in 388 ms (25.7732 M values/s)
RESULT: RMI,0,228.528,402653216,0,BinarySearch
RESULT: RMI,1,241.258,201326624,0,BinarySearch
RESULT: RMI,2,263.921,100663328,0,BinarySearch
RESULT: RMI,3,313.02,41943040,0,BinarySearch
RESULT: RMI,4,341.368,12582944,0,BinarySearch
RESULT: RMI,5,373.992,6291488,0,BinarySearch
RESULT: RMI,6,407.533,1835008,0,BinarySearch
RESULT: RMI,7,530.899,786448,0,BinarySearch
RESULT: RMI,8,805.79,24592,0,BinarySearch
RESULT: RMI,9,978.256,3088,0,BinarySearch
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Executing workload wiki_ts_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/wiki_ts_200M_uint64 in 3967 ms (50.4159 M values/s)
data contains duplicates
read 10000000 values from ./data/wiki_ts_200M_uint64_equality_lookups_10M in 356 ms (28.0899 M values/s)
RESULT: RMI,0,153.876,402653216,0,BinarySearch
RESULT: RMI,1,156.456,201326624,0,BinarySearch
RESULT: RMI,2,157.162,100663328,0,BinarySearch
RESULT: RMI,3,168.869,25165856,0,BinarySearch
RESULT: RMI,4,172.909,12582944,0,BinarySearch
RESULT: RMI,5,177.13,6291488,0,BinarySearch
RESULT: RMI,6,179.807,3145760,0,BinarySearch
RESULT: RMI,7,219.125,786464,0,BinarySearch
RESULT: RMI,8,473.818,24608,0,BinarySearch
RESULT: RMI,9,616.995,3088,0,BinarySearch
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Executing workload books_200M_uint64
...

In the previous version, the key-value pairs for 200M records took 3.2 GB (64-bit key & payload), plus another 3.2 GB when building any index (copying the data to data_), plus the index size. It's surprising that the benchmark can't handle datasets of that size anymore.
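
The arithmetic behind these numbers can be written out as a short sketch. The function name and its parameters are hypothetical; only the 200M record count, the 64-bit key and payload, and the extra copy come from the description above.

```python
def estimate_bytes(num_keys, key_bytes=8, payload_bytes=8, copies=1, index_bytes=0):
    """Rough footprint: every copy holds num_keys (key, payload) pairs,
    plus whatever the index itself occupies."""
    return num_keys * (key_bytes + payload_bytes) * copies + index_bytes


GB = 10**9
one_copy = estimate_bytes(200_000_000)              # the loaded key-value pairs
two_copies = estimate_bytes(200_000_000, copies=2)  # plus the copy made while building an index

print(one_copy / GB)    # 3.2
print(two_copies / GB)  # 6.4
```

Under these assumptions the data alone accounts for about 6.4 GB, which leaves a lot of room on a 16GB machine before counting the index size.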

Ok, for this one I am not sure what is happening (I have not seen it before). It might be that RMI is not freeing some memory or that RS is allocating too much memory. Have either of you seen this one, @RyanMarcus @andreaskipf?
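
One way to narrow this down from the posted log is to note which index was the last to report a RESULT line before each std::bad_alloc. The sketch below assumes only the line prefixes visible in the output above ("Executing workload", "RESULT:", "terminate called"); the function name is hypothetical and the remaining RESULT fields are left uninterpreted.

```python
def last_index_before_crash(log_text):
    """Return (workload, last index name seen) for every std::bad_alloc in the log."""
    crashes = []
    workload = None
    last_index = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Executing workload "):
            workload = line[len("Executing workload "):]
            last_index = None
        elif line.startswith("RESULT: "):
            # The first comma-separated field is the index name, e.g. "RMI".
            last_index = line[len("RESULT: "):].split(",", 1)[0]
        elif line.startswith("terminate called") and "std::bad_alloc" in line:
            crashes.append((workload, last_index))
    return crashes


sample = """Executing workload osm_cellids_200M_uint64
RESULT: RMI,8,805.79,24592,0,BinarySearch
RESULT: RMI,9,978.256,3088,0,BinarySearch
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Executing workload wiki_ts_200M_uint64
RESULT: RMI,9,616.995,3088,0,BinarySearch
terminate called after throwing an instance of 'std::bad_alloc'
"""
print(last_index_before_crash(sample))
# [('osm_cellids_200M_uint64', 'RMI'), ('wiki_ts_200M_uint64', 'RMI')]
```

In the excerpt, every crash follows the last RMI result, so the failure happens either while RMI is being torn down or while the next index is being built; the log alone cannot tell the two apart.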

Not sure what changed w.r.t. the memory requirements (I have no issue running the benchmark on a machine with 32GiB RAM). However, we will replace the current RS implementation soon with the new one, which operates directly on the input array without creating a copy. I'll ping this thread once this is done.

Ok, I will investigate ... should be easy enough to figure out :)

We have just replaced RS with the new version. @alihadian can you please verify whether you can now build it on your 16GiB machine? Thanks!


[Wrong output was posted in my previous comment]

Thanks. If you want to make the benchmark hands-off on 16GB, then prepare.sh must take into account the selected datasets defined in datasets_under_test.txt. The script currently tries to load all datasets and generate queries for all datasets (including the 400M & 600M-record ones), and hence crashes.
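
The selection logic could look roughly like the sketch below. It assumes that datasets_under_test.txt lists one dataset name per line and that `#` starts a comment; both are assumptions about the file format, and the function names are hypothetical rather than part of SOSD.

```python
def parse_dataset_list(text):
    """Extract dataset names from the list file contents.

    Assumed format: one name per line, blank lines ignored,
    anything after '#' treated as a comment.
    """
    names = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def datasets_to_prepare(available, selected):
    """Keep only the available datasets that are also selected, preserving order."""
    wanted = set(selected)
    return [name for name in available if name in wanted]


available = [
    "osm_cellids_200M_uint64",
    "osm_cellids_400M_uint64",
    "osm_cellids_600M_uint64",
    "wiki_ts_200M_uint64",
]
listing = "osm_cellids_200M_uint64\n# osm_cellids_400M_uint64\nwiki_ts_200M_uint64\n"
print(datasets_to_prepare(available, parse_dataset_list(listing)))
# ['osm_cellids_200M_uint64', 'wiki_ts_200M_uint64']
```

With a filter like this, lookups would only be generated for the datasets that are actually going to be run.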

I manually commented out the 400M+ datasets in prepare.sh and datasets_under_test.txt, but some algorithms still crash:

Executing workload osm_cellids_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/osm_cellids_200M_uint64 in 15702 ms (12.7372 M values/s)
data is unique
read 10000000 values from ./data/osm_cellids_200M_uint64_equality_lookups_10M in 2070 ms (4.83092 M values/s)
RESULT: RMI,0,1010.59,402653216,0,BinarySearch
RESULT: RMI,1,1074.33,201326624,0,BinarySearch
RESULT: RMI,2,1095.28,100663328,0,BinarySearch
RESULT: RMI,3,1204.58,41943040,0,BinarySearch
RESULT: RMI,4,1360.01,12582944,0,BinarySearch
RESULT: RMI,5,1403.15,6291488,0,BinarySearch
RESULT: RMI,6,1793.47,1835008,0,BinarySearch
RESULT: RMI,7,2218.07,786448,0,BinarySearch
RESULT: RMI,8,3657.79,24592,0,BinarySearch
RESULT: RMI,9,5234.25,3088,0,BinarySearch
RESULT: RS,1,641.705,507999944,8002877965,BinarySearch
RESULT: RS,2,853.138,251011320,4622782514,BinarySearch
RESULT: RS,3,926.852,127129548,3979255691,BinarySearch
RESULT: RS,4,1026.19,63095928,3763571541,BinarySearch
RESULT: RS,5,1154.16,31774880,3650606262,BinarySearch
RESULT: RS,6,1110.11,15948212,3633930730,BinarySearch
RESULT: RS,7,1106.99,7995228,3461003894,BinarySearch
RESULT: RS,8,1069.86,3994788,3576537020,BinarySearch
RESULT: RS,9,1166.48,1998144,3562924055,BinarySearch
RESULT: RS,10,1207.08,1998144,3363430008,BinarySearch
RESULT: PGM,16,1052.47,27793600,15803107461,BinarySearch
RESULT: PGM,4,1119.51,118835160,18404535380,BinarySearch
RESULT: PGM,8,980.764,57372800,16146689882,BinarySearch
RESULT: PGM,32,1032.2,13610680,14305571668,BinarySearch
RESULT: PGM,64,1161.61,6735780,13661276135,BinarySearch
RESULT: PGM,256,1262.36,1681180,13233111712,BinarySearch
RESULT: PGM,1024,1257.17,432980,12709408050,BinarySearch
RESULT: PGM,2048,1499.83,220880,12661867611,BinarySearch
RESULT: PGM,4096,1686.77,114060,12548411211,BinarySearch
RESULT: PGM,8192,1759.52,59220,12633139100,BinarySearch
Executing workload wiki_ts_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/wiki_ts_200M_uint64 in 10557 ms (18.9448 M values/s)
data contains duplicates
read 10000000 values from ./data/wiki_ts_200M_uint64_equality_lookups_10M in 528 ms (18.9394 M values/s)
RESULT: RMI,0,657.774,402653216,0,BinarySearch
RESULT: RMI,1,744.178,201326624,0,BinarySearch
RESULT: RMI,2,696.591,100663328,0,BinarySearch
RESULT: RMI,3,640.246,25165856,0,BinarySearch
RESULT: RMI,4,686.239,12582944,0,BinarySearch
RESULT: RMI,5,706.922,6291488,0,BinarySearch
RESULT: RMI,6,750.191,3145760,0,BinarySearch
RESULT: RMI,7,942.32,786464,0,BinarySearch
RESULT: RMI,8,2004.11,24608,0,BinarySearch
RESULT: RMI,9,2584.12,3088,0,BinarySearch
RESULT: RS,1,851.499,506867068,3673072594,BinarySearch
RESULT: RS,2,920.771,247625288,2619266961,BinarySearch
RESULT: RS,3,885.471,125076056,2422303161,BinarySearch
RESULT: RS,4,880.775,63877208,2333537346,BinarySearch
RESULT: RS,5,908.698,31917576,2316853214,BinarySearch
RESULT: RS,6,942.354,15909560,2169338252,BinarySearch
RESULT: RS,7,962.092,7976336,2320432160,BinarySearch
RESULT: RS,8,1075.88,3999204,2117304207,BinarySearch
RESULT: RS,9,850.06,1999416,2216469260,BinarySearch
RESULT: RS,10,940.018,1000212,2141017992,BinarySearch
RESULT: PGM,16,768.075,4604720,6691198227,BinarySearch
RESULT: PGM,4,826.544,45559380,8861879257,BinarySearch
RESULT: PGM,8,807.3,14269500,7485016008,BinarySearch
RESULT: PGM,32,819.202,1722040,6132314744,BinarySearch
RESULT: PGM,64,812.763,754920,5947471202,BinarySearch
RESULT: PGM,256,1005.45,222760,6259760928,BinarySearch
RESULT: PGM,1024,1224.93,89780,6813015896,BinarySearch
RESULT: PGM,2048,1487.19,54580,7067813374,BinarySearch
RESULT: PGM,4096,1701.52,38880,7487636636,BinarySearch
RESULT: PGM,8192,1872.6,20140,7084727705,BinarySearch
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
index ART is not applicable
RESULT: BTree,32,1187.93,116016232,87378266,BinarySearch
Executing workload books_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/books_200M_uint64 in 9139 ms (21.8842 M values/s)
data is unique
read 10000000 values from ./data/books_200M_uint64_equality_lookups_10M in 507 ms (19.7239 M values/s)
RESULT: RMI,0,538.599,402653216,0,BinarySearch
RESULT: RMI,1,728.999,201326608,0,BinarySearch
RESULT: RMI,2,711.099,100663312,0,BinarySearch
RESULT: RMI,3,645.941,41943040,0,BinarySearch
RESULT: RMI,4,745.704,12582928,0,BinarySearch
RESULT: RMI,5,797.9,6291472,0,BinarySearch
RESULT: RMI,6,770.18,786464,0,BinarySearch
RESULT: RMI,7,1016.37,24608,0,BinarySearch
RESULT: RMI,8,1196.07,6160,0,BinarySearch
RESULT: RMI,9,1391.49,3088,0,BinarySearch
RESULT: RS,1,627.79,505165832,5832526494,BinarySearch
RESULT: RS,2,655.512,253054936,4165524366,BinarySearch
RESULT: RS,3,695.846,125721624,3874965837,BinarySearch
RESULT: RS,4,861.732,56216360,3365617111,BinarySearch
RESULT: RS,5,807.659,31050536,3388414667,BinarySearch
RESULT: RS,6,858.398,14957704,3119480145,BinarySearch
RESULT: RS,7,810.165,7976712,3171154769,BinarySearch
RESULT: RS,8,950.973,3965432,3050638318,BinarySearch
RESULT: RS,9,836.549,1999352,3003286558,BinarySearch
RESULT: RS,10,858.645,1000888,2962058313,BinarySearch
RESULT: PGM,16,816.247,15418080,11628441556,BinarySearch
RESULT: PGM,4,939.067,142554300,18119772055,BinarySearch
RESULT: PGM,8,796.124,45176040,13506001539,BinarySearch
RESULT: PGM,32,773.995,5268520,10054656238,BinarySearch
RESULT: PGM,64,821.887,1615360,8962632296,BinarySearch
RESULT: PGM,256,1012.33,119800,7457546186,BinarySearch
RESULT: PGM,1024,1145.81,9300,6938414580,BinarySearch
RESULT: PGM,2048,1268.6,4160,6848001357,BinarySearch
RESULT: PGM,4096,1378.11,2300,7004965760,BinarySearch
RESULT: PGM,8192,1606.61,1480,7058185953,BinarySearch
Executing workload fb_200M_uint64
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/fb_200M_uint64 in 21916 ms (9.12575 M values/s)
data is unique
read 10000000 values from ./data/fb_200M_uint64_equality_lookups_10M in 1758 ms (5.68828 M values/s)
RESULT: RMI,0,891.706,402653200,0,BinarySearch
RESULT: RMI,1,865.73,201326608,0,BinarySearch
RESULT: RMI,2,933.047,100663312,0,BinarySearch
RESULT: RMI,3,901.982,25165840,0,BinarySearch
RESULT: RMI,4,1068,12582928,0,BinarySearch
RESULT: RMI,5,1023.22,6291472,0,BinarySearch
RESULT: RMI,6,1075.06,3145744,0,BinarySearch
RESULT: RMI,7,1134.2,786448,0,BinarySearch
RESULT: RMI,8,1505.35,24592,0,BinarySearch
RESULT: RMI,9,1938.53,3088,0,BinarySearch
RESULT: RS,1,2242.48,508497300,5382345445,BinarySearch
RESULT: RS,2,1612.51,255289316,3997186516,BinarySearch
RESULT: RS,3,1697.66,125577380,4065322793,BinarySearch
RESULT: RS,4,1400.74,63861140,3492300676,BinarySearch
RESULT: RS,5,1425.41,31846612,3720742681,BinarySearch
RESULT: RS,6,1272.41,15994836,3371243135,BinarySearch
RESULT: RS,7,1255.25,7982532,3372459774,BinarySearch
RESULT: RS,8,1422.1,3995812,3944594772,BinarySearch
RESULT: RS,9,1285.06,3995812,3744742735,BinarySearch
RESULT: RS,10,1347.09,3995812,3323231016,BinarySearch
RESULT: PGM,16,983.225,43774520,17042005299,BinarySearch
RESULT: PGM,4,1124.01,173186120,20824760678,BinarySearch
RESULT: PGM,8,1154.59,87810720,17319094311,BinarySearch
RESULT: PGM,32,1006.56,21637120,15224770298,BinarySearch
RESULT: PGM,64,1049.02,10639900,14408312276,BinarySearch
RESULT: PGM,256,1084.49,2413640,13264859290,BinarySearch
RESULT: PGM,1024,1238.33,371400,11098957265,BinarySearch
RESULT: PGM,2048,1341.34,118520,9836834009,BinarySearch
RESULT: PGM,4096,1529.15,34160,8846405597,BinarySearch
RESULT: PGM,8192,1671.03,9660,8056823886,BinarySearch
Executing workload books_200M_uint32
Repeating lookup code 1 time(s).
Using 1 thread(s).
read 200000000 values from ./data/books_200M_uint32 in 9247 ms (21.6286 M values/s)
data contains duplicates
read 10000000 values from ./data/books_200M_uint32_equality_lookups_10M in 7359 ms (1.35888 M values/s)
RESULT: RMI,0,579.371,402653216,0,BinarySearch
RESULT: RMI,1,625.251,201326624,0,BinarySearch
RESULT: RMI,2,729.407,100663328,0,BinarySearch
RESULT: RMI,3,607.383,41943040,0,BinarySearch
RESULT: RMI,4,713,12582944,0,BinarySearch
RESULT: RMI,5,796.667,6291488,0,BinarySearch
RESULT: RMI,6,833.428,3145760,0,BinarySearch
RESULT: RMI,7,901.001,786464,0,BinarySearch
RESULT: RMI,8,1212.67,24608,0,BinarySearch
RESULT: RMI,9,1447.71,3088,0,BinarySearch
RESULT: RS,1,579.732,493374828,5437196574,BinarySearch
RESULT: RS,2,547.162,241716588,4266883872,BinarySearch
RESULT: RS,3,834.094,127805212,3885179565,BinarySearch
RESULT: RS,4,776.516,63887116,3886596694,BinarySearch
RESULT: RS,5,834.514,31993964,3679867951,BinarySearch
RESULT: RS,6,822.483,15916236,3236048582,BinarySearch
RESULT: RS,7,772.506,7907100,3250247235,BinarySearch
RESULT: RS,8,825.804,3957404,3045642203,BinarySearch
RESULT: RS,9,881.573,1991324,3021639997,BinarySearch
RESULT: RS,10,950.948,999564,2988914402,BinarySearch
RESULT: PGM,16,782.688,11703200,11996685299,BinarySearch
RESULT: PGM,4,818.124,70640688,15973491758,BinarySearch
RESULT: PGM,8,779.107,29643568,13155445944,BinarySearch
RESULT: PGM,32,748.399,4119136,10104298748,BinarySearch
RESULT: PGM,64,826.829,1293168,8914442329,BinarySearch
RESULT: PGM,256,954.821,97760,7234706040,BinarySearch
RESULT: PGM,1024,1184.74,7552,6619491073,BinarySearch
RESULT: PGM,2048,1295.49,3312,6569362634,BinarySearch
RESULT: PGM,4096,1461.65,1904,6614974335,BinarySearch
RESULT: PGM,8192,1567.02,1152,6722357169,BinarySearch
RESULT: BTree,32,1121.36,87075880,41689096,BinarySearch