romutrio is not faster than wyrand
wangyi-fudan opened this issue
I tested romutrio with https://github.com/lemire/testingRNG
It seems that wyrand is still the fastest without AVX:
We repeat the benchmark more than once. Make sure that you get comparable results.
Generating 65536 bytes of random numbers
Time reported in number of cycles per byte.
We store values to an array of size = 64 kB.
We just generate the random numbers:
xorshift_k4: 1.05 cycles per byte
xorshift_k5: 1.14 cycles per byte
mersennetwister: 1.78 cycles per byte
mitchellmoore: 1.99 cycles per byte
widynski: 1.14 cycles per byte
xorshift32: 1.34 cycles per byte
pcg32: 1.03 cycles per byte
rand: 3.39 cycles per byte
aesdragontamer: 0.44 cycles per byte
aesctr: 0.50 cycles per byte
lehmer64: 0.50 cycles per byte
xorshift128plus: 0.51 cycles per byte
xoroshiro128plus: 0.47 cycles per byte
splitmix64: 0.50 cycles per byte
pcg64: 0.70 cycles per byte
xorshift1024star: 0.93 cycles per byte
xorshift1024plus: 0.60 cycles per byte
romutrio: 0.57 cycles per byte
wyrand: 0.43 cycles per byte
Now, let's go back to the simplest benchmark code:
#include <sys/time.h>
#include <cstdint>
#include <iostream>
using namespace std;
uint64_t seed=0;
inline uint64_t wyrand(void){
seed+=0xa0761d6478bd642full;
__uint128_t t=(__uint128_t)(seed^0xe7037ed1a0b428dbull)*seed;
return (t>>64)^t;
}
#define ROTL(d,lrot) ((d<<(lrot)) | (d>>(8*sizeof(d)-(lrot))))
uint64_t xState=1, yState=2, zState=3; // must be nonzero; these seed values are arbitrary
uint64_t romuTrio_random () {
uint64_t xp = xState, yp = yState, zp = zState;
xState = 15241094284759029579u * zp;
yState = yp - xp; yState = ROTL(yState,12);
zState = zp - yp; zState = ROTL(zState,44);
return xp;
}
int main(void){
timeval beg, end; uint64_t ret=0, rep=0x10000000;
gettimeofday(&beg,NULL);
for(size_t r=0; r<rep; r++) ret+=wyrand();
gettimeofday(&end,NULL);
cerr<<"wyrand\t"<<1e-9*rep/(end.tv_sec-beg.tv_sec+1e-6*(end.tv_usec-beg.tv_usec))<<'\n';
gettimeofday(&beg,NULL);
for(size_t r=0; r<rep; r++) ret+=romuTrio_random();
gettimeofday(&end,NULL);
cerr<<"romutrio\t"<<1e-9*rep/(end.tv_sec-beg.tv_sec+1e-6*(end.tv_usec-beg.tv_usec))<<'\n';
return ret;
}
The result, in billions of numbers generated per second (higher is better), shows that wyrand is faster than romutrio:
wyrand 1.31389
romutrio 1.12729
Hello WangYi!
And thanks for wyrand, it is in my top favorite designs!
It is certainly possible that different benchmarks yield different results.
In particular, the benchmark used here relies on functions that fill buffers, while Lemire’s benchmark relies on functions that return uint32_t or uint64_t.
I made the benchmark easy to execute for everyone on the same type of machine to ensure that everyone would get the same results when running it:
Lines 141 to 151 in 4bd8c5c
The results seem fairly consistent; the figures published in the README were identical across runs on different days and at different hours.
But I’d like to investigate wyrand’s performance. If you have insights, they are welcome!
In particular, you mention that wyrand is still the fastest without AVX. Do you think the order could be reversed if AVX were disabled? Do you mean at the compiler level, or at the CPU level?
If so, I could add a note to the README.