Initialization of the "virtual machine" for multithreaded mode

Question

KilimSerg opened this issue 2 years ago

I'm having trouble initializing the "virtual machine" for multithreaded mode. With one thread, the randomx_calculate_hash(...) function works fine and gives approximately 170 H/s, but as soon as I try to run the calculation in even 2 threads, I get a cache-related error.
I tried the rx-slow-hash(…) function, which was developed for the pyrx Python library and is called after code that initializes the "virtual machine". That function works fine on multiple threads, but I can't get more than 140-150 H/s. I tried to understand this code, but it appears to be written in an early dialect of C and has no comments, so it is not clear under what conditions the "virtual machine" is initialized.
Can you explain in a simpler, beginner-friendly way how to initialize a "virtual machine" to run on multiple processor cores or threads?
Sergey.

Symphonic commented 2 years ago

Yes

Symphonic · Answer 1 · Thu Mar 24 2022 03:42:15 GMT+0800 (China Standard Time)

If you look at the example, you can see how the dataset is initialized and then used to create a VM. The dataset can be reused: create a new VM from the same dataset for each thread. When you are done with RandomX, destroy all the VMs and then release the dataset.

KilimSerg · Answer 2 · Thu Mar 24 2022 05:06:26 GMT+0800 (China Standard Time)

In the example you mentioned, the virtual machine is initialized to run in "light mode", and I am trying to initialize it for "quick mode".
With one thread there are no problems, but as soon as I run two or more threads, I get a cache-related error.
I can't figure out how to allocate the cache among multiple threads for a virtual machine.
I have not yet found any information about what the virtual machine initialization functions are or what each one is for.
How should they be used in multithreaded mode?
There is very little information about which flags should be used and under what conditions.
The flags have numeric values; can those values change, or are they permanent?
I would be very grateful if you could explain these points or link to the information if it exists somewhere.

Symphonic · Answer 3 · Thu Mar 24 2022 07:54:12 GMT+0800 (China Standard Time)

The flags are defined in randomx.h but you assign them when you allocate the dataset. Notice on lines 11-13 of the example a variable is being used to set the flags. This is just a standard use of bitwise flags. The mode that is displayed in api-example2.cpp is the "quick mode" which uses 2GB of RAM. I believe initialization needs to be done differently if you want the slower, lighter version, but I believe that is in fact not what you want to do.
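The bitwise-flag pattern mentioned above can be sketched in a self-contained way. The enum here is a hypothetical stand-in (`DemoFlags`, with made-up names and values) so the snippet compiles without RandomX; the real `randomx_flags` constants and their values should be taken from randomx.h.

```cpp
#include <cassert>

// Hypothetical stand-in for a flags enum. Names and values are illustrative
// only; the real randomx_flags constants are defined in randomx.h.
enum DemoFlags : unsigned {
    DEMO_FLAG_DEFAULT     = 0,
    DEMO_FLAG_LARGE_PAGES = 1u << 0,
    DEMO_FLAG_FULL_MEM    = 1u << 1,
    DEMO_FLAG_JIT         = 1u << 2
};

// Each flag owns one bit, so a flag is tested with bitwise AND.
inline bool has_flag(unsigned flags, DemoFlags f) {
    return (flags & f) != 0;
}

// Start from a baseline and OR in the options you want, the same way the
// example builds up its flags variable.
inline unsigned build_flags(bool want_large_pages) {
    unsigned flags = DEMO_FLAG_DEFAULT;
    flags |= DEMO_FLAG_FULL_MEM;
    flags |= DEMO_FLAG_JIT;
    if (want_large_pages) {
        flags |= DEMO_FLAG_LARGE_PAGES;
    }
    return flags;
}

// Quick self-check of the helpers above.
inline void demo_flags_self_check() {
    const unsigned flags = build_flags(false);
    assert(has_flag(flags, DEMO_FLAG_JIT));
    assert(!has_flag(flags, DEMO_FLAG_LARGE_PAGES));
}
```

Because every option is a separate bit, the same combined value can be passed to each allocation and VM-creation call without the options interfering with one another.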

For my use case I created a Java Native Access bridge for all the methods in randomx.h so unfortunately I don't really have example C code to show you.

Essentially, you follow the steps in api-example2.cpp up to line 33 and then you need to manage virtual machines for each thread. When you spawn a thread to do work, call randomx_create_vm(flags, nullptr, dataset) using the same dataset for every thread. You only need one dataset that is shared for every thread. When the thread is finished, call randomx_destroy_vm on that vm.
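A minimal sketch of that shape, assuming stand-in types so it compiles without RandomX: `DemoDataset` and `DemoVm` are hypothetical placeholders for `randomx_dataset` and `randomx_vm`, and the "hash" is not a real hash. The comments mark where the real calls named in this thread would go.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for randomx_dataset: built once, read-only afterwards.
struct DemoDataset {
    std::vector<unsigned> items;
};

// Hypothetical stand-in for randomx_vm: per-thread state that only reads
// the shared dataset.
class DemoVm {
public:
    explicit DemoVm(const DemoDataset& ds) : dataset_(ds) {}
    // Placeholder for randomx_calculate_hash; NOT a real hash function.
    std::size_t hash(const std::string& input) const {
        const std::size_t h = std::hash<std::string>{}(input);
        return h ^ dataset_.items[h % dataset_.items.size()];
    }
private:
    const DemoDataset& dataset_;
};

// Build the dataset once on the calling thread, then give every worker its
// own VM that points at the same dataset.
std::vector<std::size_t> hash_on_threads(const std::vector<std::string>& inputs) {
    DemoDataset dataset{std::vector<unsigned>(1024, 7u)};  // one-time setup
    std::vector<std::size_t> results(inputs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&dataset, &inputs, &results, i] {
            DemoVm vm(dataset);               // ~ randomx_create_vm(flags, nullptr, dataset)
            results[i] = vm.hash(inputs[i]);  // ~ randomx_calculate_hash(vm, ...)
        });                                   // vm goes away here ~ randomx_destroy_vm(vm)
    }
    for (auto& t : workers) {
        t.join();
    }
    return results;  // dataset is released last ~ randomx_release_dataset(dataset)
}
```

Each thread writes only its own slot in `results`, so the only thing shared between threads is the read-only dataset.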

If you want to see the java code I wrote, you can take a look here.

Symphonic · Answer 4 · Thu Mar 24 2022 08:02:03 GMT+0800 (China Standard Time)

In terms of how to use the flags:

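RANDOMX_FLAG_LARGE_PAGES allocates the memory in large pages, which generally makes hashing faster. Large pages have to be set up in the operating system first, otherwise the allocation can fail; when they are available there is usually no reason to not enable this.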
RANDOMX_FLAG_HARD_AES uses the built-in AES capabilities of your device. You can usually leave this as default.
RANDOMX_FLAG_FULL_MEM switches from "light" to "quick" mode, which uses the full dataset; as I said, the two modes have different initialization steps.
RANDOMX_FLAG_JIT uses Just-In-Time compilation instead of interpreted commands which makes it waaay faster. This should only be disabled for systems that don't support it.
RANDOMX_FLAG_SECURE involves memory security. I haven't touched it at all, but as I understand it, when combined with JIT it keeps the JIT-compiled code pages from being writable and executable at the same time, and it is only enabled by default on certain operating systems.
RANDOMX_FLAG_ARGON2_AVX2 and RANDOMX_FLAG_ARGON2_SSSE3 select optimized Argon2 implementations, which makes cache initialization faster, and you don't usually need to change these from the default.

KilimSerg · Answer 5 · Thu Mar 24 2022 16:41:42 GMT+0800 (China Standard Time)

If I understand you correctly (sorry if I repeat myself; I don't know English and have to use Google Translate), the main initialization of the virtual machine (lines 11 to 33 of api-example2.cpp) should be done once in the main thread, while creating the virtual machines (randomx_create_vm(flags, nullptr, myDataset))
and calculating hashes (randomx_calculate_hash(myMachine, &myInput, sizeof myInput, hash))
is done separately in each thread, with the number of threads depending on the number of processor cores.
Did I understand you correctly?
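The one-time setup in api-example2.cpp also fills the dataset, and it does so from two threads, each given a start item and an item count. A rough sketch of generalizing that split to N threads follows; `ItemRange` and `split_items` are hypothetical helper names, not part of RandomX. Each range would then be handed to `randomx_init_dataset(dataset, cache, start, count)` on its own thread, with the total taken from `randomx_dataset_item_count()`; those calls are left out here so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of dataset items for one initialization thread.
struct ItemRange {
    unsigned long start;
    unsigned long count;
};

// Split item_count items into thread_count contiguous ranges that cover
// every item exactly once. The last range absorbs the remainder.
std::vector<ItemRange> split_items(unsigned long item_count, unsigned thread_count) {
    assert(thread_count > 0);
    std::vector<ItemRange> ranges;
    const unsigned long per_thread = item_count / thread_count;
    for (unsigned i = 0; i < thread_count; ++i) {
        const unsigned long start = per_thread * i;
        const unsigned long count =
            (i + 1 == thread_count) ? item_count - start : per_thread;
        ranges.push_back({start, count});
    }
    return ranges;
}
```

All initialization threads must be joined before any VM is created from the dataset, since the VMs assume the dataset is complete.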

KilimSerg · Answer 6 · Fri Mar 25 2022 04:37:04 GMT+0800 (China Standard Time)

Thank you very much for your explanation.
Now I understand why I had problems.
I was trying to do the full initialization in every thread.
No machine has enough resources to repeat the full initialization in each thread.
It is a pity that the RandomX documentation does not explain the correct initialization sequence for the virtual machine, or at least I could not find it.
Thanks again for your help.

KilimSerg · Answer 7 · Fri Apr 01 2022 00:03:49 GMT+0800 (China Standard Time)

Using your explanations, I reworked the program code and everything worked out.
I was able to run as many threads as I have processor cores, without errors.
Thanks again.
However, I have another question about creating and destroying a virtual machine on a specific thread.
At what point should the virtual machine be created, and at what point should it be destroyed?

1 - the virtual machine is created when the thread is created and destroyed before the thread exits.

2 - the virtual machine is created after receiving a task from the pool or blockchain and destroyed after the task is completed.

3 - the virtual machine is created before each hash is calculated and destroyed after each hash is calculated.

Which of these options is correct?
If none, please describe how to create and destroy a virtual machine correctly
in a specific thread.
At the moment I'm using option 1.

hyc · Answer 8 · Fri Apr 01 2022 00:17:12 GMT+0800 (China Standard Time)

Option 3 is just plain stupid; it would be a tremendous waste of CPU work.

The rest are not much better. Never destroy something you are going to create again a short time later.
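One reading of this advice is to keep worker threads alive across jobs, so each thread creates its VM once and reuses it for every hash. A minimal sketch under that assumption: `CountingVm` is a hypothetical stand-in for `randomx_vm` that counts how often it is constructed, and its `hash` is a placeholder, not a real hash.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many stand-in VMs were created, to make the cost visible.
static std::atomic<int> g_vm_created{0};

// Hypothetical stand-in for randomx_vm; creating one is treated as expensive.
struct CountingVm {
    CountingVm() { ++g_vm_created; }
    // Placeholder for randomx_calculate_hash; NOT a real hash function.
    unsigned hash(unsigned job) const { return job * 2654435761u; }
};

// Each thread creates its VM once, reuses it for every job it claims, and
// lets it go only when the thread runs out of work.
std::vector<unsigned> run_jobs(const std::vector<unsigned>& jobs, unsigned thread_count) {
    std::vector<unsigned> results(jobs.size());
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            CountingVm vm;                                // created once per thread
            for (;;) {
                const std::size_t i = next.fetch_add(1);  // claim the next job
                if (i >= jobs.size()) {
                    break;
                }
                results[i] = vm.hash(jobs[i]);            // same VM, new input
            }
        });                                               // VM destroyed with the thread
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

With 100 jobs on 4 threads, this design constructs 4 VMs in total, whereas creating one per hash would construct 100.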