DLTcollab / dcurl

Hardware-accelerated Multi-threaded IOTA PoW, drop-in replacement for ccurl

Implement local fallback PoW execution when RabbitMQ is not activated

marktwtn opened this issue

dcurl runs the local fallback PoW if the remote worker does not return the PoW result within 10 seconds.
However, when RabbitMQ is not activated at all, nothing is in charge of calculating the PoW and it will never be done.

We should implement a local fallback PoW execution mechanism for the case where RabbitMQ is not activated.

We should focus on the initialization part of dcurl first.

In the dcurl_init() API, if RabbitMQ is not activated, the initialization fails.
dcurl_entry() then detects the failure and does nothing but return a null pointer as the PoW result.

Although we should make sure RabbitMQ is working before using dcurl,
the initialization steps should be modified to handle this unexpected scenario as well.

Other cases, such as RabbitMQ being shut down after dcurl has initialized successfully, are not a problem at all.
The fallback mechanism already exists for them.

Hence we only need to implement the fallback mechanism for the case where dcurl initialization fails.

Once the issue is resolved, the documentation shall be updated accordingly.

dcurl_init() returns false when any of the assigned instances fails to initialize, which is reasonable.
But with this design, a local fallback PoW execution is not possible when the initialization fails,
since both IRI and dcurl_entry() check the return value of dcurl_init() and return immediately if the value is false.

Should we change the design to return true if at least one of the assigned instances, such as the remote interface, CPU, or GPU, initializes successfully?

The new design: if at least one of the assigned instances (remote interface, CPU, or GPU) initializes successfully, dcurl_init() returns true.
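
To make the proposed return-value change concrete, here is a minimal sketch in C. It is not dcurl's actual code: backend_t, init_backends(), and the two stand-in initializers are hypothetical names used only for illustration. It assumes each instance exposes an init function that reports success as a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one PoW instance (remote interface, CPU, GPU).
 * This is an illustration, not a type from dcurl. */
typedef struct {
    const char *name;
    bool (*init)(void); /* returns true when the instance is usable */
    bool ready;         /* filled in by init_backends() */
} backend_t;

/* Try to initialize every instance and remember which ones succeeded.
 * Returns true if at least one instance is ready, false if all failed. */
bool init_backends(backend_t *backends, size_t count)
{
    assert(backends != NULL || count == 0);

    bool any_ready = false;
    for (size_t i = 0; i < count; i++) {
        backends[i].ready = backends[i].init();
        any_ready = any_ready || backends[i].ready;
    }
    return any_ready;
}

/* Stand-ins for real initializers: the remote interface fails because
 * RabbitMQ is not activated, while the CPU instance succeeds. */
bool remote_init_fails(void) { return false; }
bool cpu_init_ok(void) { return true; }
```

With the remote interface failing and the CPU succeeding, init_backends() reports success, and the per-instance ready flags record which instance can still do the PoW.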

dcurl_entry() would then call dcurl_init() if any of the assigned instances was not initialized successfully.
However, this could cause a data race when multiple dcurl_entry() calls run at the same time.
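
One common way to avoid this kind of race is to guard the reinitialization with a mutex. The sketch below uses POSIX pthread_mutex_lock() and pthread_mutex_unlock(); ensure_initialized(), do_init(), and the counters are hypothetical names, not part of dcurl, and do_init() merely stands in for the real initialization work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool initialized = false;
static int init_calls = 0; /* counts how often the init work actually ran */

/* Hypothetical stand-in for the real initialization work. */
static bool do_init(void)
{
    init_calls++;
    return true;
}

/* Run the initialization at most once at a time. Concurrent callers are
 * serialized by the mutex, so they never initialize simultaneously. */
bool ensure_initialized(void)
{
    int rc = pthread_mutex_lock(&init_lock);
    assert(rc == 0);
    (void) rc;

    if (!initialized)
        initialized = do_init();
    bool ok = initialized;

    rc = pthread_mutex_unlock(&init_lock);
    assert(rc == 0);
    (void) rc;
    return ok;
}
```

Note that if do_init() keeps failing, every caller retries it under the lock, which is the repeated-reinitialization cost of this approach.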

I need to either solve this or figure out a better mechanism.

There are 2 possible designs:

  • dcurl_entry() tries to recover the initialization status.
    If any instance has failed, it calls dcurl_init() every time to reinitialize.
    However, we need a lock so that multiple dcurl_entry() calls do not call dcurl_init() at the same time, which would be a data race.
    Also, on a machine without RabbitMQ, dcurl would keep reinitializing until RabbitMQ is activated.

  • dcurl_entry() does not recover the initialization status.
    Instead, it uses the available instances to do the PoW when other instances have failed.
    For example, if RabbitMQ is not activated, the remote interface does not initialize successfully,
    and dcurl uses the CPU instead.
    To use RabbitMQ again, we need to activate it first and then restart IRI + dcurl.
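
The second design can be sketched as a selection over readiness flags that are recorded once at initialization and never refreshed. This is only an illustration: backend_id, pow_state, and select_backend() are hypothetical names, and the preference order (remote, then GPU, then CPU) is an assumption rather than dcurl's actual policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the assigned instances. */
typedef enum {
    BACKEND_REMOTE,
    BACKEND_CPU,
    BACKEND_GPU,
    BACKEND_COUNT
} backend_id;

/* Readiness is recorded once at init time and not updated afterwards,
 * so a restart is needed to pick up a newly activated RabbitMQ. */
typedef struct {
    bool ready[BACKEND_COUNT];
} pow_state;

/* Return the first ready instance in the assumed preference order,
 * or -1 if no instance initialized successfully. */
int select_backend(const pow_state *st)
{
    static const backend_id order[] = {
        BACKEND_REMOTE, BACKEND_GPU, BACKEND_CPU
    };

    assert(st != NULL);
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (st->ready[order[i]])
            return (int) order[i];
    }
    return -1;
}
```

When only the CPU flag is set, as in the RabbitMQ example above, select_backend() returns BACKEND_CPU; a result of -1 corresponds to the case where no PoW can be done at all.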

I prefer the second design since it keeps the same flow as the original.