Issues with transactions in `sentinel_2pc::controller`
mszulcz-mitre opened this issue
Affected Branch
trunk
Basic Diagnostics
- I've pulled the latest changes on the affected branch and the issue is still present.
- The issue is reproducible in docker
Description
Problem: `sentinel_2pc::controller::execute_transaction` can never fail.
Its description says it returns “false if the sentinel was unable to forward the transaction to a coordinator” (lines 52-53 in src/uhs/twophase/sentinel_2pc/controller.hpp). However, it can only return true: there are no `return false` statements in its body.
Solution
The failure to forward a transaction to a coordinator actually occurs in `sentinel_2pc::controller::send_compact_tx`, which is in the call stack of `execute_transaction`: `execute_transaction` calls `gather_attestations`, which calls `send_compact_tx`.
In order for `execute_transaction` to register the failure, `gather_attestations` and `send_compact_tx` would both need to return a bool that signals success or failure. Currently, neither has a return value and neither detects success or failure. Rather, `send_compact_tx` continually tries to forward the transaction in an infinite loop. The author of the method was apparently aware of this (lines 194-203 in src/uhs/twophase/sentinel_2pc/controller.cpp):
    // TODO: add a "retry" error response to offload sentinels from this
    //       infinite retry responsibility.
    while(!m_coordinator_client.execute_transaction(ctx, cb)) {
        // TODO: the network currently doesn't provide a callback for
        //       reconnection events so we have to sleep here to
        //       prevent a needless spin. Instead, add such a callback
        //       or queue to the network to remove this sleep.
        static constexpr auto retry_delay = std::chrono::milliseconds(100);
        std::this_thread::sleep_for(retry_delay);
    };
It seems the original author had some ideas to fix this, but I don’t fully understand them. One reasonable change is to call `send_compact_tx` directly from `execute_transaction` rather than from `gather_attestations`. This simplifies the call stack, and `send_compact_tx` appears unrelated to the purpose of `gather_attestations` anyway. As for the infinite loop, it seems we’ll have to decide how many times, or for how long, the loop should run before determining that the transaction failed.
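To make the retry question concrete, here is a minimal sketch of a bounded retry loop that reports failure instead of spinning forever. The helper `retry_bounded` and its parameters are hypothetical and not part of the codebase; the `attempt` callable stands in for the `m_coordinator_client.execute_transaction(ctx, cb)` call quoted above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of opencbdc-tx): calls `attempt` until it
// returns true or `max_attempts` tries have been made, sleeping `delay`
// between failed tries. Returns whether any attempt succeeded.
inline bool retry_bounded(const std::function<bool()>& attempt,
                          unsigned int max_attempts,
                          std::chrono::milliseconds delay) {
    assert(attempt);
    for(unsigned int i = 0; i < max_attempts; ++i) {
        if(attempt()) {
            return true;
        }
        // Skip the sleep after the final failed attempt.
        if(i + 1 < max_attempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return false;
}
```

Under this assumption, `send_compact_tx` could return the result of `retry_bounded` and `execute_transaction` could pass it on to its caller. The attempt limit and delay are policy choices that would still need to be decided.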
Problem: executing transactions without required attestations
It appears as though a transaction can be executed without gathering the required attestations. To see this, consider the unit test below added to tests/unit/sentinel_2pc/controller_test.cpp. The test sets up a controller that requires an attestation from a sentinel client to execute a transaction. Instead of instantiating a sentinel server for the client, though, the test instantiates a fake sentinel server that’s only a TCP listener and therefore shouldn’t be able to provide an attestation. Nonetheless, `execute_transaction` returns true. Of course, the problem description above points out that it’s guaranteed to return true, but even once that problem is fixed, it would still return true because the root problem is different. Here’s the test:
    TEST_F(sentinel_2pc_test, execute_transaction_without_required_attestations) {
        // This test attempts to show that a transaction can be executed without
        // gathering the required 2 attestations.
        m_opts.m_attestation_threshold = 2;

        // One attestation will come from the controller launched below (ctl)
        // in its call to controller::execute_transaction (see lines 94-97 in
        // src/uhs/twophase/sentinel_2pc/controller.cpp). The 2nd attestation
        // should come from a 2nd sentinel server. In this demo, though, the 2nd
        // sentinel server is just a TCP listener that can't provide attestations.
        constexpr unsigned short sentinel_port = 32003;
        const auto sentinel_ep
            = std::make_pair(cbdc::network::localhost, sentinel_port);
        m_opts.m_sentinel_endpoints.push_back(sentinel_ep);
        auto fake_sentinel_rpc_server = cbdc::network::tcp_listener();
        ASSERT_TRUE(fake_sentinel_rpc_server.listen(cbdc::network::localhost,
                                                    sentinel_port));

        // Make and initialize a sentinel controller.
        auto ctl = std::make_unique<cbdc::sentinel_2pc::controller>(0,
                                                                    m_opts,
                                                                    m_logger);
        ASSERT_TRUE(ctl->init());

        // Execute a transaction. This call succeeds despite the fact that
        // it's impossible to obtain the 2nd attestation from the fake sentinel.
        auto res = ctl->execute_transaction(m_valid_tx, [](auto /* param */) {});
        ASSERT_TRUE(res);
    }
Attestations are gathered in the method `gather_attestations`, which is called by `execute_transaction`. Stepping through `gather_attestations` in a debugger shows that, as expected, the number of attestations does not increase to the required number of 2 before the method returns. The output of the test to the terminal confirms this, as it doesn’t show the debug message “Accepted” that indicates sufficient attestations have been gathered for the compact transaction to be accepted. This output is triggered on Line 181 in controller.cpp in the body of `gather_attestations`:

    m_logger->debug("Accepted", to_string(ctx.m_id));
Here’s the output from the test above:
[ RUN ] sentinel_2pc_test.execute_transaction_without_required_attestations
[2022-07-07 00:31:22.708] [INFO ] Sentinel public key: eaa649f21f51bdbae7be4ae34ce6e5217a58fdce7f47f9aa7f3b58fa2120e2b3
[ OK ] sentinel_2pc_test.execute_transaction_without_required_attestations (6094 ms)
For comparison, here’s the output from another test in which all the attestations are successfully gathered and the debug message appears:
[ RUN ] sentinel_2pc_test.digest_valid_transaction_direct
[2022-07-07 00:31:10.327] [INFO ] Sentinel public key: eaa649f21f51bdbae7be4ae34ce6e5217a58fdce7f47f9aa7f3b58fa2120e2b3
[2022-07-07 00:31:10.329] [DEBUG] Accepted 3ffcc8bc7153d34aaad2adbcddd64c063ff005543c85ae3ae1f88e91c3b526eb
[ OK ] sentinel_2pc_test.digest_valid_transaction_direct (5 ms)
Solution
I’m not sure what the full solution is yet, but it seems reasonable to at least allow `gather_attestations` to return true or false depending on whether the required number of attestations is actually obtained.
Code of Conduct
- I agree to follow this project's Code of Conduct
Here's a bit of background on `gather_attestations()`: #87 (comment)
In short, `gather_attestations()` (probably) returns before the attestations have actually been gathered, because it executes a set of asynchronous callbacks.
This is not my favorite construction, but the obvious alternative (having `gather_attestations()` block) is likely terrible for performance.
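One way to reconcile the two points above, sketched here purely as an illustration, is to keep the gathering asynchronous but report the outcome through a completion callback rather than a return value. The class `attestation_tally` below is hypothetical and does not exist in the codebase. It assumes every peer eventually produces a response, so a silent peer (like the fake TCP listener in the test) would need a caller-side timeout that calls `record(false)`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper (not part of opencbdc-tx): counts attestation
// responses arriving from asynchronous callbacks and reports the outcome
// exactly once, without blocking the caller.
class attestation_tally {
  public:
    using done_callback = std::function<void(bool)>;

    attestation_tally(std::size_t threshold,
                      std::size_t expected_responses,
                      done_callback done)
        : m_threshold(threshold),
          m_remaining(expected_responses),
          m_done(std::move(done)) {
        assert(m_done);
    }

    // Called once per peer response. `attested` is false for a refusal,
    // an error, or a timeout reported by the caller.
    void record(bool attested) {
        auto outcome = -1; // -1: undecided, 0: failed, 1: succeeded
        {
            std::lock_guard<std::mutex> lock(m_mut);
            if(m_finished || m_remaining == 0) {
                return;
            }
            m_remaining--;
            if(attested) {
                m_count++;
            }
            if(m_count >= m_threshold) {
                outcome = 1;
            } else if(m_count + m_remaining < m_threshold) {
                // Too few responses are left to ever reach the threshold.
                outcome = 0;
            }
            m_finished = (outcome != -1);
        }
        // Invoke the callback outside the lock so it may safely re-enter.
        if(outcome != -1) {
            m_done(outcome == 1);
        }
    }

  private:
    std::mutex m_mut;
    std::size_t m_threshold;
    std::size_t m_remaining;
    std::size_t m_count{0};
    bool m_finished{false};
    done_callback m_done;
};
```

With a threshold of 2 and two expected responses, one `record(true)` followed by one `record(false)` invokes the callback with `false`, which is the failure the test in this issue expects to observe. Whether the real controller should forward that result to the `execute_transaction` result callback is a design decision left open here.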