mit-dci / opencbdc-tx

A transaction processor for a hypothetical, general-purpose, central bank digital currency

Issues with transactions in `sentinel_2pc::controller`

mszulcz-mitre opened this issue

Affected Branch

trunk

Basic Diagnostics

  • I've pulled the latest changes on the affected branch and the issue is still present.

  • The issue is reproducible in docker

Description

Problem: sentinel_2pc::controller::execute_transaction can never fail.

Its description says it returns “false if the sentinel was unable to forward the transaction to a coordinator” (lines 52-53 in src/uhs/twophase/sentinel_2pc/controller.hpp). However, it can only return true. There are no return false statements in its body.

Solution

The failure to forward a transaction to a coordinator actually occurs in sentinel_2pc::controller::send_compact_tx, which is in the call stack of execute_transaction:

execute_transaction calls gather_attestations, which in turn calls send_compact_tx.

In order for execute_transaction to register the failure, gather_attestations and send_compact_tx would both need to return a bool that signals success or failure. Currently, neither has a return value and neither detects success or failure. Rather, send_compact_tx continually tries to forward the transaction in an infinite loop. The author of the method was apparently aware of this (lines 194-203 in src/uhs/twophase/sentinel_2pc/controller.cpp):

        // TODO: add a "retry" error response to offload sentinels from this
        //       infinite retry responsibility.
        while(!m_coordinator_client.execute_transaction(ctx, cb)) {
            // TODO: the network currently doesn't provide a callback for
            //       reconnection events so we have to sleep here to
            //       prevent a needless spin. Instead, add such a callback
            //       or queue to the network to remove this sleep.
            static constexpr auto retry_delay = std::chrono::milliseconds(100);
            std::this_thread::sleep_for(retry_delay);
        };

It seems the original author had some ideas for fixing this, but I don’t fully understand them. One reasonable change is to call send_compact_tx directly from execute_transaction rather than from gather_attestations. This simplifies the call stack, and send_compact_tx appears unrelated to the purpose of gather_attestations anyway. As for the infinite loop, we’ll have to decide how many times, or for how long, it should retry before concluding that forwarding the transaction failed.
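To make the retry-limit idea concrete, here is a minimal sketch of a bounded retry helper. Everything in it is an assumption for illustration: retry_with_limit, max_attempts, and the attempt callable are hypothetical names that do not exist in opencbdc-tx. In the controller, attempt would wrap the m_coordinator_client.execute_transaction(ctx, cb) call quoted above.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of opencbdc-tx. Calls `attempt` until it
// returns true or until `max_attempts` calls have been made, sleeping for
// `retry_delay` between consecutive attempts.
// Returns true if an attempt succeeded, false if all attempts failed.
inline auto retry_with_limit(const std::function<bool()>& attempt,
                             unsigned int max_attempts,
                             std::chrono::milliseconds retry_delay) -> bool {
    for(unsigned int i = 0; i < max_attempts; i++) {
        if(attempt()) {
            return true;
        }
        // Only sleep if another attempt will follow.
        if(i + 1 < max_attempts) {
            std::this_thread::sleep_for(retry_delay);
        }
    }
    return false;
}
```

With something like this, send_compact_tx could return the helper's result, and execute_transaction could pass it on to the caller. Limiting by elapsed time instead of attempt count would work the same way, with a deadline check in place of the counter.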

Problem: executing transactions without required attestations

It appears that a transaction can be executed without gathering the required attestations. To see this, consider the unit test below, added to tests/unit/sentinel_2pc/controller_test.cpp. The test sets up a controller that requires an attestation from a sentinel client to execute a transaction. Instead of instantiating a sentinel server for the client, though, the test instantiates a fake sentinel server that’s only a TCP listener and therefore shouldn’t be able to provide an attestation. Nonetheless, execute_transaction returns true. Of course, the first problem described above means it’s guaranteed to return true, but it would still return true once that problem is fixed, because the root cause here is different. Here’s the test:

TEST_F(sentinel_2pc_test, execute_transaction_without_required_attestations) {
    // This test attempts to show that a transaction can be executed without
    // gathering the required 2 attestations.  
    m_opts.m_attestation_threshold = 2;

    // One attestation will come from the controller launched below (ctl)
    // in its call to controller::execute_transaction (see lines 94-97 in
    // src/uhs/twophase/sentinel_2pc/controller.cpp).  The 2nd attestation
    // should come from a 2nd sentinel server.  In this demo, though, the 2nd
    // sentinel server is just a TCP listener that can't provide attestations.
    constexpr unsigned short sentinel_port = 32003;
    const auto sentinel_ep
        = std::make_pair(cbdc::network::localhost, sentinel_port);
    m_opts.m_sentinel_endpoints.push_back(sentinel_ep);
    auto fake_sentinel_rpc_server = cbdc::network::tcp_listener();
    ASSERT_TRUE(fake_sentinel_rpc_server.listen(cbdc::network::localhost,
        sentinel_port));

    // Make and initialize a sentinel controller.
    auto ctl = std::make_unique<cbdc::sentinel_2pc::controller>(0,
        m_opts,
        m_logger);
    ASSERT_TRUE(ctl->init());

    // Execute a transaction.  This call succeeds despite the fact that 
    // it's impossible to obtain the 2nd attestation from the fake sentinel.
    auto res = ctl->execute_transaction(m_valid_tx, [](auto /* param */) {});
    ASSERT_TRUE(res);
}

Attestations are gathered in the method gather_attestations, which is called by execute_transaction. Stepping through gather_attestations in a debugger shows that, as expected, the number of attestations does not increase to the required number of 2 before the method returns. The output of the test to the terminal confirms this, as it doesn’t show the debug message “Accepted” that indicates sufficient attestations have been gathered for the compact transaction to be accepted. This output is triggered on Line 181 in controller.cpp in the body of gather_attestations:

m_logger->debug("Accepted", to_string(ctx.m_id));

Here’s the output from the test above:

[ RUN      ] sentinel_2pc_test.execute_transaction_without_required_attestations
[2022-07-07 00:31:22.708] [INFO ] Sentinel public key: eaa649f21f51bdbae7be4ae34ce6e5217a58fdce7f47f9aa7f3b58fa2120e2b3
[       OK ] sentinel_2pc_test.execute_transaction_without_required_attestations (6094 ms)

For comparison, here’s the output from another test in which all the attestations are successfully gathered and the debug message appears:

[ RUN      ] sentinel_2pc_test.digest_valid_transaction_direct
[2022-07-07 00:31:10.327] [INFO ] Sentinel public key: eaa649f21f51bdbae7be4ae34ce6e5217a58fdce7f47f9aa7f3b58fa2120e2b3
[2022-07-07 00:31:10.329] [DEBUG] Accepted 3ffcc8bc7153d34aaad2adbcddd64c063ff005543c85ae3ae1f88e91c3b526eb
[       OK ] sentinel_2pc_test.digest_valid_transaction_direct (5 ms)

Solution

I’m not sure what the full solution is yet, but it seems reasonable to at least allow gather_attestations to return true or false depending on whether the required number of attestations is actually obtained.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Here's a bit of background on gather_attestations(): #87 (comment)

In short, gather_attestations() returns (probably) before the attestations have actually been gathered (because it executes a set of asynchronous callbacks).

This is not my favorite construction, but the obvious alternative (having gather_attestations() block) is likely terrible for performance.
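One non-blocking way to still report whether the threshold was met is to count responses inside the callbacks and signal the outcome through a completion callback. The sketch below is only an illustration under stated assumptions: attestation_tracker, on_response, and on_timeout are hypothetical names that do not exist in opencbdc-tx, and the sketch assumes the caller arranges a timer that calls on_timeout, which is what would catch a peer that never answers, like the TCP listener in the test above.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper, not part of opencbdc-tx. Tracks asynchronous
// attestation responses and invokes `done` exactly once: with true as soon
// as `threshold` attestations have arrived, or with false once the threshold
// can no longer be reached or the caller reports a timeout.
class attestation_tracker {
  public:
    using done_callback = std::function<void(bool)>;

    attestation_tracker(std::size_t threshold,
                        std::size_t expected_responses,
                        done_callback done)
        : m_threshold(threshold),
          m_pending(expected_responses),
          m_done(std::move(done)) {}

    // Record one response. `attested` is true if it carried a valid
    // attestation. Safe to call from multiple callback threads.
    void on_response(bool attested) {
        auto result = false;
        {
            std::lock_guard<std::mutex> lock(m_mut);
            if(m_finished || m_pending == 0) {
                return;
            }
            m_pending--;
            if(attested) {
                m_count++;
            }
            if(m_count >= m_threshold) {
                result = true;
            } else if(m_count + m_pending < m_threshold) {
                result = false; // too few responses left to succeed
            } else {
                return; // outcome still undecided
            }
            m_finished = true;
        }
        m_done(result); // invoked outside the lock
    }

    // Report that the deadline passed before an outcome was decided.
    void on_timeout() {
        {
            std::lock_guard<std::mutex> lock(m_mut);
            if(m_finished) {
                return;
            }
            m_finished = true;
        }
        m_done(false);
    }

  private:
    std::mutex m_mut;
    std::size_t m_threshold;
    std::size_t m_pending;
    std::size_t m_count{0};
    bool m_finished{false};
    done_callback m_done;
};
```

The design choice is that nothing blocks: the caller learns about success or failure later through done, rather than from a return value of gather_attestations.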