≡ Trade.C++

A transactional locking implementation for C++. Based on basic ideas from

Transactional Locking II: Dave Dice, Ori Shalev, and Nir Shavit

this simple variation uses a hash table of locks, a splay tree to maintain an access set, and attempts locks in address order.

In a nutshell one could describe transactional locking as allowing you to write concurrent and parallel programs roughly as easily as if you only had a single global mutex and a single global condition variable while providing a performance profile more like that of highly fine-grained locking.

The goal of this library is to provide an API that makes transactional locking work like a natural part of C++ and a portable implementation that can be used today for prototyping parallel algorithms using transactional memory.

See synopsis.hpp for the API and queue_tm.hpp for a transactional queue implementation written for testing purposes.

≡ Overview

≡ Building

This project uses the C++ submodule manager. If you have the cppsm command in path, you can just clone and test this project:

cppsm clone "git@github.com:per-framework/trade.cpp.git" v1
cd trade.cpp
cppsm test

Without the C++ submodule manager you need to non-recursively update the submodules after cloning and use CMake to e.g. generate makefiles and use make to build and test:

git clone git@github.com:per-framework/trade.cpp.git
cd trade.cpp
git submodule update --init
mkdir .build
cd .build
cmake ..
make all test

If you are not using cppsm in your project and just want to use Trade.C++, you should be able to just

add_subdirectory(path-to/trade.cpp)

and then

target_link_libraries(your-target PRIVATE trade_v1)

in your CMake files.

≡ Basics

Using this library one stores values in atoms that can then be accessed from any number of threads transactionally within atomically blocks.

For example, given

atom<int> xA = 1;
atom<int> yA = 2;

we could write

int sum = atomically([&]() {
  int x = xA;
  int y = yA;
  yA = x;
  xA = y;
  return x + y;
});

to atomically, with respect to other transactions, swap the values of xA and yA and compute their sum. Loads of and stores to atoms can only be made inside atomically blocks.

≡ Ref

As a convenience and optimization a mutable reference to the current value of an atom within a transaction can be obtained with ref().

For example, one could write

atomically([&]() { counter.ref() += 1; });

to increment the value of a counter atom.

The returned reference is only valid within the transaction and can be used multiple times.

≡ Side-effects

The action given to atomically may be invoked many times. Therefore it is important that the action does not perform any side-effects that must not be repeated.

On the other hand, the action will only be invoked from the thread that started the transaction. So, it is entirely possible to use side-effects safely within the action.

For example, given a queue with transactional empty predicate and pop_front operation, one could write

std::vector<Value> values;

atomically([&]() {
  values.clear();
  while (!queue.empty())
    values.push_back(queue.pop_front());
});

to transactionally pop the values off of the queue and push them into a vector. It is important that the values vector is cleared at the beginning of the action.

≡ Nesting

atomically blocks can be nested and thereby transactions composed.

For example, given a queue with transactional try_pop_front and push_back operations, code such as

bool moved = atomically([&]() {
  if (auto opt_v = q1.try_pop_front()) {
    q2.push_back(opt_v.value());
    return true;
  }
  return false;
});

could transactionally move an element from one queue to another such that no other transaction may observe a state where the element is not in either queue.

≡ Blocking

An atomically block can call retry at any point to stop running the transaction and block waiting for other threads to make changes to atoms read during the transaction. Once such changes have been made, the transaction will be restarted.

For example, given a queue with a transactional try_pop_front operation returning an optional value, one could write

auto value = atomically([&]() {
  if (auto opt_value = queue.try_pop_front())
    return opt_value.value();
  retry();
});

to wait until a value can be obtained from the queue.

≡ Memory management

Care must be taken when dynamically allocated memory is accessed within transactions — objects that contain atoms must not be deallocated prematurely.

To support dynamic memory management, loads within transactions take and keep a copy of the original value in the atom. This allows smart pointers such as std::shared_ptr to be used for memory management as long as there is a suitable partial specialization of std::atomic.

For example, a transactional dynamic queue data structure could be defined with the help of std::shared_ptr as

template <class Value> class queue_tm {
  struct node_t {
    atom<std::shared_ptr<node_t>> m_next;
    Value m_value;
    node_t(const Value &value) : m_value(value) {}
  };

  atom<std::shared_ptr<node_t>> m_first;
  atom<std::shared_ptr<node_t>> m_last;

  // ...
};

A push_back operation for queues could then be implemented as

void push_back(const value_t &value) {
  std::shared_ptr<node_t> node(new node_t(value));
  atomically([&]() {
    if (auto prev = m_last.load())
      prev->m_next = m_last = node;
    else
      m_first = m_last = node;
  });
}

and a pop_front operation as

value_t pop_front() {
  return atomically([&]() {
    if (auto first = m_first.load()) {
      if (auto next = first->m_next.load())
        m_first = next;
      else
        m_last = m_first = nullptr;
      return node->m_value;
    }
    retry();
  });
}

≡ Readonly transactions

Transactions that are readonly and do not write into atoms can be executed more efficiently without creating a transaction log. To run a readonly transaction, simply pass assume_readonly to atomically.

For example, to compute the size of a transactional queue, one could write

size_t size() {
  return atomically(assume_readonly, [&]() {
    size_t n = 0;
    for (auto node = m_first.load(); node; node = node->m_next)
      n += 1;
    return n;
  });
}

Note that algorithmically the above may not be a good idea as it goes through the entire queue.

≡ Stack or heap allocation

Loads and stores of atoms inside transactions create a transaction log. The memory for the log can be allocated either from a fixed size buffer on the stack or dynamically from the heap.

By default the log is allocated from the stack with a capacity that should be sufficient for most common use cases where a small number of atoms are accessed. When necessary, pass to atomically either heap(initial_size) to specify heap allocation or stack<fixed_size> to specify stack allocation.

For example, one could specify stack<64> to execute a tiny transaction as

atomically(stack<64>, [&]() { atom = 101; });

and avoid wasting stack.

In case a transaction runs out of log space, the transaction will be aborted by raising the std::bad_alloc exception.

Note that only the allocation configuration of the outermost atomically call is considered in nested transactions.

≡ Atomic types only

The type argument T of atom<T> must also be allowed as an argument to std::atomic. Unless the type T is TriviallyCopyable and std::atomic<T> is not always lock-free, an atom stores the value in a std::atomic<T>. This allows efficient implementation of the necessary atomicity guarantees when multiple threads may simultaneously access the value stored in an atom.

≡ Exceptions

Invalid accesses and retry raise exceptions. User code inside transactions should let such exceptions fall through to the handler in the outermost atomically block. The action given to atomically may throw other exceptions — such exceptions will be allowed to fall through and abort the transaction.

≡ Trade-offs

A portable implementation usable today with any C++17 compiler. If necessary, it should be fairly easy to support plain C++11.
As in the TL2 algorithm,
- a shared global clock is used, which may cause significant contention for tiny transactions,
- does not require use of special heap for transactional memory, although user code is required to cooperate to avoid premature deallocations, and
- user code cannot observe inconsistent memory states.
The use of hashed locks may cause false sharing (as multiple atoms may hash to a single lock) and in the unlikely worst case performance may degrade to the point where it is equivalent to having a single global lock.
The hash computation adds some overhead to every access.
On the other hand, the size of an atom<T> is not larger than the size of std::atomic<T>, which is practically optimal.
Adds a retry capability for blocking. This also takes minor advantage of the use of hashed locks.
Every access of an atom potentially involves loading a thread-local variable. Ideally a good compiler would be able to eliminate many thread-local accesses, but that does not seem to happen. This adds a certain amount of overhead to accesses.
The use of exceptions for aborting transactions may be expensive and may also prohibit or complicate use in cases like embedded systems where exception handling might be turned off to reduce code size.

per-framework / trade.cpp