maZymaZe / Sherman

Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory

Sherman is a B+Tree on disaggregated memory; it uses one-sided RDMA verbs to perform all index operations. Sherman includes three techniques to boost write performance:

  • A hierarchical lock leveraging the on-chip memory of RDMA NICs
  • Coalescing dependent RDMA commands
  • Two-level version layout in leaf nodes

For more details, please refer to our paper:

[SIGMOD'22] Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory. Qing Wang, Youyou Lu, and Jiwu Shu.

System Requirements

  1. Mellanox ConnectX-5 NICs and above
  2. RDMA Driver: MLNX_OFED_LINUX-4.7-3.2.9.0 (if you use MLNX_OFED_LINUX-5**, you need to modify the code to resolve interface incompatibilities)
  3. NIC Firmware: version 16.26.4012 and above (to support on-chip memory, you can use ibstat to obtain the version)
  4. memcached (to exchange QP information)
  5. cityhash
  6. boost 1.53 (to support boost::coroutines::symmetric_coroutine)

RDMA Network Setup

  1. Modify this line according to the RDMA NIC you want to use, where ibv_get_device_name(deviceList[i]) is the name of the RNIC (e.g., mlx5_0): https://github.com/thustorage/Sherman/blob/9bb950887cd066ebf4f906edbb43bae8e728548d/src/rdma/Resource.cpp#L28
  2. If you use RoCE, modify gidIndex in this line according to the output of the shell command show_gids; the value is usually 3. https://github.com/thustorage/Sherman/blob/c5ee9d85e090006df39c0afe025c8f54756a7aea/include/Rdma.h#L60
  3. Change the constant kLockChipMemSize in include/Common.h so that it is <= the maximum size of the NIC's on-chip memory.
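
As a rough illustration of why kLockChipMemSize must fit within the NIC's on-chip memory, the sketch below treats that memory as a table of lock words and maps a tree node address to one slot. It is not Sherman's actual code: the byte sizes, the one-64-bit-word-per-lock assumption, the names kDeviceMaxOnChipMem, lock_slots, and lock_slot_for, and the use of std::hash are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical values for illustration only; check your NIC and your header.
constexpr std::size_t kDeviceMaxOnChipMem = 256 * 1024;  // assumed NIC limit, bytes
constexpr std::size_t kLockChipMemSize = 128 * 1024;     // assumed setting, bytes

// The constraint from step 3, expressed as a compile-time check.
static_assert(kLockChipMemSize <= kDeviceMaxOnChipMem,
              "lock table must fit in the NIC's on-chip memory");

// Assumption: each lock occupies one 64-bit word of on-chip memory.
constexpr std::size_t lock_slots(std::size_t bytes) {
  return bytes / sizeof(std::uint64_t);
}

// Map a tree node address to a lock slot. std::hash stands in for whatever
// hash function the real implementation uses.
inline std::size_t lock_slot_for(std::uint64_t node_addr, std::size_t slots) {
  return std::hash<std::uint64_t>{}(node_addr) % slots;
}
```

Under these assumptions, a larger kLockChipMemSize gives more lock slots and therefore fewer unrelated nodes sharing a lock, which is why it is worth setting the constant as high as the hardware allows.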

Getting Started

  • cd Sherman
  • ./script/hugepage.sh to request huge pages from OS (use ./script/clear_hugepage.sh to return huge pages)
  • mkdir build; cd build; cmake ..; make -j
  • cp ../script/restartMemc.sh .
  • configure ../memcached.conf, where the 1st line is memcached IP, the 2nd is memcached port

For each run with kNodeCount servers:

  • ./restartMemc.sh (to initialize memcached server)
  • In each server, execute ./benchmark kNodeCount kReadRatio kThreadCount

We emulate each server as one compute node and one memory node: In each server, as the compute node, we launch kThreadCount client threads; as the memory node, we launch one memory thread. kReadRatio is the ratio of get operations.

In ./test/benchmark.cpp, you can modify kKeySpace and zipfan to generate different workloads. In addition, you can enable the macro USE_CORO to bind kCoroCnt coroutines to each client thread.
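
To make the roles of the key space, the Zipfian skew, and the read ratio concrete, here is a minimal, self-contained sketch of a workload generator. It is not the code in ./test/benchmark.cpp: Request, ZipfSampler, and make_workload are hypothetical names, the read ratio is assumed to be a percentage between 0 and 100, and the sampler precomputes a CDF, which is only practical for small key spaces.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical request type for illustration.
struct Request {
  bool is_read;
  std::uint64_t key;
};

// Samples keys in [1, key_space] with probability proportional to 1 / rank^skew.
// A skew of 0 gives a uniform distribution. key_space must be greater than 0.
class ZipfSampler {
 public:
  ZipfSampler(std::uint64_t key_space, double skew) : cdf_(key_space) {
    double sum = 0.0;
    for (std::uint64_t i = 0; i < key_space; ++i) {
      sum += 1.0 / std::pow(static_cast<double>(i + 1), skew);
      cdf_[i] = sum;
    }
    for (double &c : cdf_) c /= sum;  // normalize so the last entry is 1.0
  }

  template <typename Rng>
  std::uint64_t next(Rng &rng) {
    double u = std::uniform_real_distribution<double>(0.0, 1.0)(rng);
    auto it = std::lower_bound(cdf_.begin(), cdf_.end(), u);
    if (it == cdf_.end()) --it;  // guard against rounding at the tail
    return static_cast<std::uint64_t>(it - cdf_.begin()) + 1;
  }

 private:
  std::vector<double> cdf_;
};

// Build n requests; each is a read with probability read_ratio_percent / 100.
inline std::vector<Request> make_workload(std::size_t n, std::uint64_t key_space,
                                          double skew, int read_ratio_percent,
                                          std::uint64_t seed) {
  std::mt19937_64 rng(seed);
  ZipfSampler sampler(key_space, skew);
  std::uniform_int_distribution<int> percent(0, 99);
  std::vector<Request> requests;
  requests.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    bool is_read = percent(rng) < read_ratio_percent;
    requests.push_back({is_read, sampler.next(rng)});
  }
  return requests;
}
```

Raising the skew concentrates accesses on a few hot keys, which increases lock contention on the corresponding leaf nodes; lowering the read ratio shifts the mix toward writes, the case Sherman targets.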

TODO

  • Re-write delete operations
