chartbeat-labs / trepl

Generic Tiered Replication implementation.

Trepl

Trepl is a generic Tiered Replication (Cidon et al.) implementation, designed to help pick replica placement for Kafka partitions and configure WADE chains. However, it can be used in any situation where you want to adjust the probability of data loss or unavailability caused by multiple replica failures.
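As a rough illustration of what "probability of data loss" means here, the sketch below brute-forces every way a fixed number of nodes can fail at once and reports the fraction of those failure patterns that wipe out all replicas of at least one copyset. The function name `loss_probability` and its parameters are illustrative assumptions, not part of Trepl, and the exhaustive enumeration is only practical for small clusters.

```python
from itertools import combinations


def loss_probability(nodes, copysets, failures):
    """Fraction of `failures`-node failure patterns that destroy a copyset.

    Hypothetical helper for illustration only; not part of Trepl.
    A copyset is considered lost when every node in it has failed.
    """
    sets = [frozenset(copyset) for copyset in copysets]
    total = 0
    lost = 0
    # Enumerate every distinct group of `failures` nodes failing together.
    for failed in combinations(nodes, failures):
        failed = set(failed)
        total += 1
        # Data is lost if some copyset is fully contained in the failed set.
        if any(copyset <= failed for copyset in sets):
            lost += 1
    return float(lost) / total if total else 0.0
```

For a three-node cluster with copysets `[['node1', 'node2'], ['node1', 'node3']]` and two simultaneous failures, two of the three possible failure pairs destroy a copyset, so the function returns 2/3. Adding the third pair as a copyset raises that to 1.0.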

Tiered Replication follows up on ideas introduced in the Copysets paper, where you'll find detailed information on motivations and use cases.

Usage

Basic Trepl usage is simple:

>>> import trepl
>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=1)
[['node1', 'node2'], ['node1', 'node3']]

>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=2)
[['node1', 'node2'], ['node1', 'node3'], ['node2', 'node3']]
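To see how `S` shapes the result, it can help to measure the scatter width each node ends up with, that is, how many distinct other nodes it shares a copyset with. The helper below is a minimal sketch for inspecting `build_copysets` output; the name `scatter_widths` is hypothetical and not part of Trepl, and it assumes `S` is the scatter width every node should at least reach.

```python
from collections import defaultdict


def scatter_widths(copysets):
    """Map each node to the number of distinct peers it shares a copyset with.

    Hypothetical inspection helper; not part of Trepl.
    """
    peers = defaultdict(set)
    for copyset in copysets:
        for node in copyset:
            # Every other member of the copyset is a peer of this node.
            peers[node].update(other for other in copyset if other != node)
    return {node: len(others) for node, others in peers.items()}
```

Applied to the two results above, the `S=1` copysets give `node1` a scatter width of 2 and the other two nodes a width of 1, while the `S=2` copysets give every node a width of 2.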

Trepl also ships with rack-aware and tier-aware checker functions:

# not rack aware
>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=1)
[['node1', 'node2'], ['node1', 'node3']]

# rack aware: node1 and node2 cannot share a copyset since they're in
# the same rack
>>> rack_map = {'node1': 'rack1', 'node2': 'rack1', 'node3': 'rack3'}
>>> trepl.build_copysets(
      list(rack_map), R=2, S=1,
      checker=trepl.checkers.rack(rack_map),
    )
[['node1', 'node3'], ['node2', 'node3']]

# scatter width must be 2, and data must exist on at least one node in
# the backup tier
>>> primary = ['A', 'B', 'C']
>>> backup = ['d', 'e']
>>> trepl.build_copysets(
      primary + backup, R=2, S=2,
      checker=trepl.checkers.tiered(backup, 2),
    )
[['A', 'd'], ['A', 'e'], ['B', 'd'], ['B', 'e'], ['C', 'd'], ['C', 'e']]
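The constraints in the examples above can also be checked independently on the returned copysets. The two functions below are a minimal sketch of such checks: `rack_aware` and `covers_tier` are hypothetical names, they are not Trepl's checkers, and they make no assumption about the signature Trepl expects for its `checker` argument. They only validate a finished list of copysets.

```python
def rack_aware(copysets, rack_map):
    """Return True if no copyset places two replicas in the same rack.

    Hypothetical validator; not part of Trepl.
    """
    for copyset in copysets:
        racks = [rack_map[node] for node in copyset]
        # A repeated rack means two replicas would share a failure domain.
        if len(set(racks)) != len(racks):
            return False
    return True


def covers_tier(copysets, tier):
    """Return True if every copyset includes at least one node from `tier`.

    Hypothetical validator; not part of Trepl.
    """
    tier = set(tier)
    return all(tier.intersection(copyset) for copyset in copysets)
```

With the `rack_map` from the example, `rack_aware` accepts `[['node1', 'node3'], ['node2', 'node3']]` and rejects the non-rack-aware result, since `node1` and `node2` both sit in `rack1`. Likewise, `covers_tier` accepts the tiered output when given `['d', 'e']` as the backup tier.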

Authors

About

License: Apache License 2.0

