Service Capacity Modeling

A generic toolkit for modeling capacity requirements in the cloud. Pricing information included in this repository are public prices.

NOTE: Netflix confidential information should never enter this repo. Please consider this repository public when making changes to it.

Trying it out

Run the tests:

# Test the capacity planner on included netflix models
$ tox -e py38

# Run a single test with a debugger attached if the test fails
$ .tox/py38/bin/pytest -n0 -k test_java_heap_heavy --pdb --pdbcls=IPython.terminal.debugger:Pdb

# Verify all type contracts
$ tox -e mypy

Run IPython for interactively using the library:

tox -e dev -- ipython

Example of Provisioning a Database

Fire up ipython and let's capacity plan a Tier 1 (important to the product aka "prod") Cassandra database.

from service_capacity_modeling.interface import CapacityDesires
from service_capacity_modeling.interface import FixedInterval, Interval
from service_capacity_modeling.interface import QueryPattern, DataShape

db_desires = CapacityDesires(
    # This service is important to the business, not critical (tier 0)
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 10,000 reads and 10,000 writes per second.
        estimated_read_per_second=Interval(
            low=1000, mid=10000, high=100000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=1000, mid=10000, high=100000, confidence=0.9
        ),
    ),
    # Not sure how much data, but we think it'll be below 1 TiB
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=100, high=1000, confidence=0.9),
    ),
)

Now we can load up some models and do some capacity planning

from service_capacity_modeling.capacity_planner import planner
from service_capacity_modeling.models.org import netflix
import pprint

# Load up the Netflix capacity models
planner.register_group(netflix.models)

cap_plan = planner.plan(
    model_name="org.netflix.cassandra",
    region="us-east-1",
    desires=db_desires,
    # Simulate the possible requirements 512 times
    simulations=512,
    # Request 3 diverse hardware families to be returned
    num_results=3,
)

# The range of requirements in hardware resources (CPU, RAM, Disk, etc ...)
requirements = cap_plan.requirements

# The ordered list of least regretful choices for the requirement
least_regret = cap_plan.least_regret

# Show the range of requirements for a single zone
pprint.pprint(requirements.zonal[0].dict(exclude_unset=True))

# Show our least regretful choices of hardware in least regret order
# So for example if we can buy the first set of computers we would prefer
# to do that but we might not have availability in that family in which
# case we'd buy the second one.
for choice in range(3):
    num_clusters = len(least_regret[choice].candidate_clusters.zonal)
    print(f"Our #{choice + 1} choice is {num_clusters} zones of:")
    pprint.pprint(least_regret[choice].candidate_clusters.zonal[0].dict(exclude_unset=True))

Note that we can customize more information given what we know about the use case, but each model (e.g. Cassandra) supplies reasonable defaults.

For example we can specify a lot more information

db_desires = CapacityDesires(
    # This service is important to the business, not critical (tier 0)
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 50,000 reads and 45,000 writes per second with a rather narrow
        # bound
        estimated_read_per_second=Interval(
            low=40_000, mid=50_000, high=60_000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=42_000, mid=45_000, high=50_000, confidence=0.9
        ),
        # This use case might do some partition scan queries that are
        # somewhat expensive, so we hint a rather expensive ON-CPU time
        # that a read will consume on the entire cluster.
        estimated_mean_read_latency_ms=Interval(
            low=0.1, mid=4, high=20, confidence=0.9
        ),
        # Writes at LOCAL_ONE are pretty cheap
        estimated_mean_write_latency_ms=Interval(
            low=0.1, mid=0.4, high=0.8, confidence=0.9
        ),
        # We want single digit latency, note that this is not a p99 of 10ms
        # but defines the interval where 98% of latency falls to be between
        # 0.4 and 10 milliseconds. Think of:
        #   low = "the minimum reasonable latency"
        #   high = "the maximum reasonable latency"
        #   mid = "value between low and high such that I want my distribution
        #          to skew left or right"
        read_latency_slo_ms=FixedInterval(
            low=0.4, mid=4, high=10, confidence=0.98
        ),
        write_latency_slo_ms=FixedInterval(
            low=0.4, mid=4, high=10, confidence=0.98
        )
    ),
    # Not sure how much data, but we think it'll be below 1 TiB
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=500, high=1000, confidence=0.9),
    ),
)

Example of provisioning a caching cluster

In this example we tweak the QPS up, on CPU time of operations down and SLO down. This more closely approximates a caching workload

cache_desires = CapacityDesires(
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 10,000 reads and 10,000 writes per second.
        estimated_read_per_second=Interval(
            low=10_000, mid=100_000, high=1_000_000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=1_000, mid=20_000, high=100_000, confidence=0.9
        ),
        # Memcache is consistently fast at queries
        estimated_mean_read_latency_ms=Interval(
            low=0.05, mid=0.2, high=0.4, confidence=0.9
        ),
        estimated_mean_write_latency_ms=Interval(
            low=0.05, mid=0.2, high=0.4, confidence=0.9
        ),
        # Caches usually have tighter SLOs
        read_latency_slo_ms=FixedInterval(
            low=0.4, mid=0.5, high=5, confidence=0.98
        ),
        write_latency_slo_ms=FixedInterval(
            low=0.4, mid=0.5, high=5, confidence=0.98
        )
    ),
    # Not sure how much data, but we think it'll be below 1000
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=200, high=500, confidence=0.9),
    ),
)

cache_cap_plan = planner.plan(
    model_name="org.netflix.cassandra",
    region="us-east-1",
    desires=cache_desires,
    allow_gp2=True,
)

requirement = cache_cap_plan.requirement
least_regret = cache_cap_plan.least_regret

Notebooks

We have a demo notebook in notebooks you can use to experiment. Start it with

tox -e notebook jupyter notebook notebooks/demo.ipynb

Development

To contribute to this project:

Make your change in a branch. Consider making a new model if you are making significant changes and registering it as a different name.
Write a unit test using pytest in the tests folder.
Ensure your tests pass (or debug them) with:

tox -e py38 -- -k test_<your_functionality> --pdb --pdbcls=IPython.terminal.debugger:Pdb

Release

TODO

neelabalan / service-capacity-modeling