dummy metrics generator
The metrics generator is built upon three main components:
- Deployment: The indexes of the table, for example:
  - symbol in the stock market
  - (data_center, device_id) for devices in data centers
- Static data: Static attributes of the deployment, for example:
  - model_number for a device
  - score for a model
- Metrics: Continuous metrics generated for the deployment, for example:
  - cpu_utilization of a device
  - price of a stock
The first step in setting up the generator is creating a deployment. Then, using the deployment, you can generate static data or a continuous stream of metrics.
To create a deployment from configuration, you need to provide a yaml file containing the following:
```yaml
deployment:
  <level_name>:
    faker: <faker_type>
    num_items: <num_items in the level>
```
Where `level_name` will be the name of the index, `faker_type` is the name of the faker generator, and `num_items` is how many keys to create for this index.
Each provided level creates another `num_items` instances for each entry in its previous levels.
Example: Given the following configuration yaml file:
```yaml
deployment:
  device:
    faker: msisdn
    num_items: 2
  core:
    faker: msisdn
    num_items: 3
```
and running the following command:
```python
from metrics_gen.deployment_generator import deployment_generator

dep_gen = deployment_generator()

# `configuration` is the deployment configuration described above
deployment = dep_gen.generate_deployment(configuration=configuration)
```
Will generate the following example deployment:
|    | device        | core          |
|----|---------------|---------------|
| 0  | 4120271911677 | 6950611701382 |
| 1  | 4120271911677 | 2255426557707 |
| 2  | 4120271911677 | 7717168891372 |
| 3  | 2260158002886 | 3213635322383 |
| 4  | 2260158002886 | 4007792940086 |
| 5  | 2260158002886 | 3720953132595 |
Notice that each extra level multiplies the number of items created by its `num_items`; thus we got 2 * 3 = 6 items.
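For reference, a minimal end-to-end sketch of this step might look as follows. The file name `deployment_config.yaml` and the use of PyYAML to parse it into a dict are assumptions for illustration, not requirements of the library:

```python
import yaml  # PyYAML, used here only to load the configuration file

from metrics_gen.deployment_generator import deployment_generator

# Assumption: the configuration yaml shown above is saved as
# "deployment_config.yaml" and passed to the generator as a parsed dict.
with open("deployment_config.yaml") as f:
    configuration = yaml.safe_load(f)

dep_gen = deployment_generator()
deployment = dep_gen.generate_deployment(configuration=configuration)

print(deployment.head())  # dataframe of the generated index keys
```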
To create a static data generator you need to supply a deployment dataframe and a configuration yaml.
The static data generator knows how to generate two kinds of feature configurations, range and choice, which should be specified in the yaml:
```yaml
static:
  <feature_name>:
    kind: range
    min_range: <min_feature_range>, defaults to 0
    max_range: <max_feature_range>
    as_integer: <True for integer output, False for float>, defaults to False
  <feature_name>:
    kind: choice
    choices: <list of possible choices>
```
Each provided feature will generate a new feature column in the generated dataframe.
Example: Given the following yaml:
```yaml
static:
  models:
    kind: range
    min_range: 10
    max_range: 15
    as_integer: True
  country:
    kind: choice
    choices: [A, B, C, D, E, F, G]
```
And the previous deployment:
```python
from metrics_gen.static_data_generator import Static_data_generator

# `static_configuration` is the static-data configuration described above
static_data_generator = Static_data_generator(deployment, static_configuration)
generated_df = static_data_generator.generate_static_data()
```
Will generate the following dataframe:
|    | device        | core          | models | country |
|----|---------------|---------------|--------|---------|
| 0  | 4120271911677 | 6950611701382 | 13     | A       |
| 1  | 4120271911677 | 2255426557707 | 14     | C       |
| 2  | 4120271911677 | 7717168891372 | 14     | G       |
| 3  | 2260158002886 | 3213635322383 | 14     | G       |
| 4  | 2260158002886 | 4007792940086 | 11     | G       |
| 5  | 2260158002886 | 3720953132595 | 14     | D       |
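For intuition, the two feature kinds roughly correspond to the following sampling logic. This is only a sketch of the semantics described above, not the library's implementation, and whether the range is sampled uniformly with inclusive bounds is an assumption:

```python
import numpy as np

rng = np.random.default_rng()

# kind: range -> a value drawn between min_range and max_range,
# returned as an integer when as_integer is True
models = rng.integers(10, 15, endpoint=True)               # e.g. 13

# kind: choice -> a value picked from the provided list of choices
country = rng.choice(["A", "B", "C", "D", "E", "F", "G"])   # e.g. "A"
```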
To create a continuous metrics stream, you need to provide a deployment dataframe and a metrics-creation configuration yaml:
```yaml
errors:
  rate_in_ticks: <~number of ticks between errors>
  length_in_ticks: <~length of error mode in ticks>
timestamps:
  interval: <time between samples in seconds>
  stochastic_interval: <True to create random intervals (around interval)>
metrics:
  <metric_name>:
    accuracy: <number of decimals to produce>
    distribution: normal
    distribution_params:
      mu: <mean>
      noise: <noise>
      sigma: <std>
    is_threshold_below: <True to produce max when in error mode, False for min>
    past_based_value: <True to add the latest metric to the last result (as in a daily stock market), False to replace normally>
    produce_max: <True for candle-like presentation>
    produce_min: <True for candle-like presentation>
    validation:
      distribution:  # per-sample validation
        max: <max value for an individual sample>
        min: <min value for an individual sample>
        validate: <True to activate validation>
      metric:  # metric-level validation
        max: <max value for the overall metric (only applicable to past-based values)>
        min: <min value for the overall metric (only applicable to past-based values)>
        validate: <True to activate validation>
```
Each configured metric will generate an additional metric stream for your deployment.
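As a rough illustration of the sampling semantics implied by this configuration, a single value might be drawn as follows. This is a sketch assuming a plain normal draw clamped by the metric-level validation range, not the library's internals, and it uses the cpu_utilization parameters from the example below:

```python
import numpy as np

rng = np.random.default_rng()

# One cpu_utilization-like sample: normal(mu, sigma), rounded to `accuracy`
# decimals, then clamped to the metric-level validation range when enabled.
mu, sigma, accuracy = 70, 10, 2
sample = round(rng.normal(mu, sigma), accuracy)
sample = min(max(sample, 0), 100)  # validation: {min: 0, max: 100, validate: true}
```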
Example: Given the following yaml:
```yaml
errors: {length_in_ticks: 10, rate_in_ticks: 5}
timestamps: {interval: 5s, stochastic_interval: true}
metrics:
  cpu_utilization:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 70, noise: 0, sigma: 10}
    is_threshold_below: true
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 100, min: 0, validate: true}
  throughput:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 250, noise: 0, sigma: 20}
    is_threshold_below: false
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 300, min: 0, validate: true}
```
And the previous deployment:
```python
from metrics_gen.metrics_generator import Generator_df

# `metrics_configuration` is the metrics configuration described above
metrics_generator = Generator_df(metrics_configuration, user_hierarchy=deployment)
generator = metrics_generator.generate(as_df=True)
df = next(generator)
```
Will generate the following dataframe:
| timestamp | core | device | cpu_utilization | cpu_utilization_is_error | throughput | throughput_is_error | is_error |
|---|---|---|---|---|---|---|---|
| 2022-01-31 19:20:21.007087 | 2113309831673 | 4469221325973 | 100.0 | True | 0.0 | True | True |
| 2022-01-31 19:20:21.007087 | 2115933686087 | 4469221325973 | 100.0 | True | 235.0679405785135 | False | False |
| 2022-01-31 19:20:21.007087 | 0175482390171 | 4469221325973 | 70.26657388732976 | False | 208.34378630077305 | False | False |
| 2022-01-31 19:20:21.007087 | 1626403145660 | 4038890878426 | 59.932750968399404 | False | 217.4335871243806 | False | False |
| 2022-01-31 19:20:21.007087 | 7247058922310 | 4038890878426 | 83.98361382584898 | False | 265.3476318369042 | False | False |
| 2022-01-31 19:20:21.007087 | 7030239128061 | 4038890878426 | 100.0 | False | 225.16604191632058 | False | False |
To generate a new sample, simply call `next(generator)` again.
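For example, a longer dataset can be built by pulling several samples from the generator and concatenating them. This is a sketch based on the API shown above; the number of iterations is arbitrary:

```python
import pandas as pd

# Pull 10 consecutive samples and stack them into a single dataframe.
# Each call to next(generator) advances the simulated timestamps by ~interval.
frames = [next(generator) for _ in range(10)]
metrics_df = pd.concat(frames, ignore_index=True)
```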