deflect-ca / baskerville

Security Analytics Engine - Anomaly Detection in Web Traffic

Activity 1.2 Storage

mkaranasou opened this issue

Make storage more efficient, especially regarding the features.

  • Measure how much space the features currently take up

  • Check for Postgres-native enhancements/solutions

  • Add id_group to the request_sets table and to the partitions (this should be used to reference external storage in case we end up using something like the following):

  • TileDB
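
A minimal sketch of the id_group step, assuming request_sets is the parent of the weekly partitions (declarative partitioning or inheritance), so that a column added to the parent propagates to them. The bigint type is a placeholder assumption; the issue does not say what type id_group should have.

```sql
-- Assumption: id_group is a nullable bigint; the real type is not stated in the issue.
-- Adding a nullable column without a default does not rewrite existing rows.
alter table request_sets
    add column if not exists id_group bigint;

-- Check that the parent table and every partition now expose the column.
select table_name, data_type
from information_schema.columns
where column_name = 'id_group'
  and table_name like 'request_sets%'
order by table_name;
```

If the partitions were created as independent tables rather than children of request_sets, the alter table statement would have to be repeated for each of them.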

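-- In MB.
-- Note: the column names below are in single quotes, so Postgres treats them as
-- string literals. Each figure is the size of that literal summed over all rows,
-- not the stored size of the column. Drop the quotes, e.g. pg_column_size(id),
-- to measure the actual column values.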
sum(pg_column_size('id')) / 1024 /1024 = 79
sum(pg_column_size('id_runtime')) / 1024 /1024 = 290
sum(pg_column_size('target')) / 1024 /1024 = 184
sum(pg_column_size('ip')) / 1024 /1024 = 79

sum(pg_column_size('ip_encrypted')) / 1024 /1024 = 343
sum(pg_column_size('ip_iv')) / 1024 /1024 = 158
sum(pg_column_size('ip_tag')) / 1024 /1024 = 184

sum(pg_column_size('start')) / 1024 /1024 = 158
sum(pg_column_size('stop')) / 1024 /1024 = 132
sum(pg_column_size('total_seconds')) / 1024 /1024 = 369
sum(pg_column_size('subset_count')) / 1024 /1024 = 343
sum(pg_column_size('num_request_sets')) / 1024 /1024 = 449
sum(pg_column_size('time_bucket')) / 1024 /1024 = 316
sum(pg_column_size('label')) / 1024 /1024 = 158
sum(pg_column_size('id_attribute')) / 1024 /1024 = 343
sum(pg_column_size('id_banjax')) / 1024 /1024 = 264

sum(pg_column_size('process_flag')) / 1024 /1024 = 343

sum(pg_column_size('prediction')) / 1024 /1024 = 290
sum(pg_column_size('score')) / 1024 /1024 = 158

sum(pg_column_size('prediction_host')) / 1024 /1024 = 442
sum(pg_column_size('score_host')) / 1024 /1024 = 290
sum(pg_column_size('row_num')) / 1024 /1024 = 211

sum(pg_column_size('features')) / 1024 /1024 = 237
sum(pg_column_size('created_at')) / 1024 /1024 = 290
sum(pg_column_size('updated_at')) / 1024 /1024 = 290
sum(pg_column_size('model_version')) / 1024 /1024 = 369
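
For comparison, a sketch of the same measurement with the column names unquoted, so that pg_column_size sees the stored values rather than string constants. The partition name request_sets_y2020_w25 is the one used elsewhere in this issue, the selection of columns is only illustrative, and no results are shown because this has not been run against the actual data.

```sql
-- One scan of the partition; unquoted names refer to the columns themselves.
-- pg_column_size reports the stored size of each value, after any compression.
-- "start" and "stop" are double-quoted only to avoid any clash with keywords.
select
    sum(pg_column_size(id)) / 1024 / 1024            as id_mb,
    sum(pg_column_size(ip_encrypted)) / 1024 / 1024  as ip_encrypted_mb,
    sum(pg_column_size("start")) / 1024 / 1024       as start_mb,
    sum(pg_column_size("stop")) / 1024 / 1024        as stop_mb,
    sum(pg_column_size(features)) / 1024 / 1024      as features_mb,
    -- Approximate whole-row size, as a total to compare the columns against.
    sum(pg_column_size(t.*)) / 1024 / 1024           as all_columns_mb
from request_sets_y2020_w25 as t;
```

NULL values contribute nothing to these sums, because pg_column_size returns NULL for them and sum ignores NULLs.
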
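-- Note: 'features' in single quotes is a string literal, not the column, so this
-- measures the literal once per row (hence the constant 9-byte average in the result).
-- Use pg_column_size(features) to measure the stored column values.
select
    pg_size_pretty(sum(pg_column_size('features'))) as total_size,
    pg_size_pretty(avg(pg_column_size('features'))) as average_size,
    sum(pg_column_size('features')) * 100.0 / pg_total_relation_size('request_sets_y2020_w25') as percentage
from request_sets_y2020_w25;

total_size	average_size	percentage
"238 MB"	"9.0000000000000000 bytes"	"0.88090080647908018376"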
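
The percentage above divides by pg_total_relation_size, which counts indexes and TOAST data as well as the main table. As a rough sketch, the following query breaks that denominator into its parts for the same partition, which may help when judging whether the table data or the indexes dominate storage. The column aliases are illustrative only.

```sql
select
    -- Main data fork of the table only.
    pg_size_pretty(pg_relation_size('request_sets_y2020_w25'))       as main_heap,
    -- Table including TOAST, free space map and visibility map, excluding indexes.
    pg_size_pretty(pg_table_size('request_sets_y2020_w25'))          as table_with_toast,
    -- All indexes attached to the table.
    pg_size_pretty(pg_indexes_size('request_sets_y2020_w25'))        as indexes,
    -- Table plus indexes; this is the denominator used in the percentage.
    pg_size_pretty(pg_total_relation_size('request_sets_y2020_w25')) as total;
```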