Activity 1.2 Storage
mkaranasou opened this issue
Make storage more efficient, especially regarding the features.
- Measure the current space the features take up
- Check for Postgres-native enhancements / solutions
- Add `id_group` to the `request_sets` table and to the partitions (this should be used to reference external storage in case we end up using something like the following:)
  - TileDB
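A sketch of the `id_group` change, assuming declarative partitioning; the `BIGINT` type and the per-partition index are assumptions, the table/column names come from this issue:

```sql
-- Hypothetical sketch: id_group would hold a reference into an
-- external store (e.g. a TileDB array); the type is an assumption.
ALTER TABLE request_sets ADD COLUMN id_group BIGINT;

-- With declarative partitioning the new column appears on every
-- partition automatically; an index can still be added per partition:
CREATE INDEX ON request_sets_y2020_w25 (id_group);
```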
```sql
-- Per column, in MB.
-- NB: with the column name quoted, pg_column_size('id') measures the
-- size of the text literal 'id' on every row, not the id column;
-- actual column sizes need unquoted identifiers, e.g. pg_column_size(id).
sum(pg_column_size('id')) / 1024 / 1024 = 79
sum(pg_column_size('id_runtime')) / 1024 / 1024 = 290
sum(pg_column_size('target')) / 1024 / 1024 = 184
sum(pg_column_size('ip')) / 1024 / 1024 = 79
sum(pg_column_size('ip_encrypted')) / 1024 / 1024 = 343
sum(pg_column_size('ip_iv')) / 1024 / 1024 = 158
sum(pg_column_size('ip_tag')) / 1024 / 1024 = 184
sum(pg_column_size('start')) / 1024 / 1024 = 158
sum(pg_column_size('stop')) / 1024 / 1024 = 132
sum(pg_column_size('total_seconds')) / 1024 / 1024 = 369
sum(pg_column_size('subset_count')) / 1024 / 1024 = 343
sum(pg_column_size('num_request_sets')) / 1024 / 1024 = 449
sum(pg_column_size('time_bucket')) / 1024 / 1024 = 316
sum(pg_column_size('label')) / 1024 / 1024 = 158
sum(pg_column_size('id_attribute')) / 1024 / 1024 = 343
sum(pg_column_size('id_banjax')) / 1024 / 1024 = 264
sum(pg_column_size('process_flag')) / 1024 / 1024 = 343
sum(pg_column_size('prediction')) / 1024 / 1024 = 290
sum(pg_column_size('score')) / 1024 / 1024 = 158
sum(pg_column_size('prediction_host')) / 1024 / 1024 = 442
sum(pg_column_size('score_host')) / 1024 / 1024 = 290
sum(pg_column_size('row_num')) / 1024 / 1024 = 211
sum(pg_column_size('features')) / 1024 / 1024 = 237
sum(pg_column_size('created_at')) / 1024 / 1024 = 290
sum(pg_column_size('updated_at')) / 1024 / 1024 = 290
sum(pg_column_size('model_version')) / 1024 / 1024 = 369
```
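A corrected version of the measurement would unquote the identifiers so `pg_column_size` sees the column value rather than a string literal; a sketch for a few of the columns above:

```sql
-- Actual on-disk size of the column values, in MB
-- (unquoted identifiers: the column, not a text literal):
SELECT
    sum(pg_column_size(id))           / 1024 / 1024 AS id_mb,
    sum(pg_column_size(ip_encrypted)) / 1024 / 1024 AS ip_encrypted_mb,
    sum(pg_column_size(features))     / 1024 / 1024 AS features_mb
FROM request_sets;
```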
- ZSON, a compressed JSONB type for Postgres: https://github.com/postgrespro/zson
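If ZSON were adopted for the `features` column, usage would look roughly like the following sketch (API per the zson README; table/column names from this issue):

```sql
CREATE EXTENSION zson;

-- Build a shared dictionary of common keys from existing documents:
SELECT zson_learn('{{"request_sets", "features"}}');

-- Switch the column to the dictionary-compressed type:
ALTER TABLE request_sets ALTER COLUMN features TYPE zson;
```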
```sql
SELECT pg_column_size(features) FROM request_sets;

VACUUM (VERBOSE, ANALYZE, FULL) request_sets;
```
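Since `VACUUM FULL` rewrites the table under an exclusive lock, it is worth comparing the relation size before and after; a sketch:

```sql
SELECT pg_size_pretty(pg_total_relation_size('request_sets'));
VACUUM (VERBOSE, ANALYZE, FULL) request_sets;  -- exclusive lock, rewrites the table
SELECT pg_size_pretty(pg_total_relation_size('request_sets'));
```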
External storage candidates:

- ScyllaDB with Spark integration (poor reads): https://docs.scylladb.com/kb/scylla-and-spark-integration/
- Riak TS (availability): https://riak.com/products/riak-ts/resiliency/index.html?p=10650.html
- HBase: https://hbase.apache.org (monitoring with Prometheus: https://godatadriven.com/blog/monitoring-hbase-with-prometheus/)
- Hive: https://cwiki.apache.org/confluence/display/Hive/Home
```sql
select
    pg_size_pretty(sum(pg_column_size('features'))) as total_size,
    pg_size_pretty(avg(pg_column_size('features'))) as average_size,
    sum(pg_column_size('features')) * 100.0 / pg_total_relation_size('request_sets_y2020_w25') as percentage
from request_sets_y2020_w25;
-- total_size: "238 MB", average_size: "9.0000000000000000 bytes",
-- percentage: "0.88090080647908018376"
```