Reduce memory fragmentation after deleting a large number of documents
Nargonath opened this issue · comments
Description
Our test cluster ran out of memory when we tried to insert new documents, so we deleted ~150k documents. 4-5 hours after the deletion we were still getting:
[Errno 422] Rejecting write: running out of resource type: OUT_OF_MEMORY
We called the metrics endpoint and here is the result:
{
'system_cpu1_active_percentage': '0.00',
'system_cpu2_active_percentage': '0.00',
'system_cpu_active_percentage': '0.00',
'system_disk_total_bytes': '8579448832 -> 7.99GB',
'system_disk_used_bytes': '346456064 -> 0.32GB', // We have a total of 8 GB
'system_memory_total_bytes': '454602752 -> 0.42GB',
'system_memory_used_bytes': '308760576 -> 0.29GB',
'system_network_received_bytes': '26288977847 -> 24.48GB',
'system_network_sent_bytes': '15880541538 -> 14.79GB',
'typesense_memory_active_bytes': '182366208 -> 0.17GB', // We have a total of 0.5 GB
'typesense_memory_allocated_bytes': '92041400 -> 0.09GB',
'typesense_memory_fragmentation_ratio': '0.50',
'typesense_memory_mapped_bytes': '212025344 -> 0.2GB',
'typesense_memory_metadata_bytes': '11846656 -> 0.01GB',
'typesense_memory_resident_bytes': '182366208 -> 0.17GB',
'typesense_memory_retained_bytes': '166510592 -> 0.16GB'
}
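For what it's worth, the reported fragmentation ratio is consistent with being derived from the allocated and active byte counts above. A quick sanity check, assuming the ratio is computed as `1 - allocated / active` (an assumption on my part; I haven't verified this against the Typesense source):

```python
# Values copied from the metrics output above.
allocated = 92_041_400   # typesense_memory_allocated_bytes
active = 182_366_208     # typesense_memory_active_bytes

# Assumed formula: the fraction of active (page-backed) memory that is
# not actually handed out by the allocator, i.e. lost to fragmentation.
fragmentation_ratio = 1 - allocated / active
print(f"{fragmentation_ratio:.2f}")  # prints 0.50, matching the report
```

So roughly half of the memory pages held by the process are not backing live allocations, which is why the ratio reads as high even though allocated memory is low.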
We can see that we have enough free memory and disk space, but the fragmentation ratio is high (0.50). Could the high fragmentation be what's causing the OOM issue?
Looking through GitHub issues and documentation, I came across the /operations/db/compact endpoint. Since the compaction process targets RocksDB, I assume it wouldn't do anything about memory fragmentation, would it?
How would you go about lowering the fragmentation? A cluster restart? If so, how do you trigger one?
Steps to reproduce
N/A
Expected Behavior
Being able to insert documents into our collection after deleting a large number of documents.
Actual Behavior
An OOM error occurs when trying to insert documents, even though we freed disk and RAM space.
Metadata
Typesense Version: v0.25.1
OS: running on Typesense Cloud, so I'm not sure
Can you please try v0.25.2, where we've improved this memory protection logic?
> Since the compacting process is about RocksDB I assume it wouldn't do anything regarding fragmentation, would it?
This handles on-disk data storage fragmentation, which is different.
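For anyone landing here later: on-disk compaction is triggered with a plain POST to the endpoint mentioned above. A minimal sketch using only the standard library (the host, port, and API key are placeholders you'd replace with your own; the `X-TYPESENSE-API-KEY` header is the standard Typesense auth header):

```python
import urllib.request

def trigger_compaction(host: str, api_key: str) -> str:
    """POST /operations/db/compact to compact the on-disk RocksDB store.

    Note: this compacts the disk store only; per the comment above, it
    does not address in-memory fragmentation.
    """
    req = urllib.request.Request(
        url=f"{host}/operations/db/compact",
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Example (placeholder values; requires a running Typesense node):
# trigger_compaction("http://localhost:8108", "your-admin-api-key")
```

On Typesense Cloud you wouldn't normally call this against a raw node address; it's shown here for self-hosted setups.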
Alright, thanks for the info. I'll see if we can retest it with the latest version.