Reduce memory fragmentation after deleting a large number of documents
Nargonath opened this issue · comments
Description
Our test cluster ran out of memory when we tried to insert new documents, so we deleted ~150k documents. 4-5 hours after the deletion we were still getting:
[Errno 422] Rejecting write: running out of resource type: OUT_OF_MEMORY
We called the metrics endpoint and here is the result:
{
'system_cpu1_active_percentage': '0.00',
'system_cpu2_active_percentage': '0.00',
'system_cpu_active_percentage': '0.00',
'system_disk_total_bytes': '8579448832 -> 7.99GB',
'system_disk_used_bytes': '346456064 -> 0.32GB', // We have a total of 8 GB
'system_memory_total_bytes': '454602752 -> 0.42GB',
'system_memory_used_bytes': '308760576 -> 0.29GB',
'system_network_received_bytes': '26288977847 -> 24.48GB',
'system_network_sent_bytes': '15880541538 -> 14.79GB',
'typesense_memory_active_bytes': '182366208 -> 0.17GB', // We have a total of 0.5 GB
'typesense_memory_allocated_bytes': '92041400 -> 0.09GB',
'typesense_memory_fragmentation_ratio': '0.50',
'typesense_memory_mapped_bytes': '212025344 -> 0.2GB',
'typesense_memory_metadata_bytes': '11846656 -> 0.01GB',
'typesense_memory_resident_bytes': '182366208 -> 0.17GB',
'typesense_memory_retained_bytes': '166510592 -> 0.16GB'
}
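For what it's worth, the reported fragmentation ratio is consistent with being derived from the allocated and active byte counts above. A quick sanity check, assuming the ratio is computed as `1 - allocated / active` (an assumption on my part; I haven't verified this against the Typesense source):

```python
# Values copied from the metrics output above.
allocated = 92_041_400   # typesense_memory_allocated_bytes
active = 182_366_208     # typesense_memory_active_bytes

# Assumed formula: the fraction of active (page-backed) memory that is
# not actually handed out by the allocator, i.e. lost to fragmentation.
fragmentation_ratio = 1 - allocated / active
print(f"{fragmentation_ratio:.2f}")  # prints 0.50, matching the report
```

So roughly half of the memory pages held by the process are not backing live allocations, which is why the ratio reads as high even though allocated memory is low.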
We can see that we have enough free memory and disk space, but the fragmentation ratio is high (0.50). Could the high fragmentation be what's causing the OOM issue?
Looking through GitHub issues and documentation, I came across the /operations/db/compact endpoint. Since the compaction process targets RocksDB, I assume it wouldn't do anything about memory fragmentation, would it?
How would you go about lowering the fragmentation? A cluster restart? If so, how do you trigger one?
Steps to reproduce
N/A
Expected Behavior
Being able to insert documents into our collection after deleting a large number of documents.
Actual Behavior
An OOM error occurs when trying to insert documents, even though we freed disk and RAM space.
Metadata
Typesense Version: v0.25.1
OS: running on Typesense Cloud, so I'm not sure
Can you please try v0.25.2, where we've improved this memory protection logic?
> Since the compacting process is about RocksDB I assume it wouldn't do anything regarding fragmentation, would it?
This handles on-disk data storage fragmentation, which is different.
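For anyone landing here later: on-disk compaction is triggered with a plain POST to the endpoint mentioned above. A minimal sketch using only the standard library (the host, port, and API key are placeholders you'd replace with your own; the `X-TYPESENSE-API-KEY` header is the standard Typesense auth header):

```python
import urllib.request

def trigger_compaction(host: str, api_key: str) -> str:
    """POST /operations/db/compact to compact the on-disk RocksDB store.

    Note: this compacts the disk store only; per the comment above, it
    does not address in-memory fragmentation.
    """
    req = urllib.request.Request(
        url=f"{host}/operations/db/compact",
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Example (placeholder values; requires a running Typesense node):
# trigger_compaction("http://localhost:8108", "your-admin-api-key")
```

On Typesense Cloud you wouldn't normally call this against a raw node address; it's shown here for self-hosted setups.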
Alright, thanks for the info. I'll see if we can retest it with the latest version.