ipni / storetheindex

A directory of CIDs

Resume investigation into using FoundationDB as a persistence service

masih opened this issue

We recently experienced a suspected memory leak in our FoundationDB deployment, running version 7.1.33, in both the development and production environments. The issue manifested as all storage servers steadily climbing to 100% memory usage after approximately two weeks of operation, causing them to be repeatedly shut down by the scheduler. The deployment used the RocksDB storage engine, which is known to have had memory leak problems in earlier versions of FoundationDB; it remains uncertain whether those issues persist in the version we deployed.

To address the problem, we rejuvenated the production cluster and upgraded to the latest pre-release version of FoundationDB available at the time, 7.3.7. This involved migrating the data onto new storage servers, and we also planned to explore newer storage engines such as Redwood.

Two weeks after the upgrade, memory usage had dropped on some but not all storage servers, and the data migration was still incomplete. This suggests that data migration can be a lengthy process in FoundationDB. Since migration consumes read bandwidth, it is also unclear how it would affect read performance in a live production setup, particularly when FoundationDB sits directly in the read traffic path.
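For whoever picks this back up: one way to watch the migration from the client side is to poll the cluster status document, which FoundationDB exposes through the `\xff\xff/status/json` special key. Below is a minimal Go sketch using the official Go bindings; the `moving_data` field names are from memory and worth checking against the status schema of the version in use, and this is not code from our deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	// API version 730 matches the 7.3 client libraries; adjust to the
	// client version actually installed.
	fdb.MustAPIVersion(730)
	db := fdb.MustOpenDefault()

	// The client exposes cluster status as JSON under a special key.
	// (Depending on the version, special-key reads outside the default
	// modules may need tr.Options().SetSpecialKeySpaceRelaxed().)
	raw, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		return tr.Get(fdb.Key("\xff\xff/status/json")).Get()
	})
	if err != nil {
		log.Fatalf("reading status json: %v", err)
	}

	// Only the data-movement counters are decoded here; the full status
	// document contains much more (per-process memory, roles, etc.).
	var status struct {
		Cluster struct {
			Data struct {
				MovingData struct {
					InFlightBytes int64 `json:"in_flight_bytes"`
					InQueueBytes  int64 `json:"in_queue_bytes"`
				} `json:"moving_data"`
			} `json:"data"`
		} `json:"cluster"`
	}
	if err := json.Unmarshal(raw.([]byte), &status); err != nil {
		log.Fatalf("decoding status json: %v", err)
	}

	fmt.Printf("data movement: %d bytes in flight, %d bytes queued\n",
		status.Cluster.Data.MovingData.InFlightBytes,
		status.Cluster.Data.MovingData.InQueueBytes)
}
```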

Given how time-consuming further investigation would be, we decided to temporarily shut down the deployments in both development and production. Despite these issues, FoundationDB showed significant potential: with a replication factor of two, we achieved a multihash ingest rate of up to 250K per second, the highest we have recorded, compared to our current non-replicated Pebble backend.
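For context on how that figure was produced, the ingest path batches many multihash keys into each transaction (a replication factor of two corresponds to FoundationDB's double redundancy mode). The Go sketch below shows the general shape of such a batched write using the official bindings and the tuple layer; the key layout, batch size, and function names are illustrative only and are not the storetheindex code behind the 250K/s measurement.

```go
package main

import (
	"crypto/rand"
	"log"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
	"github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
)

// putMultihashes writes one batch of multihash -> value-key mappings in a
// single FoundationDB transaction. The ("mh", multihash) tuple key layout is
// purely illustrative, not the schema used in the experiment.
func putMultihashes(db fdb.Database, batch map[string][]byte) error {
	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		for mh, valueKey := range batch {
			key := tuple.Tuple{"mh", []byte(mh)}.Pack()
			tr.Set(fdb.Key(key), valueKey)
		}
		return nil, nil
	})
	return err
}

func main() {
	fdb.MustAPIVersion(730)
	db := fdb.MustOpenDefault()

	// Build a synthetic batch of 1024 random "multihashes" pointing at the
	// same value key, then write it as one transaction.
	valueKey := []byte("provider-record-key")
	batch := make(map[string][]byte, 1024)
	for i := 0; i < 1024; i++ {
		mh := make([]byte, 34)
		if _, err := rand.Read(mh); err != nil {
			log.Fatal(err)
		}
		batch[string(mh)] = valueKey
	}
	if err := putMultihashes(db, batch); err != nil {
		log.Fatalf("ingest batch failed: %v", err)
	}
	log.Printf("wrote %d multihash keys", len(batch))
}
```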

Once time permits, we intend to revisit FoundationDB for further testing and potential use.

For instructions on restarting the instances in the Kubernetes (K8S) setup, please refer to the shutdown PR: