NethermindEth / juno

Starknet client implementation.

Home Page: https://juno.nethermind.io

OOM Crashes on Juno Pod After Restart During Heavy Load

wojciechos opened this issue

Increased traffic targeting the starknet_call method on our k8s pod pushed CPU usage to 100%, leading to request failures and block sync issues. Subsequent restarts of the pod resulted in immediate OOM errors at startup. However, after we replaced the database with a fresh one, the pod synced properly without any OOM errors, which suggests that the DB may have been corrupted(?).

k8s Logs:

terminated
Reason: OOMKilled - exit code: 137
Started at: 2024-04-19T15:14:04+05:30
Finished at: 2024-04-19T15:14:51+05:30

Possible Causes:

  • Potential database corruption during restarts combined with high CPU load.
  • Recent Pebble updates.

//UPDATE - 06.05.2024
The pod was unable to keep up with syncing, and requests failed once it reached its CPU limit.
Actions taken: added more pods and restarted the pod, with no improvement.
Resolution: Removing and replacing the DB resolved the issue.
Next steps: Prioritize investigating and fixing the underlying cause.

06-05-2024-incident.pdf