mlabs-haskell / cardanow


Implementation of Data Cleanup Mechanisms

albertodvp opened this issue

Description:
The service produces a substantial amount of data, including downloaded Mithril snapshots, so robust data cleanup mechanisms are needed. Without proper management, accumulated data causes disk usage to grow linearly, potentially degrading system performance and exhausting storage. By dividing the cleanup work into three subproblems (cleaning up local data, service-host data, and cloud data) we aim to mitigate this risk. This ticket outlines the implementation steps for each subproblem to ensure efficient and reliable cleanup.

  1. Cleaning Up Local Data:

    • Implement a script (cleanup-local-data.sh) that prunes folders prone to size growth.
    • The script should retain a specified number of the most recent files and delete the rest.
    • Ensure the script is executable and can be manually executed.
  2. Cleaning Up Service-Host Data:

    • Develop a cleanup process to retain only the three most recent files in directories prone to size growth.
    • Create a systemd timer and service that execute the cleanup script every 6 hours to ensure regular cleanup.
    • Account for system interruptions and data upload failures in the cleanup process.
  3. Cleaning Up Cloud Data:

    • Use a third-party API to delete cloud data stored on another machine.
    • Similar to service-host data cleanup, retain only the three most recent files.
    • Develop a systemd timer and service that execute the cleanup script every 6 hours to keep the cloud data pruned.
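For the 6-hourly schedule in steps 2 and 3, a systemd timer paired with a oneshot service is one way to realize the "daemon"; the unit names and script path below are assumptions. Persistent=true makes systemd run a missed activation at the next boot, which covers the system-interruption requirement:

```ini
# /etc/systemd/system/cardanow-cleanup.service (hypothetical path)
[Unit]
Description=Clean up cardanow service-host data

[Service]
Type=oneshot
ExecStart=/usr/local/bin/cleanup-local-data.sh

# /etc/systemd/system/cardanow-cleanup.timer (hypothetical path)
[Unit]
Description=Run cardanow cleanup every 6 hours

[Timer]
# Fires at 00:00, 06:00, 12:00, and 18:00.
OnCalendar=*-*-* 00/6:00:00
# Run at next boot if an activation was missed (e.g. the host was down).
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling with `systemctl enable --now cardanow-cleanup.timer` starts the schedule.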
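The retention logic in step 1 could be sketched as a small POSIX shell function. The function name, the default of keeping three entries, and the example path are assumptions for illustration, not part of the repository:

```shell
#!/usr/bin/env sh
# keep_newest <dir> <keep>: delete entries in <dir>, keeping only the
# <keep> most recently modified ones. Hypothetical sketch; entry names
# containing newlines are not handled.
keep_newest() {
  dir=$1
  keep=$2
  # ls -1t lists entries newest-first by mtime; tail -n +K prints from
  # line K onward, i.e. everything after the first $keep entries.
  ls -1t -- "$dir" | tail -n "+$((keep + 1))" | while IFS= read -r entry; do
    rm -rf -- "${dir:?}/$entry"
  done
}

# Example invocation (path is a placeholder):
# keep_newest /var/lib/cardanow/snapshots 3
```

Wrapping this in cleanup-local-data.sh with the directory and retention count as arguments keeps the script reusable for manual execution.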
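The cloud-side pruning in step 3 follows the same keep-the-newest-three shape. Since the concrete third-party API is not specified in this ticket, list_remote and delete_remote below are hypothetical stand-ins for the real listing and deletion calls:

```shell
#!/usr/bin/env sh
# Sketch of the cloud pruning pass. Both functions are placeholders: the
# real implementations depend on which third-party storage API holds the
# uploaded snapshots.
list_remote()   { printf '%s\n' snap-05 snap-04 snap-03 snap-02 snap-01; }  # newest first
delete_remote() { echo "would delete: $1"; }  # stand-in for the real deletion call

KEEP=3
# Skip the KEEP newest remote objects and delete everything older.
list_remote | tail -n "+$((KEEP + 1))" | while IFS= read -r obj; do
  delete_remote "$obj"
done
```

With the stub listing above, only snap-02 and snap-01 fall past the three newest and get passed to delete_remote.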

Acceptance Criteria:

  • The cleanup-local-data.sh script should be developed and tested to ensure it prunes folders as expected, retaining the specified number of files.
  • Systemd daemons for cleaning up service-host data and cloud data should be implemented and tested to ensure they execute the cleanup scripts every 6 hours reliably.
  • Ensure the cleanup processes handle system interruptions and data upload failures gracefully, maintaining the integrity of the cleanup mechanism.
  • Provide documentation detailing the implementation steps and instructions for manual execution, if necessary.
  • Conduct thorough testing to validate the effectiveness and reliability of the cleanup mechanisms in preventing linear growth of disk usage.

Related Documentation:

  • Documentation describing the cleanup processes should be added.