nasa / opera-sds-int

OPERA-SDS-VnV-11: Verify SDS can recover from disruptions

LucaCinquini opened this issue

Split this into multiple tickets depending on what disruption is simulated

Case 1: disrupt system by killing Mozart

  • Start system
  • Run 100 jobs to completion
  • Create a backup
  • Start another 100 jobs
  • In the middle of the run, take down the Mozart machine
  • Restore the state of the system from the backup
  • Re-enable the timers and verify that they automatically re-run the same jobs
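
The last step of Case 1 could be checked with a small script along these lines. This is only a sketch: it assumes the tester can export two lists of job IDs (those in flight when Mozart went down, and those re-run after the restore). The function name and the job ID format are hypothetical and not part of the SDS.

```python
def unrecovered_jobs(in_flight_ids, rerun_ids):
    """Return the in-flight job IDs that were NOT re-run after the restore.

    Both arguments are iterables of job ID strings (hypothetical format).
    An empty result means every interrupted job was picked up again.
    """
    return sorted(set(in_flight_ids) - set(rerun_ids))


if __name__ == "__main__":
    # Illustrative data only: jobs running when Mozart was taken down,
    # and jobs observed after the timers were re-enabled.
    in_flight = ["job-101", "job-102", "job-103"]
    rerun = ["job-103", "job-101"]
    print(unrecovered_jobs(in_flight, rerun))  # ['job-102']
```

The test passes when the returned list is empty. Any ID in the list is a job that the timers did not re-run.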

Case 2: disrupt by killing one or more of the SPOT workers

  • Verify that the jobs are automatically restarted on some other worker
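
One way to make the Case 2 check concrete is sketched below. It assumes the tester can collect one record per job attempt, with the job ID, the worker that ran it, and the final status. The `Attempt` record, its field names, and the `"completed"` status string are assumptions for illustration, not the actual SDS job schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One execution attempt of a job (hypothetical record layout)."""
    job_id: str
    worker: str
    status: str  # e.g. "completed" or "failed" (assumed values)


def jobs_not_rescheduled(attempts, killed_workers):
    """Return job IDs that ran on a killed worker and never completed elsewhere."""
    killed = set(killed_workers)
    # Jobs that had at least one attempt on a worker that was killed.
    affected = {a.job_id for a in attempts if a.worker in killed}
    # Jobs that completed on a worker that was not killed.
    recovered = {
        a.job_id
        for a in attempts
        if a.worker not in killed and a.status == "completed"
    }
    return sorted(affected - recovered)


if __name__ == "__main__":
    attempts = [
        Attempt("job-1", "worker-a", "failed"),     # worker-a was killed
        Attempt("job-1", "worker-b", "completed"),  # restarted elsewhere
        Attempt("job-2", "worker-a", "failed"),     # never restarted
    ]
    print(jobs_not_rescheduled(attempts, ["worker-a"]))  # ['job-2']
```

An empty list means every job that was on a killed worker finished on another one.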

Case 3: disrupt the DAAC services used for querying metadata, downloading input data, and archiving output data

  • Close the outbound ports on whatever machine is running the query, download or upload
  • Then open the ports again and verify that the system can recover
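
The "verify that the system can recover" step in Case 3 usually means polling until a condition holds or a deadline passes. The sketch below uses only the Python standard library. The helper names, the timeout values, and the idea of probing a TCP port are assumptions; the issue does not say how recovery should be detected.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def wait_until(predicate, timeout_s, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. The sleep and clock
    parameters are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the ports closed, `can_connect(host, 443)` should return False for the DAAC endpoint under test. After the ports are reopened, a call such as `wait_until(lambda: can_connect(host, 443), timeout_s=600)` should return True, where `host` is whichever endpoint the query, download, or upload job talks to. The same `wait_until` helper can wrap a check that the interrupted jobs eventually complete.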