microsoft / ga4gh-tes

C# implementation of the GA4GH TES API; provides distributed batch task execution on Microsoft Azure

Add TES idempotency feature

MattMcL4475 opened this issue

  • Add a system-level enum setting that makes TES idempotent. Default to Disabled. Values are "Disabled", "Enabled", "EnabledWithOutputCopying"
  • Add a TesTask-level setting that makes the task idempotent. Default to Disabled. Values are "Disabled", "Enabled", "EnabledWithOutputCopying". If it is set at the task level, it shall override the system setting.
  • A TesTask shall be considered identical to a previous TES task, for the sake of idempotency, if the two tasks have:
  1. The exact same set of Inputs (same URLs)
  2. The exact same values for Executors
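
The override rule and the identity check above could be sketched roughly as follows. This is only an illustration, assuming .NET 6 or later: `IdempotencyMode`, `TaskSpec`, `Executor`, `Resolve` and `Key` are hypothetical names, not existing TES types, and `Executor` is reduced to an image and a command. It also assumes that "not set at the task level" is modelled as `null`, so that an unset task falls back to the system setting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical names throughout; none of these types come from the TES codebase.
public enum IdempotencyMode { Disabled, Enabled, EnabledWithOutputCopying }

// Simplified executor: a real one carries more fields than image and command.
public sealed record Executor(string Image, IReadOnlyList<string> Command);

public sealed record TaskSpec(
    IReadOnlyList<string> InputUrls,
    IReadOnlyList<Executor> Executors,
    IdempotencyMode? Idempotency = null); // null = not set on the task (assumption)

public static class Idempotency
{
    // A task-level setting, when present, overrides the system-level setting.
    public static IdempotencyMode Resolve(IdempotencyMode systemMode, IdempotencyMode? taskMode)
        => taskMode ?? systemMode;

    // Two tasks get the same key when they have the same set of input URLs
    // and the same executors in the same order.
    public static string Key(TaskSpec task)
    {
        var canonical = new
        {
            // Inputs are compared as a set: de-duplicate and sort so order does not matter.
            Inputs = task.InputUrls.Distinct().OrderBy(u => u, StringComparer.Ordinal).ToArray(),
            // Executor order is preserved.
            Executors = task.Executors
                .Select(e => new { e.Image, Command = e.Command.ToArray() })
                .ToArray()
        };
        var json = JsonSerializer.Serialize(canonical);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
    }
}
```

A lookup from `Key(task)` to the ID of a previously completed task would then decide whether a new task is a repeat. Hashing a canonical form is one possible design; a field-by-field comparison against stored tasks would satisfy the same rule.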

If the mode is EnabledWithOutputCopying, TES shall use Azure server-side blob copy to copy the previous task's outputs to the location(s) specified for the current task's outputs. The copy work item shall be added to an in-memory queue, and the task state shall be set to RUNNING before the copy starts. The copying must not block the main task status checking loop, so that overall task throughput is not slowed down: a task can have thousands of files to copy, and even though each copy is done server-side, calling that API 1000 times will take a while.

There shall be two separate long-running C# HostedServices (created in startup.cs). One checks whether a blob copy of the file(s) is already in progress and, if not, starts the copy. The other periodically checks whether all of the copies are complete and, if so, sets the task state to COMPLETE.

If TES crashes, it should be able to pick up where it left off by looping through all RUNNING tasks and resuming each one that is still copying outputs.
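
One possible shape for the queue and the two loops is sketched below, assuming .NET 6 or later. All names (`OutputCopyCoordinator`, `IBlobCopier`, `CopyJob`, `CopyItem`) are hypothetical, and `IBlobCopier` is a stand-in for whatever wraps the Azure server-side copy calls; nothing here is taken from the TES codebase. In a real service each `Run...Async` loop would presumably live in its own hosted service.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public enum TaskState { RUNNING, COMPLETE }
public sealed record CopyItem(Uri Source, Uri Destination);
public sealed record CopyJob(string TaskId, IReadOnlyList<CopyItem> Items);

// Hypothetical abstraction over server-side blob copy.
public interface IBlobCopier
{
    Task<bool> IsCopyStartedAsync(Uri destination, CancellationToken ct);
    Task StartCopyAsync(Uri source, Uri destination, CancellationToken ct);
    Task<bool> IsCopyCompleteAsync(Uri destination, CancellationToken ct);
}

public sealed class OutputCopyCoordinator
{
    private readonly Channel<CopyJob> queue = Channel.CreateUnbounded<CopyJob>();
    private readonly ConcurrentDictionary<string, CopyJob> copying = new();
    private readonly IBlobCopier copier;
    private readonly Action<string, TaskState> setState;

    public OutputCopyCoordinator(IBlobCopier copier, Action<string, TaskState> setState)
    {
        this.copier = copier;
        this.setState = setState;
    }

    // Called from the main status loop. Returns immediately; no copy call is made here.
    public void Enqueue(CopyJob job)
    {
        setState(job.TaskId, TaskState.RUNNING);
        queue.Writer.TryWrite(job); // always succeeds on an unbounded channel
    }

    // Starts every copy of one job that is not already started, then tracks the job.
    public async Task StartCopiesAsync(CopyJob job, CancellationToken ct)
    {
        foreach (var item in job.Items)
        {
            if (!await copier.IsCopyStartedAsync(item.Destination, ct))
                await copier.StartCopyAsync(item.Source, item.Destination, ct);
        }
        copying[job.TaskId] = job;
    }

    // Loop for the first service: drain the queue and start copies.
    public async Task RunStarterAsync(CancellationToken ct)
    {
        await foreach (var job in queue.Reader.ReadAllAsync(ct))
            await StartCopiesAsync(job, ct);
    }

    // One pass of the second service: mark tasks COMPLETE once all their copies are done.
    public async Task CheckCompletionsOnceAsync(CancellationToken ct)
    {
        foreach (var (taskId, job) in copying)
        {
            var allDone = true;
            foreach (var item in job.Items)
            {
                if (!await copier.IsCopyCompleteAsync(item.Destination, ct)) { allDone = false; break; }
            }
            if (allDone && copying.TryRemove(taskId, out _))
                setState(taskId, TaskState.COMPLETE);
        }
    }

    // Loop for the second service: run a completion pass on a fixed interval.
    public async Task RunCompletionCheckerAsync(TimeSpan interval, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(interval);
        while (await timer.WaitForNextTickAsync(ct))
            await CheckCompletionsOnceAsync(ct);
    }
}
```

Because `StartCopiesAsync` skips any destination whose copy has already started, crash recovery in this sketch amounts to calling `Enqueue` again at startup for every RUNNING task that has pending copies. The queue itself is in memory only, so the list of RUNNING tasks has to come from wherever task state is persisted.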