microsoft / ga4gh-tes

C# implementation of the GA4GH TES API; provides distributed batch task execution on Microsoft Azure


Minimize the risk of expired SAS tokens AND simplify node management

BMurri opened this issue

Problem:
For every work task, two items stored in Azure Blob Storage MUST be present on the node: the task runner and the task runner's task JSON. Both are currently referenced by URLs that carry a SAS token, because they cannot be downloaded without one. Any start task that needs a resource from blob storage has the same issue.

Once a task has been added to a job, its command line cannot be changed (e.g. to refresh a SAS token) without terminating the task, deleting it from the job, and adding a replacement, which then goes to the back of the queue. This is a problem at scale, because the token can expire before the task finally starts running (and this has actually happened).

Start tasks that must download anything requiring a SAS token have it worse, because the start task is generated at pool creation and is therefore a long-lived entity. In Terra today, SAS tokens have shorter lifetimes than pools do (and a pool's "lifetime" setting limits when new tasks can be added to the pool's job, NOT when tasks START). Start tasks can be updated, but that appears to require either a different Batch client in the C# library than the one we currently use, or a different way of calling the Batch data-plane APIs than we currently use.

Solution:

  1. As proposed in #520, load the node task runner via the start task. As an expansion of #363, perform all start-task-related work via that runner. Further, use that runner for all tasks scheduled/run on that node.
  2. Alter the runner so that it accepts, from its command line and/or environment variables, all information needed to generate a SAS token and download the task JSON itself (thus eliminating the need for the TES server to supply any SAS token in the task command line or task script file).
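Step 2 can be sketched roughly as follows, in Python for brevity (the project itself is C#). This is a minimal sketch under stated assumptions, not the runner's actual interface: the flag names, the `TES_RUNNER_*` environment variables, and the 60-minute default lifetime are all hypothetical. It shows only the shape of the idea: the TES server passes plain, unsigned locations, with command-line flags taking precedence over environment variables, and the expiry is computed on the node when the runner starts. Signing itself is omitted; the runner would mint the token at download time, for example as a user delegation SAS obtained with the node's managed identity.

```python
import argparse
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class RunnerSettings:
    storage_account: str
    container: str
    task_json_blob: str
    sas_lifetime: timedelta


def resolve_settings(argv: Sequence[str], env: Mapping[str, str]) -> RunnerSettings:
    """Read settings from flags, falling back to environment variables.

    All flag and variable names here are hypothetical illustrations.
    """
    parser = argparse.ArgumentParser(prog="tes-runner")
    parser.add_argument("--storage-account")
    parser.add_argument("--container")
    parser.add_argument("--task-json-blob")
    parser.add_argument("--sas-lifetime-minutes", type=int)
    args = parser.parse_args(argv)

    def pick(cli_value: Optional[str], env_name: str) -> str:
        # A command-line value wins; otherwise use the environment variable.
        value = cli_value if cli_value is not None else env.get(env_name)
        if value is None:
            raise ValueError(f"missing setting: {env_name}")
        return value

    minutes = args.sas_lifetime_minutes
    if minutes is None:
        minutes = int(env.get("TES_RUNNER_SAS_LIFETIME_MINUTES", "60"))

    return RunnerSettings(
        storage_account=pick(args.storage_account, "TES_RUNNER_STORAGE_ACCOUNT"),
        container=pick(args.container, "TES_RUNNER_CONTAINER"),
        task_json_blob=pick(args.task_json_blob, "TES_RUNNER_TASK_JSON_BLOB"),
        sas_lifetime=timedelta(minutes=minutes),
    )


def task_json_url(settings: RunnerSettings) -> str:
    # Unsigned URL: no SAS token is baked into the task command line.
    return (
        f"https://{settings.storage_account}.blob.core.windows.net/"
        f"{settings.container}/{settings.task_json_blob}"
    )


def sas_expiry(settings: RunnerSettings, now: Optional[datetime] = None) -> datetime:
    # The expiry is measured from when the runner actually starts on the node,
    # not from when the TES server enqueued the task.
    if now is None:
        now = datetime.now(timezone.utc)
    return now + settings.sas_lifetime
```

Because the token's lifetime starts when the runner runs, the time a task spends waiting in the job's queue no longer counts against it.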

Describe alternatives you've considered
Do nothing, accepting that these problems will persist and will worsen as environments require ever shorter SAS token lifetimes.

Sub Tasks

Code dependencies
Will this require code changes in:

  • CoA, for new and/or existing deployments? No
  • TES standalone, for new and/or existing deployments? No
  • Terra, for new and/or existing deployments? No
  • Build pipeline? No
  • Integration tests? No

Additional context
Completing this feature will ease the implementation of, or largely or fully complete, the following issues: