microsoft / ga4gh-tes

C# implementation of the GA4GH TES API; provides distributed batch task execution on Microsoft Azure

support specifying source for `allowed-vm-sizes`

davidangb opened this issue

Problem:
When TES is configured to use Terra, TES reads the allowed-vm-sizes file from the containing Terra workspace storage container. Any user of the workspace who has permission to run workflows also has permission to write to that storage container, so users can modify the file. Thus, the allowed-vm-sizes file cannot be trusted to restrict the VM sizes used by this TES instance.

Solution:
Terra would like to be able to specify the allowed VM sizes via an environment variable or other deploy-time config which would not be writable by end users. As discussed in person, this could be:

  1. a URL from which TES would read allowed-vm-sizes instead of, or in addition to, reading from the workspace storage container
  2. a delimited list of VM sizes which would take effect instead of, or in addition to, reading the file

We are also open to other solutions if you think an alternative is better.
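
For illustration, here is a minimal sketch of how a deploy-time setting could cover both options. The variable name `Terra__AllowedVmSizes` is the one proposed later in this thread; the parsing and fallback logic are assumptions, not the shipped TES configuration code:

```csharp
// Hypothetical sketch: resolve allowed VM sizes from a deploy-time
// environment variable holding either a URL or a delimited list.
// The variable name and parsing logic are illustrative only.
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class AllowedVmSizesSource
{
    public static async Task<string[]> ResolveAsync(HttpClient httpClient)
    {
        var value = Environment.GetEnvironmentVariable("Terra__AllowedVmSizes");

        if (string.IsNullOrWhiteSpace(value))
        {
            return Array.Empty<string>(); // not set: fall back to existing blob-based behavior
        }

        // Option 1: a URL from which TES reads the list.
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            var content = await httpClient.GetStringAsync(uri);
            return content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        }

        // Option 2: a delimited list of VM sizes.
        return value.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => s.Trim())
                    .ToArray();
    }
}
```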

Describe alternatives you've considered
We have explored Terra's ability to seed this file at TES deploy time, coupled with some kind of monitoring/checksum to ensure the file is not modified. While possible, the ROI on this approach was unattractive.

Code dependencies
Will this require code changes in:

  • CoA, for new and/or existing deployments? no
  • TES standalone, for new and/or existing deployments? ???
  • Terra, for new and/or existing deployments? yes; Leo will conditionally specify the allowed vm sizes as it deploys TES
  • Build pipeline? ???
  • Integration tests? ???

Additional context
This feature request is in support of AnVIL Lite.


@davidangb We are considering an implementation where the environment variable would be a URL that replaces the current allowed-vm-sizes blob in the location determined by convention. Further, since in non-Terra deployments that blob lives in a separate container from user-provided data, which can be made read-only for those users, I propose naming this variable Terra__AllowedVmSizes. I have a couple of clarifying questions to help guide me to an optimal solution:

  1. If this variable has a value, should TES fail if it cannot access it?
  2. If you intend to point to Azure Blob Storage, should we ask WSM for a SAS token to access it?

@BMurri great questions!

  1. My $.02 is this should parallel the current behavior, which looks in the workspace's storage container. If it's a 404 Not Found, TES should continue on as if no limits were set. But if it hits some other error, like a 401/403 or a malformed file, I do think TES should fail.
  2. At this time, we don't need to request a SAS token. If we do end up hosting the file in blob storage, we'll make the file public.

thanks!
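
To make that failure policy concrete, here is a hedged sketch of a fetch routine with the 404-tolerant behavior described above; the names and structure are illustrative, not TES's actual implementation:

```csharp
// Hypothetical sketch of the error policy discussed above:
// 404 => behave as if no limits were set; any other failure => fail fast.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class AllowedVmSizesFetcher
{
    public static async Task<string[]> FetchAsync(HttpClient httpClient, Uri source)
    {
        using var response = await httpClient.GetAsync(source);

        if (response.StatusCode == HttpStatusCode.NotFound)
        {
            return Array.Empty<string>(); // no blob => no restrictions, matching current behavior
        }

        // 401/403 or any other unexpected status should stop TES rather than
        // silently running without restrictions.
        response.EnsureSuccessStatusCode();

        var content = await response.Content.ReadAsStringAsync();
        return content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```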

@davidangb The way this works at present is as a whitelist, with one special exception: an empty set value (which includes a missing blob or malformed content) turns the whitelist off.

If the user places a blob at the current location and we do what you suggest, the result will be the union of the two: the user could add arbitrary VM sizes/families to your protected list, but could not "trim" the list (blacklist-style) yet. (I believe there's an open enhancement issue to enable that scenario.)

If that's what you intend, then I'll implement it that way.

Ah, I must have misunderstood. If we specify a value for Terra__AllowedVmSizes, that value should take precedence. The high-level objective is to have a means of controlling which VMs can be used in a given deployment, in a way that would not allow a workspace user to override or add to those allowances.
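
A minimal sketch of that precedence rule, under the interpretation above (hypothetical names, not the shipped implementation): when the deploy-time value is present it wins outright, rather than being unioned with the workspace blob.

```csharp
// Hypothetical precedence sketch: if the deploy-time list is set,
// it is authoritative and the workspace blob is ignored entirely,
// so workspace users cannot add to or override the allowed set.
public static class AllowedVmSizesPolicy
{
    public static string[] Effective(string[] deployTimeList, string[] workspaceBlobList)
        => deployTimeList is { Length: > 0 }
            ? deployTimeList       // Terra__AllowedVmSizes takes precedence
            : workspaceBlobList;   // fall back to current blob-based behavior
}
```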