microsoft / ga4gh-tes

C# implementation of the GA4GH TES API; provides distributed batch task execution on Microsoft Azure

support specifying source for `allowed-vm-sizes`

davidangb opened this issue

Problem:
When TES is configured to use Terra, TES reads the allowed-vm-sizes file from the containing Terra workspace storage container. Any user of the workspace who has permission to run workflows also has permission to write to that storage container, so users can modify the file. Thus, the allowed-vm-sizes file cannot be trusted to restrict the VM sizes used by this TES instance.

Solution:
Terra would like to be able to specify the allowed VM sizes via an environment variable or other deploy-time config which would not be writable by end users. As discussed in person, this could be:

  1. a URL from which TES would read allowed-vm-sizes instead of, or in addition to, reading from the workspace storage container
  2. a delimited list of VM sizes which would take effect instead of, or in addition to, reading the file

We are also open to other solutions if you think an alternative is better.
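
For illustration, here is a minimal sketch of how a deploy-time setting could cover both options. The variable name `Terra__AllowedVmSizes` is the one proposed later in this thread; the parsing and fallback logic are assumptions, not the shipped TES configuration code:

```csharp
// Hypothetical sketch: resolve allowed VM sizes from a deploy-time
// environment variable holding either a URL or a delimited list.
// The variable name and parsing logic are illustrative only.
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class AllowedVmSizesSource
{
    public static async Task<string[]> ResolveAsync(HttpClient httpClient)
    {
        var value = Environment.GetEnvironmentVariable("Terra__AllowedVmSizes");

        if (string.IsNullOrWhiteSpace(value))
        {
            return Array.Empty<string>(); // not set: fall back to existing blob-based behavior
        }

        // Option 1: a URL from which TES reads the list.
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            var content = await httpClient.GetStringAsync(uri);
            return content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        }

        // Option 2: a delimited list of VM sizes.
        return value.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => s.Trim())
                    .ToArray();
    }
}
```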

Describe alternatives you've considered
We have explored Terra's ability to seed this file at TES deploy time, coupled with some kind of monitoring/checksum to ensure the file is not modified. While possible, the ROI on this approach was unattractive.

Code dependencies
Will this require code changes in:

  • CoA, for new and/or existing deployments? no
  • TES standalone, for new and/or existing deployments? ???
  • Terra, for new and/or existing deployments? yes; Leo will conditionally specify the allowed vm sizes as it deploys TES
  • Build pipeline? ???
  • Integration tests? ???

Additional context
This feature request is in support of AnVIL Lite.


@davidangb We are considering an implementation where the environment variable would be a URL that replaces the current allowed-vm-sizes blob in the location determined by convention. Further, since in non-Terra deployments that blob lives in a separate container from user-provided data, which can be made read-only for those users, I propose naming this variable Terra__AllowedVmSizes. I have a couple of clarifying questions to help guide me to an optimal solution:

  1. If this variable has a value, should TES fail if it cannot access it?
  2. If you intend to point to Azure Blob Storage, should we ask WSM for a SAS token to access it?

@BMurri great questions!

  1. My $.02 is this should parallel the current behavior, which looks in the workspace's storage container. If it's a 404 Not Found, TES should continue on as if no limits were set. But if it hits some other error, like a 401/403 or a malformed file, I do think TES should fail.
  2. At this time, we don't need to request a SAS token. If we do end up hosting the file in blob storage, we'll make the file public.

thanks!
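
To make that failure policy concrete, here is a hedged sketch of a fetch routine with the 404-tolerant behavior described above; the names and structure are illustrative, not TES's actual implementation:

```csharp
// Hypothetical sketch of the error policy discussed above:
// 404 => behave as if no limits were set; any other failure => fail fast.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class AllowedVmSizesFetcher
{
    public static async Task<string[]> FetchAsync(HttpClient httpClient, Uri source)
    {
        using var response = await httpClient.GetAsync(source);

        if (response.StatusCode == HttpStatusCode.NotFound)
        {
            return Array.Empty<string>(); // no blob => no restrictions, matching current behavior
        }

        // 401/403 or any other unexpected status should stop TES rather than
        // silently running without restrictions.
        response.EnsureSuccessStatusCode();

        var content = await response.Content.ReadAsStringAsync();
        return content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```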

@davidangb The way this works at present is as a whitelist, with one special exception: an empty set value (which includes a missing blob or malformed content) turns the whitelist off.

If the user places a blob at the current location and we do what you suggest, the result will be the union of the two: the user could add arbitrary VM sizes/families to your protected list, but could not "trim" the list (blacklist-style) yet. (I believe there's an open enhancement issue to enable that scenario.)

If that's what you intend, then I'll implement it that way.

Ah, I must have misunderstood. If we specify a value for Terra__AllowedVmSizes, that value should take precedence. The high-level objective is to have a means of controlling which VMs can be used in a given deployment, in a way that would not allow a workspace user to override or add to those allowances.
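
A minimal sketch of that precedence rule, under the interpretation above (hypothetical names, not the shipped implementation): when the deploy-time value is present it wins outright, rather than being unioned with the workspace blob.

```csharp
// Hypothetical precedence sketch: if the deploy-time list is set,
// it is authoritative and the workspace blob is ignored entirely,
// so workspace users cannot add to or override the allowed set.
public static class AllowedVmSizesPolicy
{
    public static string[] Effective(string[] deployTimeList, string[] workspaceBlobList)
        => deployTimeList is { Length: > 0 }
            ? deployTimeList       // Terra__AllowedVmSizes takes precedence
            : workspaceBlobList;   // fall back to current blob-based behavior
}
```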