oxidecomputer / omicron

Omicron: Oxide control plane

Need better authorization for OxQL queries

bnaecker opened this issue · comments

A prototype query language called OxQL is implemented in #5273. That PR implements only the most rudimentary authorization check: callers need read permission on the fleet to use its new endpoints. This issue tracks adding more robust and sensible authorization checks instead. There are a lot of pieces to this, and here are a few notes.

First, Nexus may want to authorize access to a timeseries as a whole -- for example, reading physical temperature sensors may require elevated permissions. Nexus knows which timeseries are available, but would need additional metadata about the level of permissions each one requires. That could live in static data, or be part of a layer of indirection between the public timeseries and those stored in ClickHouse itself. There are other reasons to want such a layer, such as improving stability, but also legitimate arguments for punting on it (urgency around diagnosing customer issues).
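To make the static-data option concrete, here is a minimal sketch of a per-timeseries permission table. Everything in it is an assumption made for illustration: the `Role` tiers, the timeseries names, and the `may_query` helper are hypothetical and are not part of Nexus or OxQL.

```rust
use std::collections::HashMap;

/// Hypothetical permission tiers, ordered from least to most privileged.
/// (Deriving `Ord` on an enum orders variants by declaration order.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    SiloViewer,
    FleetViewer,
    FleetAdmin,
}

/// Static metadata: timeseries name -> minimum role needed to query it.
/// The names below are made up for this sketch.
pub fn timeseries_requirements() -> HashMap<&'static str, Role> {
    HashMap::from([
        ("virtual_machine:vcpu_usage", Role::SiloViewer),
        ("hardware_component:temperature", Role::FleetAdmin),
    ])
}

/// Returns true only if the timeseries is known and the caller's role
/// meets its minimum. Unknown timeseries are denied by default.
pub fn may_query(caller: Role, timeseries: &str) -> bool {
    timeseries_requirements()
        .get(timeseries)
        .map_or(false, |required| caller >= *required)
}
```

Denying unknown names by default means a newly added timeseries stays unreadable until someone assigns it a requirement, which seems like the safer failure mode.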

Another piece of this is restricting which data within a timeseries is visible to a customer. For example, users should not be able to see vCPU usage data for instances that they cannot otherwise access. Nexus could inspect the filters supplied in a query, such as those on project_id, and ensure the caller can read the resources they refer to. It may also want to inject its own filters into the query, to ensure that we never even read data out of ClickHouse that the user isn't authorized for.
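A rough sketch of how inspection and injection could fit together, under heavy assumptions: `ParsedQuery` is a hypothetical stand-in that reduces a query to its timeseries name and an optional project filter, and `scope_query` is an invented helper. Real OxQL filters are far more general than this.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified view of a parsed query.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedQuery {
    pub timeseries: String,
    /// `None` means the query has no project filter. `Some(set)` restricts
    /// results to those projects; an empty set matches nothing.
    pub project_filter: Option<BTreeSet<String>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthzError {
    /// The query named a project the caller cannot read.
    Forbidden(String),
}

/// Inspection: reject a query that names a project the caller cannot read.
/// Injection: if the query has no project filter, restrict it to the
/// caller's readable projects so nothing else is ever fetched.
pub fn scope_query(
    mut query: ParsedQuery,
    readable: &BTreeSet<String>,
) -> Result<ParsedQuery, AuthzError> {
    let requested = match query.project_filter.take() {
        Some(requested) => requested,
        None => readable.clone(), // injected filter
    };
    if let Some(denied) = requested.iter().find(|id| !readable.contains(*id)) {
        return Err(AuthzError::Forbidden(denied.clone()));
    }
    query.project_filter = Some(requested);
    Ok(query)
}
```

Note that a caller with no readable projects ends up with an empty filter that matches nothing, rather than an unfiltered query.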

These are just a few ideas, and there are likely many more. We'll want to flesh this issue out, and/or start an RFD, as we dig into it.

We talked about this a bit more in chat. We went over the query inspection and filter injection ideas, and @bnaecker suggested a middle ground between fully dynamic queries and one-off endpoints that seems quite practical: a silo-scoped endpoint that provides a predefined set of queries that developer users are allowed to make, and which implements custom authorization logic for each. For example, the request body type could be an enum (union) of types that look like this:

{ 
  "metric_name": "instance_cpu",
  "params": { 
    "project": "my-proj",
    "instance": "my-inst"
  } 
}

When a request comes in, we know it names a valid metric because it parsed as a request body at all. For that metric, we then know that we need to authorize access to the instance by a) checking that the instance is in that project, and b) authorizing the user's access to that project.
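The two checks could be sketched roughly as follows. This is illustrative only: `MetricRequest`, `Directory`, `Denied`, and `authorize` are hypothetical names, and `Directory` is a toy in-memory stand-in for the lookups Nexus would actually perform. JSON parsing is left out to keep the sketch free of dependencies; in practice the enum would presumably be deserialized from the request body.

```rust
use std::collections::HashSet;

/// One variant per predefined query. Adding a metric means adding a
/// variant here plus a matching arm in `authorize`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MetricRequest {
    InstanceCpu { project: String, instance: String },
}

/// Toy stand-in for database and authz lookups (an assumption of this sketch).
pub struct Directory {
    /// Known (project, instance) pairs.
    pub instances: HashSet<(String, String)>,
    /// Projects the calling user may read.
    pub readable_projects: HashSet<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Denied {
    InstanceNotInProject,
    ProjectNotReadable,
}

/// Per-metric authorization: each variant carries its own custom logic.
pub fn authorize(req: &MetricRequest, dir: &Directory) -> Result<(), Denied> {
    match req {
        MetricRequest::InstanceCpu { project, instance } => {
            // a) the instance must belong to the named project
            if !dir.instances.contains(&(project.clone(), instance.clone())) {
                return Err(Denied::InstanceNotInProject);
            }
            // b) the caller must be able to read that project
            if !dir.readable_projects.contains(project) {
                return Err(Denied::ProjectNotReadable);
            }
            Ok(())
        }
    }
}
```

Because the `match` is exhaustive, the compiler flags any new variant that has no authorization arm, so a new query cannot be added without deciding how to authorize it.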

It would use OxQL under the hood and therefore be easy to implement, and it would be easy to add new queries without cluttering up the API with a ton of endpoints. It would also save us from having to figure out how to solve authz in a general way right now, and it would let us learn a lot about the problem before we try to do that.