open-policy-agent / contrib

Integrations, examples, and proof-of-concepts that are not part of OPA proper.

Home Page:http://www.openpolicyagent.org/

BUG/ENHANCEMENT: Alternate SQL forms in data_filter_example

anadon opened this issue · comments

At Arroyo, we're working with PostgreSQL. The generated SQL queries rely on "INNER JOIN", which makes them invalid as generated and highly sub-optimal even when tweaked to run. The qualifiers should use a 'WHERE' clause instead. In the meantime, we'll probably just use an obvious internal hack, but this should be handled.

SELECT ipv4net.* FROM ipv4net INNER JOIN gateway INNER JOIN arroyoroles ON ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipv4net.gateway_name = gateway.name AND arroyoroles.org_id = gateway.org_id ) ;
to
SELECT * FROM ipv4net where ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipv4net.gateway_name = gateway.name AND arroyoroles.org_id = gateway.org_id ) ;

The example is implemented against SQLite which requires explicit joins. It's actually simpler in the Postgres case because we don't need special logic to generate joins vs. where clauses. I'm tempted to just put an obvious note in the example README about this. WDYT?

I think this is an opportunity to think about dialect support. That said, I don't think I can say which way this should be done. I'm thinking on it. A note in the meantime in the README would be pragmatic.
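As a rough illustration of what dialect support could look like, here is a minimal Python sketch that renders the same tables and conditions in either form. The function name `render_select` and the dialect labels `explicit_join` and `where` are hypothetical and are not taken from the example's code. Listing every referenced table in the FROM clause for the WHERE form is also an assumption; the query proposed above lists only ipv4net.

```python
def render_select(select, tables, conditions, dialect):
    """Render one SELECT for a hypothetical two-dialect translator."""
    predicate = " AND ".join(conditions)
    if dialect == "explicit_join":
        # Form quoted in this issue: chained INNER JOINs with one ON clause.
        source = " INNER JOIN ".join(tables)
        return f"SELECT {select} FROM {source} ON ({predicate})"
    if dialect == "where":
        # Form requested in this issue: filter in a WHERE clause.
        # Assumption: every referenced table is listed in FROM.
        source = ", ".join(tables)
        return f"SELECT {select} FROM {source} WHERE ({predicate})"
    raise ValueError(f"unknown dialect: {dialect}")


tables = ["ipv4net", "gateway", "arroyoroles"]
conditions = [
    "'test_admin' = arroyoroles.name",
    "arroyoroles.role = 'ADMIN'",
    "ipv4net.gateway_name = gateway.name",
    "arroyoroles.org_id = gateway.org_id",
]

join_sql = render_select("ipv4net.*", tables, conditions, "explicit_join")
where_sql = render_select("ipv4net.*", tables, conditions, "where")
print(join_sql)
print(where_sql)
```

Keeping the predicate list separate from the rendering step means a new dialect only needs a new branch, not a new translation of the policy.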

One of the more optimal ways for SQL would be to generate the following:

SELECT * FROM (SELECT * FROM table1 WHERE 'test_admin' = table1.name AND table1.role = 'ADMIN') AS table1 WHERE ( table1.org_id = (SELECT org_id FROM table2 WHERE table2.id = (SELECT gw_id FROM table3)));

This is really an area where AST transforms would be helpful.

It looks like AST conversion involves a complete set of hand made transformation motifs.

OK, I've hit a wall on trying to hack my way around this one.

#Specific IPSecTunnel request
query_ipsec_allow {
  perms = data.arroyoroles[_]
  perms.name = input.sid
  perms.role = "ADMIN"
  ipsec_row = data.ipsec[_]
  function_row = data.function[_]
  ipsec_row.id = function_row.id
  gateway_row = data.gateway[_]
  function_row.gw_id == gateway_row.id

  perms.org_id = gateway_row.org_id

  ipsec_row.name = input.pk
}


#List IPSecTunnel request
query_ipsec_allow {
  perms = data.arroyoroles[_]
  perms.name = input.sid
  perms.role = "ADMIN"
  ipsec_row = data.ipsec[_]
  function_row = data.function[_]
  ipsec_row.id = function_row.id
  gateway_row = data.gateway[_]
  function_row.gw_id == gateway_row.id

  perms.org_id = gateway_row.org_id

  input.pk = ""
}

transforms to

SELECT * FROM arroyoroles INNER JOIN gateway INNER JOIN ipsec INNER JOIN function ON ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipsec.id = function.id AND function.gw_id = gateway.id AND arroyoroles.org_id = gateway.org_id AND '' = ipsec.name) UNION SELECT * FROM arroyoroles INNER JOIN gateway INNER JOIN ipsec INNER JOIN function ON ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipsec.id = function.id AND function.gw_id = gateway.id AND arroyoroles.org_id = gateway.org_id);

In CockroachDB, this yields: invalid syntax: statement ignored: syntax error at or near "union". Hand-editing the query to the following gets closer to the core issue:

>SELECT * FROM arroyoroles WHERE ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipsec.id = function.id AND function.gw_id = gateway.id AND arroyoroles.org_id = gateway.org_id AND '' = ipsec.name) UNION SELECT * FROM arroyoroles WHERE ('test_admin' = arroyoroles.name AND arroyoroles.role = 'ADMIN' AND ipsec.id = function.id AND function.gw_id = gateway.id AND arroyoroles.org_id = gateway.org_id);
pq: no data source matches prefix: ipsec

At this point, it appears that the core issue is twofold: tables must be explicitly brought in by a query, and join or comparison conditions in a subclause can only refer to the first table and the most recently joined one. As a consequence, every query that involves 3 or more tables is either syntactically invalid or causes an evaluation error.
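The "tables must be explicitly brought in" part can be checked mechanically before a query is sent. The following is a minimal Python sketch, assuming conditions are plain strings that use `table.column` references as in the queries above; the helper names `referenced_tables` and `missing_tables` are hypothetical, and the regex is a simplification rather than a SQL parser.

```python
import re


def referenced_tables(conditions):
    """Return table prefixes used in table.column references, in first-seen order."""
    found = []
    for cond in conditions:
        # Drop quoted literals so text inside strings is never read as a reference.
        stripped = re.sub(r"'[^']*'", "", cond)
        for table in re.findall(r"\b([A-Za-z_]\w*)\.[A-Za-z_]\w*", stripped):
            if table not in found:
                found.append(table)
    return found


def missing_tables(from_tables, conditions):
    """Return referenced tables that the FROM clause does not bring in."""
    return [t for t in referenced_tables(conditions) if t not in from_tables]


conditions = [
    "'test_admin' = arroyoroles.name",
    "arroyoroles.role = 'ADMIN'",
    "ipsec.id = function.id",
    "function.gw_id = gateway.id",
    "arroyoroles.org_id = gateway.org_id",
]

# The hand-edited query only has arroyoroles in FROM.
missing = missing_tables(["arroyoroles"], conditions)
print(missing)  # ['ipsec', 'function', 'gateway']
```

The first missing table, ipsec, is the same prefix named in the pq error above.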

I don't know enough about cockroachdb to answer this properly. One thing that jumped out was whether the FROM table is correct? From the example it seems the FROM table should be ipsec?

One thing I'll add is that the policy above is expressing an OR condition: query_ipsec_allow is true if body #1 is satisfied OR body #2 is satisfied. One thing we could change in the example is the translation code that decides whether to generate multiple SELECTs that are UNION-ed, or, a single SELECT with a WHERE clause that contains an OR condition. The reason for the UNION was that SQLite didn't support implicit JOINs in the WHERE clause--so we implemented the translation to generate INNER JOINs when necessary.
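To make the two translation choices concrete, here is a minimal Python sketch that combines several rule bodies either as UNION-ed SELECTs or as a single SELECT with OR-ed conditions. The function `combine_bodies` and its `mode` values are hypothetical and do not come from the example's translation code. It assumes a dialect that accepts a comma-separated FROM list with filtering in WHERE.

```python
def combine_bodies(select, tables, bodies, mode):
    """Combine rule bodies (lists of SQL conditions) into one statement."""
    source = ", ".join(tables)
    # Each body is a conjunction; parenthesize it so OR cannot change precedence.
    clauses = ["(" + " AND ".join(body) + ")" for body in bodies]
    if mode == "union":
        selects = [f"SELECT {select} FROM {source} WHERE {c}" for c in clauses]
        return " UNION ".join(selects)
    if mode == "or":
        return f"SELECT {select} FROM {source} WHERE " + " OR ".join(clauses)
    raise ValueError(f"unknown mode: {mode}")


shared = [
    "'test_admin' = arroyoroles.name",
    "arroyoroles.role = 'ADMIN'",
    "ipsec.id = function.id",
    "function.gw_id = gateway.id",
    "arroyoroles.org_id = gateway.org_id",
]
# Two bodies, mirroring the two branches of the generated SQL in this issue.
bodies = [shared + ["'' = ipsec.name"], shared]
tables = ["arroyoroles", "gateway", "ipsec", "function"]

union_sql = combine_bodies("*", tables, bodies, "union")
or_sql = combine_bodies("*", tables, bodies, "or")
print(union_sql)
print(or_sql)
```

One difference to keep in mind: UNION removes duplicate rows, while a single SELECT with OR does not, so the two forms can return different row counts when the joined result contains duplicates.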

CockroachDB is effectively PostgreSQL in this context. Messing with the origin table can make some of these queries work, but the core issue is the relying on 3+ tables to make a decision. They can't all be shunted off to the end. Once that gets manually fixed, the UNION works correctly. But this needs to be handled automatically and robustly. In particular, this use case is a big one and is a barrier to wider adoption.

At Arroyo, we're going to be shelving OPA adoption until this kind of area matures. It can be through integrating automatic querying of external data sources, or general case SQL generation with dialect support.

Thanks for the update! I'll put a note in the README about SQLite and close this.

the core issue is the relying on 3+ tables to make a decision.

That seems unavoidable given the current approach and your data model. One alternative is to implement an abstraction layer between the policy and the DB tables. This would insulate the policy from the underlying schema (which has a few benefits).

I'll follow up if there are improvements in this area.

Thing is, we have other integration with how tables are generated, accessed, and otherwise maintained that get in the way of adding another abstraction layer. We aren't programming gods, so there are always a few things we could be missing, but the work was already getting fragile.

We'd love to keep up with changes!