OpenGen / GenSQL.query

Make VAR keyword optional in IQL-permissive

Schaechtle opened this issue

Overview

The aim of this document is to spec out how to avoid using the VAR keyword.

This is the final "sprint" towards IQL-permissive. Previous sprints include:

The latter is related to this issue.

Why are we doing this?

Languages out of Probcomp lab have been referred to as "stuttering". Repeating the VAR keyword falls into that category (e.g. GENERATE VAR x, VAR y, VAR z).

Removing the type declaration makes the language closer to natural English.

Technical approach

This approach will require context and heuristics in how events are parsed (except for GENERATE targets, those should be doable via a simple change in the grammar).

Heuristics:

  1. Table qualifiers disambiguate: a qualified symbol such as d.foo is a table column, so the unqualified symbol in the same event is the one that would need a VAR keyword in IQL-strict.
  2. If there is only one symbol in an event, add a VAR keyword to it.
  3. If there are two or more symbols in an event, add a VAR keyword to all left-hand-side symbols. (It's OK that this is arbitrary and might crash.)
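To make the heuristics concrete, here is a minimal sketch of how they could apply to a single binary event. It works on strings rather than on the AST, and the function name `add_var` and the regexes are hypothetical simplifications, not part of GenSQL.query. What to do when neither side is a bare symbol is not specified above, so the sketch assumes the event is left unchanged.

```python
import re

# Hypothetical token shapes for illustration; the real IQL grammar is richer.
BARE_SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Comparison operators; longer alternatives come first so "<=" is not read as "<".
OPERATOR = re.compile(r"\s*(<=|>=|=|<|>)\s*")


def add_var(event: str) -> str:
    """Rewrite one permissive event (e.g. "foo=d.foo") into its strict form."""
    # Splitting on a pattern with a capture group keeps the operators:
    # "foo=42.0" -> ["foo", "=", "42.0"]
    parts = OPERATOR.split(event.strip())
    if len(parts) != 3 or not parts[0] or not parts[2]:
        # Covers chained comparisons such as foo<bar<baz and malformed events.
        raise ValueError(f"expected exactly one binary comparison: {event!r}")
    lhs, op, rhs = parts
    if BARE_SYMBOL.fullmatch(lhs):
        # Heuristic 1 (rhs is qualified), 2 (rhs is a literal), or 3 (rhs is
        # another bare symbol): in all three cases the left-hand side gets VAR.
        lhs = f"VAR {lhs}"
    elif BARE_SYMBOL.fullmatch(rhs):
        # Mirror image of heuristics 1 and 2, e.g. 42.0=foo or d.foo=foo.
        rhs = f"VAR {rhs}"
    # Assumption: if neither side is a bare symbol, leave the event unchanged.
    return f"{lhs}{op}{rhs}"
```

In this sketch the three heuristics collapse into one rule ("a bare left-hand side gets VAR"), which is part of why simple heuristics look cheaper than consulting the schema.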

Initially, I planned for us to look at metadata about models and tables (e.g. the schema). That seems more complex, both to implement and to use. Now, I think we'll fare better with simple heuristics.

Examples

For the sake of readability, I am translating model expressions from query segments in permissive to query segments in strict and not ASTs to ASTs. We assume the following environment:

  • m is a model
  • d is a data table
  • foo, bar, and baz are columns in d and also column variables in m.
  • All columns are numerical! (This is different from previous issues related to IQL-permissive).
  • qux is a variable in the data but not in the model m.
  • quagga is a variable in the model but not in data table d.

Below, ➡️ means "translate the sub-query from IQL-permissive to IQL-strict" (shown as query segments; the actual translation operates on ASTs).

Generate

GENERATE should be easy and can be done by a simple tweak to the grammar:

GENERATE foo, bar, baz UNDER m ➡️ GENERATE VAR foo, VAR bar, VAR baz UNDER m
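As an illustration of the intended input and output only, the rewrite can be sketched as a string transformation. This is not the grammar change itself; `add_var_to_generate` is a hypothetical helper, and it assumes a flat, comma-separated target list followed by UNDER.

```python
import re

# Hypothetical shape of a GENERATE segment: GENERATE <targets> UNDER <model>.
GENERATE = re.compile(
    r"\s*GENERATE\s+(.+?)\s+UNDER\s+(.+?)\s*",
    flags=re.IGNORECASE | re.DOTALL,
)


def add_var_to_generate(segment: str) -> str:
    """Prefix every GENERATE target with VAR unless it already has one."""
    match = GENERATE.fullmatch(segment)
    if match is None:
        raise ValueError(f"not a GENERATE ... UNDER ... segment: {segment!r}")
    targets, model = match.groups()
    names = [name.strip() for name in targets.split(",")]
    strict = [
        # The trailing space in "VAR " keeps columns such as "variance" intact.
        name if name.upper().startswith("VAR ") else f"VAR {name}"
        for name in names
    ]
    return f"GENERATE {', '.join(strict)} UNDER {model}"
```

Leaving targets that already carry VAR untouched means strict input passes through unchanged in this sketch.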

Probability targets

Probability queries are trickier. First, I show heuristic 1 at play:
SELECT PROBABILITY OF foo=d.foo UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=d.foo UNDER m AS p, foo FROM d...

SELECT PROBABILITY OF foo=d.foo, bar=d.bar UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=d.foo AND VAR bar=d.bar UNDER m AS p, foo FROM d...

Next, heuristic 2 adds a VAR keyword in every event whose binary expression contains only one symbol:
SELECT PROBABILITY OF foo=42.0 UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=42.0 UNDER m AS p, foo FROM d...

SELECT PROBABILITY OF foo=42.0, bar=17.0 UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=42.0 AND VAR bar=17.0 UNDER m AS p, foo FROM d...

Heuristic 3 is messier:

SELECT PROBABILITY OF foo=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=bar UNDER m AS p, foo,bar FROM d...

SELECT PROBABILITY OF foo=foo AND bar=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=foo AND VAR bar=bar UNDER m AS p, foo,bar FROM d...

Same as above, but with , instead of AND:
SELECT PROBABILITY OF foo=foo, bar=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=foo AND VAR bar=bar UNDER m AS p, foo,bar FROM d...

Using a binary operator other than =:
SELECT PROBABILITY OF foo>foo AND bar>bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY OF VAR foo>foo AND VAR bar>bar UNDER m AS p, foo,bar FROM d...

The following looks messy because of the foo<foo:
SELECT PROBABILITY OF foo>bar OR foo<foo UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY OF VAR foo>bar OR VAR foo<foo UNDER m AS p, foo FROM d...
If this happens to come up in any demo, I'd ask users to write foo<d.foo to disambiguate.
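The way the examples above combine events (a comma is read as AND, OR is kept as-is) can be sketched separately from the per-event heuristics. This is a rough string-level illustration; `translate_events` and its `rewrite_event` parameter are hypothetical names, and the per-event rewrite is passed in so that any single-event rule can be plugged in.

```python
import re
from typing import Callable

# Connectives between events. A comma is treated as a synonym for AND.
CONNECTIVE = re.compile(r"\s*(,|\bAND\b|\bOR\b)\s*", flags=re.IGNORECASE)


def translate_events(events: str, rewrite_event: Callable[[str], str]) -> str:
    """Rewrite each event and normalize the connectives between them."""
    # With a capture group, split() alternates events and connectives:
    # "a=1, b=2" -> ["a=1", ",", "b=2"]
    pieces = CONNECTIVE.split(events.strip())
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            if not piece:
                raise ValueError(f"empty event in: {events!r}")
            out.append(rewrite_event(piece))
        else:
            out.append("AND" if piece == "," else piece.upper())
    return " ".join(out)
```

For example, passing a rewrite that simply prepends VAR turns `foo=foo, bar=bar` into `VAR foo=foo AND VAR bar=bar` under these assumptions.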

For operations that are more than binary, we should throw an error in IQL-permissive:
SELECT PROBABILITY OF foo<bar<baz UNDER m AS p, foo FROM d...
➡️ 💥 ERROR 💥

All of the above applies to GIVEN, too.

Non-goals

The following features for IQL-permissive will be tackled during later sprints:

  • The above doesn't have to work with WITH (if it's easy to get done, that's great; otherwise, I am fine with banning WITH from IQL-permissive).
  • Changing the order of GIVEN - i.e. the ability to write PROBABILITY OF foo GIVEN bar UNDER model instead of PROBABILITY OF foo UNDER model GIVEN bar.

Other non-goals for now (which might become important later)

  • IQL-permissive does not need to ensure useful error messages are thrown.
  • We'll assume one schema (i.e. a single mapping from column to statistical type). In the future, different models may support different schemas.