Make VAR keyword optional in IQL-permissive
Schaechtle opened this issue
Overview
The aim of this document is to spec out how to avoid using the VAR keyword.
This is the final "sprint" towards IQL-permissive; previous sprints include:
The latter is related to this issue.
Why are we doing this?
Languages out of the Probcomp lab have been referred to as "stuttering". Repeating the VAR keyword falls into that category (e.g. GENERATE VAR x, VAR y, VAR z).
Removing the type declaration makes the language closer to natural English.
Technical approach
This approach will require context and heuristics in how events are parsed (except for GENERATE targets; those should be doable via a simple change in the grammar).
Heuristics:
- Table qualifiers can disambiguate: a qualified symbol such as d.foo determines where a VAR keyword would be needed in IQL-strict.
- If there is only one symbol in an event, add a VAR keyword to it.
- If there are two or more symbols in an event, add a VAR keyword to all left-hand-side symbols. (It's OK that this is arbitrary and might crash.)
Initially, I planned that we look into meta-data about models and tables (e.g. the schema). This seems more complex both from the implementation perspective and a usability one. Now, I think we'll fare better with simple heuristics.
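As a rough illustration of the three heuristics, here is a minimal Python sketch that rewrites a single binary event. It is not the IQL implementation: the function names (is_symbol, is_qualified, add_var) are hypothetical, events are assumed to arrive already split into left operand, operator, and right operand, and placing VAR on a right-hand symbol is an assumption the spec does not cover.

```python
def is_symbol(token: str) -> bool:
    """A bare name such as foo (no table qualifier, not a literal)."""
    return token.isidentifier()


def is_qualified(token: str) -> bool:
    """A table-qualified column such as d.foo."""
    parts = token.split(".")
    return len(parts) == 2 and all(part.isidentifier() for part in parts)


def add_var(lhs: str, op: str, rhs: str) -> str:
    """Return hypothetical IQL-strict text for one binary event."""
    # Heuristic 1: a table qualifier on one side marks the other side
    # as the model variable.
    if is_symbol(lhs) and is_qualified(rhs):
        return f"VAR {lhs}{op}{rhs}"
    if is_qualified(lhs) and is_symbol(rhs):
        # Mirrored case; an assumption, not shown in the spec.
        return f"{lhs}{op}VAR {rhs}"

    symbols = [is_symbol(lhs), is_symbol(rhs)]
    # Heuristic 2: exactly one symbol in the event gets the VAR keyword.
    if symbols == [True, False]:
        return f"VAR {lhs}{op}{rhs}"
    if symbols == [False, True]:
        return f"{lhs}{op}VAR {rhs}"
    # Heuristic 3: two symbols, so VAR goes on the left-hand side (arbitrary).
    if all(symbols):
        return f"VAR {lhs}{op}{rhs}"
    # No bare symbol at all: leave the event unchanged.
    return f"{lhs}{op}{rhs}"
```

For example, add_var("foo", "=", "d.foo") and add_var("foo", "=", "bar") both put VAR on foo, the first via the table qualifier and the second via the arbitrary left-hand rule.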
Examples
For the sake of readability, I am translating model expressions from query segments in permissive to query segments in strict and not ASTs to ASTs. We assume the following environment:
- m is a model.
- d is a data table.
- foo, bar, and baz are columns in d and also column variables in m.
- All columns are numerical! (This is different from previous issues related to IQL-permissive).
- qux is a variable in the data but not in the model m.
- quagga is a variable in the model but not in data table d.
Below, ➡️ means "translate AST of sub-query in IQL-permissive to IQL-strict".
Generate
GENERATE should be easy and can be done by a simple tweak to the grammar:
GENERATE foo, bar, baz UNDER m
➡️ GENERATE VAR foo, VAR bar, VAR baz UNDER m
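The real change would live in the grammar, but the effect of the translation above can be sketched as a string rewrite. This is a hedged illustration only: strictify_generate and the regular expression are hypothetical, and the sketch assumes the simple shape GENERATE <targets> UNDER <model> with comma-separated targets.

```python
import re

# Hypothetical pattern for the simple form shown in the example above.
GENERATE = re.compile(r"^GENERATE\s+(?P<targets>.+?)\s+UNDER\s+(?P<model>\S+)$")


def strictify_generate(query: str) -> str:
    """Prefix every GENERATE target with VAR unless it already has one."""
    match = GENERATE.match(query.strip())
    if match is None:
        raise ValueError(f"not a simple GENERATE query: {query!r}")
    targets = [target.strip() for target in match.group("targets").split(",")]
    strict = [
        target if target.startswith("VAR ") else f"VAR {target}"
        for target in targets
    ]
    return f"GENERATE {', '.join(strict)} UNDER {match.group('model')}"
```

Because targets that already carry VAR are left alone, an IQL-strict query passes through this sketch unchanged.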
Probability targets
Probability queries are trickier. First, I show heuristic 1 at play:
SELECT PROBABILITY OF foo=d.foo UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=d.foo UNDER m AS p, foo FROM d...
SELECT PROBABILITY OF foo=d.foo, bar=d.bar UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=d.foo AND VAR bar=d.bar UNDER m AS p, foo FROM d...
Next, heuristic 2 adds a VAR keyword to every event whose binary operator has only one symbol as an operand:
SELECT PROBABILITY OF foo=42.0 UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=42.0 UNDER m AS p, foo FROM d...
SELECT PROBABILITY OF foo=42.0, bar=17.0 UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=42.0 AND VAR bar=17.0 UNDER m AS p, foo FROM d...
Heuristic 3 is messier:
SELECT PROBABILITY OF foo=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=bar UNDER m AS p, foo,bar FROM d...
SELECT PROBABILITY OF foo=foo AND bar=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=foo AND VAR bar=bar UNDER m AS p, foo,bar FROM d...
Same as above, but with , instead of AND:
SELECT PROBABILITY OF foo=foo, bar=bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY DENSITY OF VAR foo=foo AND VAR bar=bar UNDER m AS p, foo,bar FROM d...
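The comma and AND forms translate to the same strict text, which suggests normalizing the separator before adding VAR. A minimal sketch under that assumption follows; strictify_events and SEPARATOR are hypothetical names, and the function only handles the event list between OF and UNDER, with VAR always placed on the left-hand side as in the examples above.

```python
import re

# Either a comma or the keyword AND separates events in IQL-permissive.
SEPARATOR = re.compile(r"\s*,\s*|\s+AND\s+")


def strictify_events(events: str) -> str:
    """Join events with AND and prefix each left-hand side with VAR."""
    parts = [part.strip() for part in SEPARATOR.split(events) if part.strip()]
    return " AND ".join(
        part if part.startswith("VAR ") else f"VAR {part}" for part in parts
    )
```

With this sketch, "foo=foo, bar=bar" and "foo=foo AND bar=bar" both become "VAR foo=foo AND VAR bar=bar".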
Using a binary operator other than =:
SELECT PROBABILITY OF foo>foo AND bar>bar UNDER m AS p, foo,bar FROM d...
➡️ SELECT PROBABILITY OF VAR foo>foo AND VAR bar>bar UNDER m AS p, foo,bar FROM d...
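In the examples so far, equality events translate to PROBABILITY DENSITY OF while inequality events keep PROBABILITY OF. The spec does not state this as a rule, so the following is only a sketch of the pattern the examples appear to follow. The name probability_keyword is hypothetical, and mixed event lists are rejected because no example shows how they should translate.

```python
import re

# Two-character operators are listed first so that <= is not read as < then =.
OPERATOR = re.compile(r"<=|>=|=|<|>")


def probability_keyword(events: str) -> str:
    """Pick the strict keyword for an event list, following the examples."""
    operators = OPERATOR.findall(events)
    if not operators:
        raise ValueError(f"no binary operator found in {events!r}")
    if all(op == "=" for op in operators):
        # All columns are numerical, so equality events get a density.
        return "PROBABILITY DENSITY OF"
    if all(op != "=" for op in operators):
        return "PROBABILITY OF"
    # Not covered by any example in the spec.
    raise ValueError(f"mixed equality and inequality events: {events!r}")
```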
The following looks messy because of the foo<foo:
SELECT PROBABILITY OF foo>bar OR foo<foo UNDER m AS p, foo FROM d...
➡️ SELECT PROBABILITY OF VAR foo>bar OR VAR foo<foo UNDER m AS p, foo FROM d...
If this happens to come up in any demo, I'd ask users to write foo<d.foo to disambiguate.
For operations that are more than binary, we should throw an error in IQL-permissive:
SELECT PROBABILITY OF foo<bar<baz UNDER m AS p, foo FROM d...
➡️ 💥 ERROR 💥
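One way to reject such chained comparisons is to count the comparison operators in each event before rewriting it. The sketch below assumes exactly that; PermissiveSyntaxError and check_binary are hypothetical names, not part of IQL.

```python
import re

# Two-character operators are listed first so that <= counts as one operator.
OPERATOR = re.compile(r"<=|>=|=|<|>")


class PermissiveSyntaxError(ValueError):
    """Raised when an event cannot be translated to IQL-strict."""


def check_binary(event: str) -> str:
    """Return the event unchanged, or raise if it chains several operators."""
    count = len(OPERATOR.findall(event))
    if count > 1:
        raise PermissiveSyntaxError(
            f"expected a binary event, found {count} operators in {event!r}"
        )
    return event
```

Here check_binary("foo<bar") passes the event through, while check_binary("foo<bar<baz") raises PermissiveSyntaxError.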
All of the above applies to GIVEN, too.
Non-goals
The following features for IQL-permissive will be tackled during later sprints:
- The above doesn't have to work with WITH (if it's easy to get done, that's great; otherwise, I am fine with banning WITH from IQL-permissive).
- Changing the order of GIVEN, i.e. the ability to write PROBABILITY OF foo GIVEN bar UNDER model instead of PROBABILITY OF foo UNDER model GIVEN bar.
Other non-goals for now (which might become important later)
- IQL-permissive does not need to ensure useful error messages are thrown.
- We'll assume one schema (i.e. a single mapping from column to statistical type). In the future, different models may support different schemas.