sodadata / docs

Soda Documentation, served at docs.soda.io

Home Page: https://docs.soda.io

wrong example of freshness check variable name

geertjan-garvis opened this issue

On https://docs.soda.io/soda-cl/freshness.html, the example given for checking against a variable supplied at scan time is:

checks for dim_product:
  - freshness(end_date, ${CUST_VAR}) < 1d

This fails for me; the following does work:

checks for dim_product:
  - freshness(end_date, CUST_VAR) < 1d

Hi @geertjan-garvis,
We have ourselves a puzzle here. Oddly, I seem to get opposite results. (I am using Soda Core 3.0.13.)
a) I was able to get the following to work, providing end_date (the column name) as the value for VAR at scan time:

checks for dim_product:
  - freshness(${VAR}) < 1d

But this threw an error:

checks for dim_product:
  - freshness(VAR) < 1d
[12:56:06] Soda Core 3.0.13
[12:56:07] Query execution error in adventureworks.dim_product.aggregation[0]: column "var" does not exist
LINE 2:   MAX(VAR) 
              ^

SELECT 
  MAX(VAR) 
FROM public.dim_product
  | column "var" does not exist
  | LINE 2:   MAX(VAR) 
  |               ^

[12:56:07] Metrics 'max' were not computed for check 'freshness(VAR) < 1d'
[12:56:07] Scan summary:
[12:56:07] 1/1 check NOT EVALUATED: 
[12:56:07]     dim_product in adventureworks
[12:56:07]       freshness(VAR) < 1d [NOT EVALUATED]
[12:56:07] 1 checks not evaluated.
[12:56:07] 2 errors.
[12:56:07] Oops! 2 errors. 0 failures. 0 warnings. 0 pass.

b) I cannot get a check with multiple columns identified to work! I logged an internal Jira ticket for this issue when it came up a few weeks ago. Are you able to get such a check to return results for each column?
For example, this errored for me:

checks for dim_product:
  - freshness(start_date, end_date) < 1d
[12:54:46] Could not parse variable end_date as a timestamp: variable not found
[12:54:46] Scan summary:
[12:54:46] 1/1 check FAILED: 
[12:54:46]     dim_product in adventureworks
[12:54:46]       freshness(start_date, end_date) < 1d [FAILED]
[12:54:46] 1 errors.
[12:54:46] Oops! 1 error. 1 failures. 0 warnings. 0 pass.

Could you confirm which version of Soda Core you are using, please?

Hi Janet, I will check this weekend in more detail, but one immediate difference that I see between your example and mine:

checks for dim_product:
  - freshness(end_date, CUST_VAR) < 1d

Here, I'm talking about the second argument to freshness(). As far as I can tell, it takes two arguments of different types:

freshness(<column name>, <(optional) value of now>)

The second parameter is the timestamp against which freshness is compared. It's not another column: it's a literal timestamp value (e.g. "2022-11-25T10:00") or a variable whose value is such a timestamp. This is the parameter the documentation example refers to, and here I find that I need to pass either a literal string or a variable name without the ${ }, whereas the documented example uses ${CUST_VAR}, which does not work for me.
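A minimal Python sketch of the two-argument form described above, assuming freshness is computed as "now" minus the column's MAX value (the MAX query appears in the scan log earlier in this thread) and that a bare second argument is looked up by name among the scan-time variables, as the "variable not found" error suggests. The function name freshness_check and its parameters are hypothetical; this models the behavior reported here and is not Soda Core's implementation.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def freshness_check(
    max_value: datetime,
    threshold: timedelta,
    now_var: Optional[str] = None,
    variables: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical model of `freshness(<column>, <now variable>) < threshold`.

    max_value stands in for MAX(<column>); now_var is the bare name of a
    scan-time variable holding the "value of now" as an ISO-8601 timestamp.
    """
    if now_var is None:
        # No second argument: compare against the current local time.
        now = datetime.now()
    else:
        # Assumption: the second argument is a variable *name* to look up.
        scan_vars = variables or {}
        if now_var not in scan_vars:
            # Wording mirrors the error seen in the scan output in this thread.
            raise ValueError(
                f"Could not parse variable {now_var} as a timestamp: variable not found"
            )
        now = datetime.fromisoformat(scan_vars[now_var])
    # Freshness is how long ago the most recent value was recorded.
    freshness = now - max_value
    return freshness < threshold


# Usage: the latest end_date is midnight; "now" is supplied at scan time.
print(freshness_check(datetime(2022, 11, 25), timedelta(days=1),
                      "CUST_VAR", {"CUST_VAR": "2022-11-25T10:00"}))  # True
```

Under this model, freshness(start_date, end_date) fails exactly as in the log above: end_date is not a scan variable, so the lookup raises before any comparison is made.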

In your example, you're trying to pass a variable for the first parameter, the column name:

checks for dim_product:
  - freshness(${VAR}) < 1d

I'm guessing these two are parsed differently, but let me check this weekend to make sure. I'm running our own fork of 3.0.13, but we have not changed anything related to this in it. I will double-check this weekend with a plain Soda installation.
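One hypothetical way the two cases could be parsed differently: ${NAME} placeholders are replaced textually before the check is parsed, so freshness(${VAR}) with VAR=end_date becomes freshness(end_date), while a bare VAR in the column position passes through unchanged and ends up in the query as MAX(VAR). The sketch below models only that substitution step; substitute_variables and its choice to leave unknown placeholders untouched are assumptions, not Soda Core's code.

```python
import re
from typing import Mapping

# Matches ${NAME}, allowing optional spaces inside the braces.
_VAR_PATTERN = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def substitute_variables(check: str, variables: Mapping[str, str]) -> str:
    """Hypothetical pre-parse step: replace each ${NAME} with its scan-time value."""

    def lookup(match):
        name = match.group(1)
        # Leave unknown placeholders as-is rather than guessing a value.
        return variables.get(name, match.group(0))

    return _VAR_PATTERN.sub(lookup, check)


# A placeholder is substituted; a bare name is left for the parser.
print(substitute_variables("freshness(${VAR}) < 1d", {"VAR": "end_date"}))
print(substitute_variables("freshness(VAR) < 1d", {"VAR": "end_date"}))
```

If substitution worked this way, ${CUST_VAR} in the second position would be replaced by the timestamp value itself before parsing, which could explain why the documented form behaves differently from the bare variable name.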

So with respect to your question:
b) I cannot get a check with multiple columns identified to work!

I don't think that's meant to be supported. freshness() takes one column and, optionally, one "value of now".

Oh, you're right. In my haste, I had completely forgotten about that value-of-now variable. I will retest here and correct as needed.
Re: b) I could swear I had gotten that to work in the past, but of course memory is fallible. I'm going to keep my internal ticket open to double-confirm.