snowplow / dbt-snowplow-mobile

A fully incremental model that transforms raw mobile event data generated by the Snowplow mobile trackers into a series of derived tables at varying levels of aggregation.

Update datediff-based filters to timestamp-based filters

bill-warner opened this issue

Datediff-based filters, which calculate the number of days between two timestamps and filter on that count, produce counterintuitive results. An example of such a filter:

where {{ snowplow_utils.timestamp_diff('e.collector_tstamp', 'str.start_tstamp', 'day')}} <= {{ var("snowplow__max_session_days", 3) }}
and {{ snowplow_utils.timestamp_diff('e.dvce_created_tstamp', 'e.dvce_sent_tstamp', 'day') }} <= {{ var("snowplow__days_late_allowed", 3) }}
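Both filters compare a day count against a threshold. As a rough model of what that day count measures, the Python sketch below assumes that `datediff(day, start, end)` counts the calendar-day boundaries (midnights) crossed between the two values, which is how the warehouse behaves in the example later in this issue. `day_boundary_diff` and `elapsed` are hypothetical helper names used only for illustration; they are not part of snowplow_utils.

```python
from datetime import datetime, timedelta


def day_boundary_diff(start: datetime, end: datetime) -> int:
    """Model of datediff(day, start, end): the number of midnights crossed.

    Only the calendar dates are compared, so the time of day is ignored.
    """
    return (end.date() - start.date()).days


def elapsed(start: datetime, end: datetime) -> timedelta:
    """The real elapsed time between the two timestamps."""
    return end - start


pairs = [
    # 2 minutes apart, but across midnight
    (datetime(2021, 1, 3, 23, 59), datetime(2021, 1, 4, 0, 1)),
    # almost 24 hours apart, but on the same calendar day
    (datetime(2021, 1, 3, 0, 0, 1), datetime(2021, 1, 3, 23, 59, 59)),
    # the example from this issue: 3 days and 10 seconds apart
    (datetime(2021, 1, 3, 13, 0, 0), datetime(2021, 1, 6, 13, 0, 10)),
]

for start, end in pairs:
    print(day_boundary_diff(start, end), elapsed(start, end))
# prints:
# 1 0:02:00
# 0 23:59:58
# 3 3 days, 0:00:10
```

Under this model the day count can overstate a two-minute gap as a full day and understate a gap of 3 days and 10 seconds as exactly 3, which is why a `<= 3` threshold lets such an event through.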

When a timestamp is passed to datediff with a date-based datepart (year, month, week, day), the time component of the timestamp is effectively ignored: the function counts the calendar boundaries crossed between the two values rather than the time actually elapsed. An example of the problem:

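with prep as (
    select
        cast('2021-01-03 13:00:00' as timestamp) as dvce_created_tstamp,
        cast('2021-01-06 13:00:10' as timestamp) as dvce_sent_tstamp
)

select
    dvce_created_tstamp,
    dvce_sent_tstamp,
    -- counts day boundaries crossed, so this returns 3 even though 3 days and 10 seconds elapsed
    datediff(day, dvce_created_tstamp, dvce_sent_tstamp) as date_diff_calc,
    -- day-based check: 3 > 3 is false, so the event is NOT flagged as late
    case when datediff(day, dvce_created_tstamp, dvce_sent_tstamp) > 3 then true else false end as late_arriving_day_comparison,
    -- timestamp-based check: sent is after created + 3 days, so the event IS flagged as late
    case when dvce_sent_tstamp > dateadd(day, 3, dvce_created_tstamp) then true else false end as late_arriving_tstamp_comparison
from prep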

The data point should be classified as late arriving, since dvce_sent_tstamp is more than 3 days (3 days and 10 seconds) after dvce_created_tstamp. datediff returns 3, so the day-based comparison (3 > 3) is false and the event is misclassified, whereas comparing the two timestamps directly (by using dateadd) gives the intended result.
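A timestamp-based version of the two filters from the start of this issue could be sketched as follows. This is a Python model, not the package's SQL: `keep_event_day_based` and `keep_event_tstamp_based` are hypothetical names, the defaults of 3 mirror the `snowplow__max_session_days` and `snowplow__days_late_allowed` defaults shown in the filter, and the day-based version assumes datediff counts calendar-day boundaries.

```python
from datetime import datetime, timedelta


def keep_event_day_based(collector_tstamp: datetime, session_start_tstamp: datetime,
                         dvce_created_tstamp: datetime, dvce_sent_tstamp: datetime,
                         max_session_days: int = 3, days_late_allowed: int = 3) -> bool:
    """Current filters: compare calendar-day differences, so time of day is ignored."""
    session_days = (collector_tstamp.date() - session_start_tstamp.date()).days
    late_days = (dvce_sent_tstamp.date() - dvce_created_tstamp.date()).days
    return session_days <= max_session_days and late_days <= days_late_allowed


def keep_event_tstamp_based(collector_tstamp: datetime, session_start_tstamp: datetime,
                            dvce_created_tstamp: datetime, dvce_sent_tstamp: datetime,
                            max_session_days: int = 3, days_late_allowed: int = 3) -> bool:
    """Proposed filters: compare each timestamp against start + N days (time of day kept)."""
    within_session = collector_tstamp <= session_start_tstamp + timedelta(days=max_session_days)
    not_late = dvce_sent_tstamp <= dvce_created_tstamp + timedelta(days=days_late_allowed)
    return within_session and not_late


# The row from the query above: sent 3 days and 10 seconds after it was created.
# The session values are illustrative: the event arrives at the start of a new session.
event = {
    "collector_tstamp": datetime(2021, 1, 6, 13, 0, 11),
    "session_start_tstamp": datetime(2021, 1, 6, 13, 0, 11),
    "dvce_created_tstamp": datetime(2021, 1, 3, 13, 0, 0),
    "dvce_sent_tstamp": datetime(2021, 1, 6, 13, 0, 10),
}

print(keep_event_day_based(**event))     # True: 3 day boundaries <= 3, event kept
print(keep_event_tstamp_based(**event))  # False: sent after created + 3 days, event dropped
```

The `<=` comparisons keep the inclusive semantics of the original filters. In SQL terms the late-arrival check corresponds to `e.dvce_sent_tstamp <= dateadd(day, 3, e.dvce_created_tstamp)`, which is the negation of the `late_arriving_tstamp_comparison` expression in the query above.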

Models including such filters:

  • snowplow_mobile_base_events_this_run