narratorai / dbt-activity-schema

Activity Schema dbt package

Activity Schema

This dbt package contains macros for building an activity schema.

The macros help you write models that can be used directly as activity stream tables, particularly when those models need to be warehouse-independent.

It follows v2.0 of the activity schema specification.

Installation

Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages.

Include this in your packages.yml:

packages:
  - package: narratorai/activity_schema
    version: [">=0.1.0", "<0.2.0"]

Usage

List your feature columns in the model's features config, then call the make_activity macro on your final CTE to add the feature_json and activity occurrence columns.

{{ config(features=['subject', 'content']) }}

with final as (
  ...
)

select * from {{ activity_schema.make_activity('final') }}

Warehouse Support

Works on the following warehouses:

  • BigQuery
  • Postgres
  • Redshift
  • Snowflake

Macros

make_activity

This macro takes a CTE name and adds the feature_json and activity occurrence columns to it.

To build the feature_json column, it reads the list of feature column names from the model's features config.

Example usage

{{ config(features=['subject', 'content']) }}

with final as (
  select 
    id as activity_id,
    created_at as ts,

    'received_email' as activity,

    email as customer,
    null as anonymous_customer_id,

    subject,
    preview as content,

    null as link,
    null as revenue_impact
  from emails
)

select * from {{ activity_schema.make_activity('final') }}

feature_json

Helps build a warehouse-independent feature_json column. It takes a list of columns containing the feature values and selects them together into a single JSON object.

This isn't necessary if your model targets only a single warehouse. In that case it may be easier to write the feature_json column directly in your CTE, like so (e.g. for Redshift):

object('subject', subject, 'content', preview) as feature_json,
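
To make the warehouse dependence concrete, here is a minimal sketch of how a macro could choose a JSON-building expression per warehouse. It is not the package's implementation: the macro name feature_json_sketch is hypothetical, and the choice of function for each warehouse (object_construct, object, json_build_object, to_json_string over a struct) is an assumption about one reasonable mapping.

```sql
{# Hypothetical macro for illustration only; it is not part of this package. #}
{% macro feature_json_sketch(columns) %}
  {#- Build "'subject', subject, 'content', content" from the column list -#}
  {%- set pairs = [] -%}
  {%- for c in columns -%}
    {%- do pairs.append("'" ~ c ~ "', " ~ c) -%}
  {%- endfor -%}
  {%- set args = pairs | join(', ') -%}

  {#- Assumed mapping from dbt's target.type to a JSON object constructor -#}
  {%- if target.type == 'snowflake' -%}
    object_construct({{ args }})
  {%- elif target.type == 'redshift' -%}
    object({{ args }})
  {%- elif target.type == 'postgres' -%}
    json_build_object({{ args }})
  {%- elif target.type == 'bigquery' -%}
    to_json_string(struct({{ columns | join(', ') }}))
  {%- else -%}
    {{ exceptions.raise_compiler_error("Unsupported warehouse: " ~ target.type) }}
  {%- endif -%}
{% endmacro %}
```

Under these assumptions, a model would write {{ feature_json_sketch(['subject', 'content']) }} as feature_json in a select that reads from a CTE exposing those columns. The resulting type differs by warehouse (for example a string on BigQuery and json on Postgres), which is part of what a warehouse-independent macro has to smooth over.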

activity_occurrence

Builds the two activity occurrence columns, activity_occurrence and activity_repeated_at. They let queries against an activity stream efficiently select the Nth or the last activity for a given customer.
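
As a rough illustration of what these two columns contain, the query below computes them with window functions over a small inline data set. It is a sketch written for Postgres. The window expressions are an assumption based on the column descriptions, not the macro's actual output, and the sample rows are made up.

```sql
-- Sketch only: these window expressions are assumed from the column
-- descriptions; they are not copied from the package's macro.
with final as (
  select * from (values
    ('a1', timestamp '2021-01-01 09:00:00', 'received_email', 'ann@example.com'),
    ('a2', timestamp '2021-01-03 09:00:00', 'received_email', 'ann@example.com'),
    ('a3', timestamp '2021-01-02 09:00:00', 'received_email', 'bob@example.com')
  ) as t (activity_id, ts, activity, customer)
)

select
  activity_id,
  ts,
  activity,
  customer,
  -- 1 for the customer's first occurrence of this activity, 2 for the second, ...
  row_number() over (partition by customer, activity order by ts) as activity_occurrence,
  -- timestamp of the customer's next occurrence; null on their most recent one
  lead(ts) over (partition by customer, activity order by ts) as activity_repeated_at
from final
order by customer, ts
```

With columns defined this way, a customer's first activity is the row where activity_occurrence = 1, and their last activity is the row where activity_repeated_at is null, so neither lookup needs a self-join or a subquery.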


License: Apache License 2.0