yunqiqiliang / dbt-snowflake-summit-2023-hands-on-lab-snowpark

Title: Python Snowpark Formula1

Work in Progress (WIP) Notice

We will update this project iteratively to clean up the code, add automation, and develop best practices. Planned improvements so far:

  • setup folder for connecting to the public S3 bucket
  • google drive link to our guide (will be subsequently replaced by dbt ecosystems page)
  • yaml selectors for training and prediction; prediction only
  • codegen for the 8 staging files
  • label encoder cleanup for numeric variables
  • one-hot encoding (OHE) for the categorical variables
  • multi-class accuracy
  • trying out https://github.com/omnata-labs/dbt-ml-preprocessing for some of the preprocessing
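
To make the preprocessing and metric items above concrete, here is a minimal pure-Python sketch of label encoding, one-hot encoding, and multi-class accuracy. It is an illustration only, not the code used in this project: the function names and the sample values are assumptions, and a real pipeline would more likely rely on a library such as scikit-learn or the dbt-ml-preprocessing package linked above.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per distinct category (sorted order)."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


def multiclass_accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up sample values, for illustration only.
codes, mapping = label_encode(["soft", "hard", "soft", "medium"])
rows, categories = one_hot_encode(["soft", "hard"])
accuracy = multiclass_accuracy([1, 2, 3, 3], [1, 2, 3, 1])
```

Label encoding imposes an arbitrary order on the categories, which is one reason one-hot encoding is usually preferred for categorical variables that have no natural ranking.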

Project Description

A repo that uses open source Formula 1 data to show how dbt Cloud combines 1) SQL and Python and 2) analytics and machine learning (ML). Snowpark for Python on Snowflake lets us blend these together in a single project.
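
As a rough illustration of how a Python model can sit alongside SQL models in dbt, the sketch below follows the general shape of a dbt Python model on Snowflake: a `model(dbt, session)` function that reads an upstream model with `dbt.ref()` and returns a DataFrame for dbt to materialize. The upstream model name `stg_results` and the columns `position` and `driver_id` are hypothetical and are not taken from this repo.

```python
def model(dbt, session):
    # dbt passes in a `dbt` context object and an active Snowpark session.
    dbt.config(materialized="table")

    # "stg_results" is a hypothetical upstream SQL model; on Snowflake,
    # dbt.ref() returns a Snowpark DataFrame.
    results = dbt.ref("stg_results")

    # Hypothetical column names: count race wins per driver.
    wins = results.filter("position = 1").group_by("driver_id").count()

    # dbt materializes the returned DataFrame as this model's table.
    return wins
```

Because the Python model is referenced like any other model, downstream SQL models could select from it with the usual `ref()` call, which is what lets analytics and ML steps share one DAG.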

How to Run the Project

Placeholder for the guide link. The script to connect to the data is placed in the setup folder.

Credits Placeholder
