kpradyumna095 / Snowflake_ML_Intro

Introduction to performing Machine Learning on Snowflake

Snowflake for Data Science

Getting Started

🎥 Intro Video Walkthrough: Snowflake for ML Intro
🎥 Advanced MLOps Video Walkthrough: Snowflake for MLOps
🔗 Regular 30-Day Trial: Sign Up
🔗 Student/Educator 120-Day Trial: Sign Up (Student)

Configuration Setup

  1. Create a .env file and populate it with your account details:

    SNOWFLAKE_ACCOUNT = abc123.us-east-1
    SNOWFLAKE_USER = username
    SNOWFLAKE_PASSWORD = yourpassword
    SNOWFLAKE_ROLE = sysadmin
    SNOWFLAKE_WAREHOUSE = compute_wh
    SNOWFLAKE_DATABASE = snowpark
    SNOWFLAKE_SCHEMA = titanic
    
  2. Use the environment.yml file to create the Python environment for the demo:

    • For example, from a terminal:
      • conda env create -f environment.yml
      • micromamba create -f environment.yml -y
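
Once the environment is active, the settings from step 1 can be turned into a Snowpark session. The following is a minimal sketch rather than the repository's own loader: `parse_env_lines`, `read_env_file`, `snowpark_config`, and `create_session` are hypothetical helper names, the parser is a simple stand-in that only understands `KEY = value` lines, and it assumes the environment provides `snowflake-snowpark-python`.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping

# Keys on the left match the .env file from step 1. The names on the right
# are the connection options passed to Snowpark's Session.builder.configs().
ENV_TO_SNOWPARK = {
    "SNOWFLAKE_ACCOUNT": "account",
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_ROLE": "role",
    "SNOWFLAKE_WAREHOUSE": "warehouse",
    "SNOWFLAKE_DATABASE": "database",
    "SNOWFLAKE_SCHEMA": "schema",
}


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse `KEY = value` lines, skipping blanks and `#` comments."""
    values: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Read the .env file from disk (hypothetical helper)."""
    return parse_env_lines(Path(path).read_text().splitlines())


def snowpark_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Map SNOWFLAKE_* settings to Snowpark connection options."""
    missing = [key for key in ENV_TO_SNOWPARK if key not in env]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {option: env[key] for key, option in ENV_TO_SNOWPARK.items()}


def create_session(path: str = ".env"):
    """Open a Snowpark session from the .env settings."""
    # Imported lazily so the helpers above work without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(snowpark_config(read_env_file(path))).create()
```

Calling `create_session()` from the directory that contains `.env` should return a session bound to the configured role, warehouse, database, and schema. Keep the `.env` file out of version control, since it holds a password.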

Data Processing & ML Operations

Load & Transform Data

Run the load_data notebook to:

  • Load the Titanic dataset from Seaborn, convert the column names to uppercase, and save it as a CSV file
  • Upload the CSV file to a Snowflake Internal Stage
  • Create a Snowpark DataFrame from the staged CSV
  • Write the Snowpark DataFrame to Snowflake as a table
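
The four steps above could look roughly like the sketch below. It is not the notebook's code: the stage name `TITANIC_STAGE`, the file name `titanic.csv`, the table name `TITANIC`, and the helper names are all hypothetical, "convert to uppercase" is read here as uppercasing the column names, and the reader options assume a Snowpark version that supports schema inference and header parsing for CSV files.

```python
from pathlib import Path
from typing import Iterable, List


def uppercase_names(names: Iterable[str]) -> List[str]:
    """Uppercase column names so they match Snowflake's unquoted identifiers."""
    return [str(name).upper() for name in names]


def load_titanic(session, csv_path="titanic.csv",
                 stage="TITANIC_STAGE", table="TITANIC"):
    """Seaborn -> CSV -> internal stage -> Snowpark DataFrame -> table."""
    # Imported lazily; load_dataset downloads the data on first use.
    import seaborn as sns

    # 1. Load the dataset, uppercase the column names, and save it as CSV.
    df = sns.load_dataset("titanic")
    df.columns = uppercase_names(df.columns)
    df.to_csv(csv_path, index=False)

    # 2. Upload the CSV to an internal stage (stage name is an assumption).
    session.sql(f"CREATE STAGE IF NOT EXISTS {stage}").collect()
    session.file.put(csv_path, f"@{stage}", auto_compress=False, overwrite=True)

    # 3. Build a Snowpark DataFrame from the staged file.
    staged = session.read.options({
        "field_delimiter": ",",
        "field_optionally_enclosed_by": '"',
        "infer_schema": True,
        "parse_header": True,
    }).csv(f"@{stage}/{Path(csv_path).name}")

    # 4. Persist the DataFrame as a table and return a handle to it.
    staged.write.mode("overwrite").save_as_table(table)
    return session.table(table)
```

With a session created from the `.env` settings, `load_titanic(session).show()` would preview the resulting table.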

Machine Learning Operations (snowml)

In the snowml notebook:

  • Generate a Snowpark DataFrame from the Titanic table
  • Check for and handle null values
  • Drop columns with high null counts and columns that are highly correlated with others
  • Adjust the Fare column's data type and impute nulls in categorical columns
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier with hyperparameter tuning
  • Run predictions on the test set
  • Display Accuracy, Precision, and Recall metrics
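
A few of these steps are easy to illustrate in isolation. The sketch below covers only the null check, the train/test split, and the definitions of the three reported metrics. It makes several assumptions not stated in the source: a 50% null threshold, an 80/20 split, a fixed seed, and hypothetical helper names. The metric function is plain Python meant to show what the numbers mean, not the scoring code the notebook uses.

```python
from typing import Dict, List, Mapping, Sequence


def high_null_columns(null_counts: Mapping[str, int], total_rows: int,
                      max_null_fraction: float = 0.5) -> List[str]:
    """Return the columns whose share of nulls exceeds the threshold."""
    if total_rows <= 0:
        return []
    return [col for col, nulls in null_counts.items()
            if nulls / total_rows > max_null_fraction]


def count_nulls(df) -> Dict[str, int]:
    """Count nulls per column of a Snowpark DataFrame in a single query."""
    from snowflake.snowpark import functions as F  # imported lazily

    # count() ignores NULLs, and when() without otherwise() yields NULL for
    # non-matching rows, so each expression counts only the null cells.
    exprs = [F.count(F.when(F.col(c).is_null(), 1)).alias(c) for c in df.columns]
    return df.select(exprs).collect()[0].as_dict()


def drop_and_split(df, max_null_fraction: float = 0.5, seed: int = 42):
    """Drop sparse columns, then split 80/20 into train and test sets."""
    to_drop = high_null_columns(count_nulls(df), df.count(), max_null_fraction)
    cleaned = df.drop(to_drop) if to_drop else df
    train_df, test_df = cleaned.random_split([0.8, 0.2], seed=seed)
    return train_df, test_df


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Precision answers "of the passengers predicted to survive, how many did?", while recall answers "of the passengers who survived, how many were found?". Reporting both alongside accuracy guards against a model that looks good only because one class dominates.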

Advanced MLOps with Live/Batch Inference & Streamlit

After completing the load_data steps, use the deployment notebook to:

  • Create a Snowpark DataFrame from the Titanic table
  • Identify and drop columns with high null counts and highly correlated columns
  • Adjust the Fare column's data type and handle nulls in categorical columns
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier, tuning hyperparameters with grid search
  • Display model accuracy and best parameters
  • Register the model in the model registry
  • Deploy the model as a vectorized user-defined function (UDF)
  • Execute batch predictions on a table
  • Perform real-time predictions using Streamlit for interactive inference
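
The tuning, registration, and batch-scoring steps might be sketched as follows. Treat this as an illustration under assumptions, not the notebook's implementation: the parameter grid, the model name `TITANIC_XGB`, the version name `V1`, the `SURVIVED` label, and the helper names are hypothetical, the feature columns are assumed to be numeric already (after one-hot encoding), and batch scoring goes through the registry's `run` method on the logged model version rather than a hand-written vectorized UDF. Snowpark ML's API has changed between releases, so check the calls against the version pinned in `environment.yml`.

```python
from itertools import product
from typing import Dict, List, Mapping, Sequence

# Hypothetical search space; the notebook's actual grid is not shown here.
PARAM_GRID = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}


def grid_candidates(param_grid: Mapping[str, Sequence]) -> List[Dict]:
    """Enumerate every parameter combination a grid search will try."""
    keys = list(param_grid)
    return [dict(zip(keys, combo))
            for combo in product(*(param_grid[k] for k in keys))]


def tune_register_and_score(session, train_df, batch_df,
                            feature_cols, label_col="SURVIVED"):
    """Grid-search an XGBoost classifier, register it, and score a table."""
    # Imported lazily so grid_candidates works without Snowpark ML installed.
    from snowflake.ml.modeling.model_selection import GridSearchCV
    from snowflake.ml.modeling.xgboost import XGBClassifier
    from snowflake.ml.registry import Registry

    features = list(feature_cols)
    search = GridSearchCV(
        estimator=XGBClassifier(),
        param_grid=PARAM_GRID,
        scoring="accuracy",
        input_cols=features,
        label_cols=[label_col],
        output_cols=["PREDICTION"],
    )
    search.fit(train_df)

    # The fitted scikit-learn object exposes the usual grid-search results.
    fitted = search.to_sklearn()
    print("best parameters:", fitted.best_params_)
    print("best cross-validated accuracy:", fitted.best_score_)

    # Register the winning estimator in the session's current schema.
    registry = Registry(session=session)
    model_version = registry.log_model(
        fitted.best_estimator_,
        model_name="TITANIC_XGB",
        version_name="V1",
        sample_input_data=train_df.select(features).limit(100),
    )

    # Batch inference: score every row of the input table inside Snowflake.
    return model_version.run(batch_df.select(features), function_name="predict")
```

`grid_candidates(PARAM_GRID)` lists the eight combinations this grid covers, which is a quick way to estimate tuning cost before launching the search. A Streamlit app for interactive inference could call the same registered model version with a one-row DataFrame built from the user's inputs.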
