cleysonl / nba-monte-carlo

Monte Carlo simulation of the NBA season, leveraging meltano, dbt, duckdb and superset

MDS in a box

This project serves as an end-to-end example of running the "Modern Data Stack" in a local environment. For those looking for a more integrated experience, devcontainers have been implemented as well. If you have Docker and WSL installed, the container can be booted up right from VS Code.

Current progress

Right now, you can get the NBA schedule and Elo ratings from this project and generate the following query: [two images]. More to come; see the to-dos at the bottom of this readme. And of course, the dbt docs are self-hosted in GitHub Pages, check them out here.

Getting started - Windows

  1. Create your WSL environment. Open a PowerShell terminal running as an administrator and execute:
wsl --install
  • If this is the first time WSL has been installed, restart your machine.
  1. Open Ubuntu in your terminal and update your packages.
sudo apt-get update
  1. Install python3.
sudo apt-get install python3.8 python3-pip python3.8-venv
  1. Clone this repo.
mkdir meltano-projects
cd meltano-projects
git clone https://github.com/matsonj/nba-monte-carlo.git
# Go one folder level down into the folder that git just created
cd nba-monte-carlo
  1. Build your project and run your pipeline.
make build
make pipeline
  1. Connect duckdb to superset. First, create an admin user:
meltano invoke superset:create-admin
  • Then boot up superset:
meltano run superset:ui
  • Lastly, connect it to duckdb. Navigate to localhost:8088, log in, and add duckdb as a database.

    • SQL Alchemy URL: duckdb:////tmp/mdsbox.db

    • Advanced Settings > Other > Engine Parameters: {"connect_args":{"read_only":true}}

  1. Explore your data inside superset. Go to SQL Lab > SQL Editor and write a custom query. A good example is SELECT * FROM reg_season_end.
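The connection settings from the Superset step can be sanity-checked before pasting them in. The sketch below uses only the Python standard library; the helper names `database_path` and `is_read_only` are hypothetical and not part of this project. It relies on the SQLAlchemy convention that a file URL has three slashes after the scheme, so the fourth slash in `duckdb:////tmp/mdsbox.db` marks an absolute path.

```python
import json

# Values copied from the Superset connection step above.
SQLALCHEMY_URL = "duckdb:////tmp/mdsbox.db"
ENGINE_PARAMETERS = '{"connect_args":{"read_only":true}}'


def database_path(url: str) -> str:
    """Return the file path encoded in a duckdb SQLAlchemy URL (hypothetical helper)."""
    prefix = "duckdb:///"  # scheme plus the three slashes of a file URL
    if not url.startswith(prefix):
        raise ValueError(f"not a duckdb file URL: {url}")
    return url[len(prefix):]


def is_read_only(engine_parameters: str) -> bool:
    """Check that the engine parameters JSON asks for a read-only connection."""
    params = json.loads(engine_parameters)
    return params.get("connect_args", {}).get("read_only") is True


print(database_path(SQLALCHEMY_URL))    # /tmp/mdsbox.db
print(is_read_only(ENGINE_PARAMETERS))  # True
```

If the engine parameters are mistyped (for example, `True` instead of `true`), `json.loads` raises an error here, which is easier to spot than a failed connection in the Superset UI.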

Running your pipeline on demand

After you run make pipeline, you can run your pipeline again at any time with the following meltano command:

meltano run tap-spreadsheets-anywhere target-duckdb --full-refresh dbt-duckdb:build
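If you would rather trigger the rerun from a script, one option is a thin wrapper around the command above. This is only a sketch: the function name `rerun_pipeline` and the `dry_run` flag are made up for illustration, and it assumes `meltano` is on your PATH and that you call it from the project folder.

```python
import shlex
import subprocess

# The meltano command documented above, unchanged.
PIPELINE_COMMAND = (
    "meltano run tap-spreadsheets-anywhere target-duckdb "
    "--full-refresh dbt-duckdb:build"
)


def rerun_pipeline(dry_run: bool = True) -> list[str]:
    """Split the command into arguments and, unless dry_run, execute it."""
    argv = shlex.split(PIPELINE_COMMAND)
    if not dry_run:
        # check=True raises CalledProcessError if meltano exits non-zero.
        subprocess.run(argv, check=True)
    return argv


# Dry run: only show what would be executed.
print(rerun_pipeline(dry_run=True))
```

Call `rerun_pipeline(dry_run=False)` to actually run the pipeline.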

Using Parquet instead of a database

There is an additional target in the meltano.yml file, as well as in the dbt profiles.yml file, that allows the use of Parquet as a storage medium. This can be invoked with make parquet. This is experimental, and the implementation will evolve over time.

Todos

  • replace reg season schedule with 538 schedule
  • add table for results
  • add config options in dbt vars to ignore completed games
  • make simulator only sim incomplete games
  • add table for new ratings
  • add config to use original or new ratings

Optional stuff

  • add dbt tests
  • add model descriptions
  • change elo calculation to a udf
  • make playoff elimination stuff a macro (param: schedule type)
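To make the "only sim incomplete games" to-do concrete, here is a rough Python sketch of the idea. It is not the project's implementation (the simulation itself lives in the dbt models): it uses the textbook Elo expected-score formula with no home-court adjustment, and the team names, ratings, schedule, and function names are all invented for illustration.

```python
import random


def elo_win_probability(home_elo: float, visitor_elo: float) -> float:
    """Textbook Elo expected score for the home team (no home-court adjustment)."""
    return 1.0 / (1.0 + 10 ** ((visitor_elo - home_elo) / 400.0))


def simulate_wins(games, ratings, n_sims=1000, seed=0):
    """Return average wins per team; completed games keep their real winner.

    Each game is a dict with 'home', 'visitor', and 'winner'
    (None when the game has not been played yet).
    """
    rng = random.Random(seed)
    totals = {team: 0 for team in ratings}
    for _ in range(n_sims):
        for game in games:
            if game["winner"] is not None:
                winner = game["winner"]  # completed game: do not re-simulate
            else:
                p_home = elo_win_probability(
                    ratings[game["home"]], ratings[game["visitor"]]
                )
                winner = game["home"] if rng.random() < p_home else game["visitor"]
            totals[winner] += 1
    return {team: wins / n_sims for team, wins in totals.items()}


# Made-up ratings and schedule, for illustration only.
ratings = {"AAA": 1600.0, "BBB": 1500.0}
games = [
    {"home": "AAA", "visitor": "BBB", "winner": "BBB"},  # already played
    {"home": "BBB", "visitor": "AAA", "winner": None},   # still to simulate
]
average_wins = simulate_wins(games, ratings)
print(average_wins)
```

In this toy schedule, BBB always keeps the win from the completed game, so its average can never drop below 1.0; only the second game varies between simulations.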

Source Data

The data contained within this project comes from 538, Basketball Reference, and DraftKings.
