kiccho1101 / datascience-template

My template directory for data science tasks 🐹

datascience-template

  • Template directory for datascience competitions.
  • Data is stored in PostgreSQL running in a Docker 🐳 container, so it is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone git@github.com:kiccho1101/datascience-template.git
cd datascience-template

Step1. Download train.csv and test.csv into the ./input folder

Download them from https://www.kaggle.com/c/titanic

Step2. Run init.sh

All you have to do is run init.sh. It creates a PostgreSQL container loaded with the input data! 🎉

sh init.sh

Step3. Split tables into K-Fold

make kfold {CONFIG_NAME}

# Example
make kfold basic
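
For readers new to K-Fold, here is a minimal sketch of what splitting rows into folds means. It is not this repository's implementation: `assign_folds`, `n_splits` and `seed` are hypothetical names, and the real `make kfold` target works on the PostgreSQL tables according to its config.

```python
import random


def assign_folds(ids, n_splits=5, seed=42):
    """Map each row id to a fold index in range(n_splits).

    Hypothetical helper for illustration only.
    """
    shuffled = list(ids)
    # A seeded shuffle makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Dealing rows out round-robin keeps fold sizes within one row of each other.
    return {row_id: i % n_splits for i, row_id in enumerate(shuffled)}


# Ten made-up row ids split into five folds of two rows each.
folds = assign_folds(range(1, 11), n_splits=5)
print(folds)
```

Fixing the seed is what makes the split reusable: every experiment that reads the same fold assignment validates on the same rows.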

Step4. Create features

make feature {FEATURE_NAME}

# Example
make feature FamilySize
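
As a rough illustration of what a feature such as FamilySize could compute, the sketch below uses the definition commonly applied to the Titanic data: siblings/spouses (`SibSp`) plus parents/children (`Parch`) plus the passenger. This is an assumption; the feature class in this repository may be defined differently, and `family_size` and the sample rows are made up.

```python
def family_size(row):
    """FamilySize = SibSp + Parch + 1 (the passenger themselves).

    Assumed definition for illustration; values arrive as strings,
    as they would when read from train.csv.
    """
    return int(row["SibSp"]) + int(row["Parch"]) + 1


# Made-up rows shaped like records from the Titanic CSV files.
rows = [
    {"PassengerId": "1", "SibSp": "1", "Parch": "0"},
    {"PassengerId": "2", "SibSp": "0", "Parch": "2"},
]
features = {r["PassengerId"]: family_size(r) for r in rows}
print(features)  # {'1': 2, '2': 3}
```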

Step5. Cross Validation

make cv {EXP_NAME}

# Example
make cv exp1
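
The idea behind the cross validation step can be sketched as follows: for each fold, fit on the other folds and score on the held-out one. This is only a sketch under assumptions. `cross_validate` is a hypothetical name, the "model" is a majority-class stand-in rather than whatever an experiment config selects, and the labels and folds are made up.

```python
from collections import Counter
from statistics import mean


def cross_validate(labels, folds):
    """Return per-fold accuracy.

    labels: {row_id: 0 or 1}, folds: {row_id: fold index}.
    """
    scores = []
    for k in sorted(set(folds.values())):
        train = [labels[i] for i in labels if folds[i] != k]
        valid = [labels[i] for i in labels if folds[i] == k]
        # Stand-in "model": always predict the most common training label.
        prediction = Counter(train).most_common(1)[0][0]
        scores.append(mean(1.0 if y == prediction else 0.0 for y in valid))
    return scores


# Made-up labels for nine rows, three rows per fold.
labels = {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}
folds = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 2, 8: 2, 9: 2}
scores = cross_validate(labels, folds)
print(scores, mean(scores))
```

Reporting both the per-fold scores and their mean makes it easier to spot an experiment whose score depends heavily on one fold.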

Step6. Create a submission file

make predict {EXP_NAME}

# Example
make predict exp1
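
The Titanic competition expects a two-column CSV with a `PassengerId,Survived` header. A minimal sketch of writing one is shown below; `write_submission` is a hypothetical helper and the predictions are made up, so treat it as a picture of the output format rather than of what `make predict` does internally.

```python
import csv
import io


def write_submission(predictions, out):
    """Write {PassengerId: 0 or 1} as a two-column CSV to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id in sorted(predictions):
        writer.writerow([passenger_id, predictions[passenger_id]])


# Made-up predictions; an in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_submission({892: 0, 893: 1}, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.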

Utilities

Open pgweb

docker-compose up -d pgweb
make pgweb

Open jupyter notebook

make jupyter

Run Python script

make run python xxx.py

Format

make format

References

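A slide deck I found just when I was worn out by feature management. Very easy to understand.

I relied on this heavily for how to write the classes.

The directory that inspired me to create this one.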
