data-science docker kaggle postgres python

kaggle-base

Template directory for data science competitions.
Data is stored in PostgreSQL in a Docker🐳 container, so it is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base

Step1. Pull/Build Docker image

Recommended:

make pull

or

make build

Step2. Start up jupyter notebook

make jupyter

Copy the token and open localhost:${JUPYTER_PORT} (default: 9000).

Step3. Start up DB

make start-db

Then you can open localhost:${PGWEB_PORT} (default: 9002) to view the database.

Step4. Split the training data into K folds

make kfold CONFIG_NAME

CONFIG_NAME defaults to lightgbm_0.
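This README does not show how the split itself is implemented. As a rough illustration of what a K-fold assignment usually involves, the sketch below shuffles the row indices and deals them out round-robin. The names `assign_folds`, `n_rows`, `k` and `seed` are hypothetical and are not part of kaggle-base.

```python
import random
from typing import List


def assign_folds(n_rows: int, k: int = 5, seed: int = 0) -> List[int]:
    """Return a fold index (0..k-1) for every row.

    Hypothetical helper for illustration only; fold sizes differ by at most 1.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle of the row order
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % k  # deal the shuffled rows out round-robin
    return folds


if __name__ == "__main__":
    print(assign_folds(n_rows=10, k=3))
```

Fixing the seed keeps the assignment identical between runs, which is what makes later cross-validation scores comparable.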

Step5. Create Features

Create all features.

make feature

Create a single feature by name.

make feature FEATURE_NAME
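The two modes above (build everything, or build one feature by name) can be pictured as a lookup in a registry of named feature functions. The following is a minimal sketch under that assumption, not the template's actual code: `FEATURES`, `feature`, `create_features` and the two example features are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
FeatureFn = Callable[[List[Row]], List[float]]

# Hypothetical registry mapping a feature name to the function that builds it.
FEATURES: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Decorator that registers a feature function under the given name."""
    def register(fn: FeatureFn) -> FeatureFn:
        FEATURES[name] = fn
        return fn
    return register


@feature("price_per_unit")
def price_per_unit(rows: List[Row]) -> List[float]:
    return [r["price"] / r["quantity"] for r in rows]


@feature("is_bulk")
def is_bulk(rows: List[Row]) -> List[float]:
    return [1.0 if r["quantity"] >= 10 else 0.0 for r in rows]


def create_features(rows: List[Row], name: Optional[str] = None) -> Dict[str, List[float]]:
    """Build every registered feature, or only the one called `name`."""
    names = [name] if name is not None else list(FEATURES)
    return {n: FEATURES[n](rows) for n in names}
```

Calling `create_features(rows)` corresponds to building all features, and `create_features(rows, "is_bulk")` to building just one.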

Step6. Cross Validation

make cv CONFIG_NAME
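For readers new to the idea, cross-validation trains on all folds but one and scores on the fold that was held out, repeating this for each fold. The sketch below shows that loop with a deliberately trivial model (predict the training mean) and mean absolute error. Both choices, and the name `cross_validate`, are assumptions made for illustration; the model and metric the template uses come from its config and are not shown in this README.

```python
from statistics import mean
from typing import List


def cross_validate(targets: List[float], folds: List[int]) -> List[float]:
    """Return the mean absolute error of a mean-value baseline for each fold.

    Hypothetical helper: `folds[i]` is the fold index assigned to row i.
    """
    scores = []
    for fold in sorted(set(folds)):
        train = [y for y, f in zip(targets, folds) if f != fold]
        valid = [y for y, f in zip(targets, folds) if f == fold]
        prediction = mean(train)  # "fit" using the training folds only
        scores.append(mean(abs(y - prediction) for y in valid))
    return scores


if __name__ == "__main__":
    print(cross_validate([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1]))
```

The point of the structure is that the held-out rows never influence the prediction they are scored against.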

Step7. Create stats for each table

make stats
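The README does not say which statistics are produced. As a hedged sketch of what per-table stats might look like, the code below summarises each numeric column with its row count, null count, minimum, maximum and mean. The functions `column_stats` and `table_stats`, and the choice of statistics, are assumptions rather than the template's actual output.

```python
from statistics import mean
from typing import Dict, List, Optional

Column = List[Optional[float]]


def column_stats(values: Column) -> Dict[str, Optional[float]]:
    """Summarise one numeric column; None entries are counted as nulls."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        # min/max/mean are undefined when every value is null
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def table_stats(table: Dict[str, Column]) -> Dict[str, Dict[str, Optional[float]]]:
    """Apply column_stats to every column of a table given as {name: values}."""
    return {name: column_stats(values) for name, values in table.items()}


if __name__ == "__main__":
    print(table_stats({"age": [1.0, None, 3.0]}))
```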

Step8. Train and Predict

make train-and-predict CONFIG_NAME

Step9. Submit

Then submit your output file!🙆

./output/submission_xxx.csv

Commands

isort, black

make format

flake8, mypy

make check

Reset DB

make reset-db

Execute scripts

Recommended:

make shell
python xxx.py

or

make run python xxx.py

References

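1. データ分析コンペにおいて特徴量管理に疲弊している全人類に伝えたい想い (A message for everyone worn out by feature management in data analysis competitions)

A slide deck I came across just when I was worn out by feature management myself. It is very easy to follow.

2. Kaggleで使えるFeather形式を利用した特徴量管理法 (A feature management method for Kaggle using the Feather format)

The way the classes are written is a useful reference.

3. flowlight0's directory

4. upura's directory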

