kiccho1101 / kaggle-base

Template for Datascience Competitions

kaggle-base

  • Template directory for data science competitions.
  • Data is stored in PostgreSQL in a Docker 🐳 container, so it is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base

Step1. Pull/Build Docker image

Recommended:

make pull

or

make build

Step2. Start up jupyter notebook

make jupyter
  • Copy the token and open localhost:${JUPYTER_PORT} (default: 9000).

Step3. Start up DB

make start-db
  • Then open localhost:${PGWEB_PORT} (default: 9002) to view the database.

Step4. Split the training data into K folds

make kfold CONFIG_NAME
  • CONFIG_NAME defaults to lightgbm_0.
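As a rough illustration of what a K-fold split does, the sketch below assigns a fold index to every training row. It is not the repository's implementation: the function name `assign_folds`, the parameters `n_splits` and `seed`, and their default values are all assumptions made for this example.

```python
import random
from typing import List


def assign_folds(n_rows: int, n_splits: int = 5, seed: int = 42) -> List[int]:
    """Return a fold index (0..n_splits-1) for every row (hypothetical helper)."""
    if n_splits < 2 or n_splits > n_rows:
        raise ValueError("n_splits must be between 2 and n_rows")
    # Round-robin assignment keeps fold sizes within one row of each other.
    folds = [i % n_splits for i in range(n_rows)]
    # A fixed seed makes the shuffled assignment identical on every run.
    random.Random(seed).shuffle(folds)
    return folds


fold_ids = assign_folds(n_rows=10, n_splits=5)
```

Storing the fold index next to each row (for example as an extra column) lets every later step reuse the same split.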

Step5. Create Features

  • Create all features.
make feature
  • Create only the specified feature.
make feature FEATURE_NAME
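One common way to support both "create everything" and "create one feature by name" is a small registry that maps feature names to functions. The sketch below shows that pattern under stated assumptions: `FEATURES`, `register`, `create_features`, and the example features `area` and `aspect_ratio` are hypothetical names, and the repository may organize its features differently.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
FeatureFn = Callable[[List[Row]], List[float]]

# Hypothetical registry: feature name -> function that computes one column.
FEATURES: Dict[str, FeatureFn] = {}


def register(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Decorator that records a feature function under the given name."""
    def decorator(fn: FeatureFn) -> FeatureFn:
        FEATURES[name] = fn
        return fn
    return decorator


@register("area")
def area(rows: List[Row]) -> List[float]:
    return [r["width"] * r["height"] for r in rows]


@register("aspect_ratio")
def aspect_ratio(rows: List[Row]) -> List[float]:
    return [r["width"] / r["height"] for r in rows]


def create_features(rows: List[Row], name: Optional[str] = None) -> Dict[str, List[float]]:
    """Compute every registered feature, or only `name` when it is given."""
    names = [name] if name is not None else list(FEATURES)
    return {n: FEATURES[n](rows) for n in names}


rows = [{"width": 2.0, "height": 4.0}, {"width": 3.0, "height": 1.0}]
all_features = create_features(rows)         # every registered feature
one_feature = create_features(rows, "area")  # a single feature by name
```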

Step6. Cross Validation

make cv CONFIG_NAME
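For readers new to cross validation, the loop below sketches the idea: each fold is held out once, a model is fitted on the remaining rows, and the held-out rows are scored. The mean-of-targets "model" and the RMSE metric are stand-ins chosen to keep the example dependency-free; they are assumptions, not what the configured model (such as lightgbm_0) actually does.

```python
import math
from statistics import mean
from typing import List


def cross_validate(y: List[float], fold_ids: List[int]) -> List[float]:
    """Return one RMSE per fold using a mean-of-training-targets baseline."""
    scores = []
    for fold in sorted(set(fold_ids)):
        # Rows outside the current fold are used for fitting.
        train = [t for t, f in zip(y, fold_ids) if f != fold]
        # Rows inside the current fold are held out for scoring.
        valid = [t for t, f in zip(y, fold_ids) if f == fold]
        prediction = mean(train)  # stand-in for fitting a real model
        rmse = math.sqrt(mean((t - prediction) ** 2 for t in valid))
        scores.append(rmse)
    return scores


y = [1.0, 2.0, 3.0, 4.0]
fold_ids = [0, 1, 0, 1]
scores = cross_validate(y, fold_ids)
```

Averaging the per-fold scores gives a single number for comparing configurations.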

Step7. Create stats for each table

make stats
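The exact statistics this command produces are not listed here. As a hedged sketch of what per-table stats typically contain, the example below computes row count, null count, mean, min, and max for each column of an in-memory table. The helper `column_stats` and the sample columns `age` and `fare` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def column_stats(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary: Dict[str, float] = {
        "count": len(values),
        "nulls": len(values) - len(present),
    }
    # Mean, min, and max are only defined when at least one value is present.
    if present:
        summary.update(mean=mean(present), min=min(present), max=max(present))
    return summary


# Hypothetical table represented as column name -> list of values.
table = {"age": [22.0, None, 35.0], "fare": [7.25, 71.5, 8.0]}
stats = {col: column_stats(vals) for col, vals in table.items()}
```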

Step8. Train and Predict

make train-and-predict CONFIG_NAME

Step9. Submit

  • Then submit your output file!🙆
./output/submission_xxx.csv

Commands

isort, black

make format

flake8, mypy

make check

Reset DB

make reset-db

Execute scripts

Recommended:

make shell
python xxx.py

or

make run python xxx.py

References

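1. "A message for everyone worn out by feature management in data analysis competitions" (データ分析コンペにおいて 特徴量管理に疲弊している全人類に伝えたい想い)

A slide deck I found right when I was worn out by feature management myself. It is very easy to follow.

2. "A feature management method for Kaggle using the Feather format" (Kaggleで使えるFeather形式を利用した特徴量管理法)

The way the classes are written is a useful reference.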

3. flowlight0's directory

4. upura's directory

Languages

Jupyter Notebook 91.5%, Python 7.6%, Dockerfile 0.5%, Makefile 0.3%, Shell 0.0%