kaggle-base
- Template directory for data science competitions.
- Data is stored in PostgreSQL inside a Docker🐳 container, so it is reproducible and reusable 😄🎉
Usage
Step0. Clone the repository
git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base
Step1. Pull/Build Docker image
Recommended:
make pull
or
make build
Step2. Start up jupyter notebook
make jupyter
- Copy the token and open localhost:${JUPYTER_PORT} (default: 9000).
Step3. Start up DB
make start-db
- You can then open localhost:${PGWEB_PORT} (default: 9002) to view the database.
Step4. Split train data into K-fold
make kfold CONFIG_NAME (default: lightgbm_0)
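The idea behind this step can be sketched in a few lines of Python. This is only an illustration using the standard library, not the repository's implementation: `assign_folds`, `n_splits`, and `seed` are hypothetical names, and the real command takes its settings from the named config.

```python
import random


def assign_folds(row_ids, n_splits=5, seed=42):
    """Map each row id to a fold index; fold sizes differ by at most one."""
    ids = list(row_ids)
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    rng.shuffle(ids)
    # Deal the shuffled ids round-robin into n_splits folds.
    return {row_id: i % n_splits for i, row_id in enumerate(ids)}


# Hypothetical usage: 10 training rows split into 5 folds.
folds = assign_folds(range(10), n_splits=5)
```

Storing the fold index next to each training row (for example as an extra column) lets every later step reuse exactly the same split.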
Step5. Create Features
- Create all features.
make feature
- Create only the specified feature.
make feature FEATURE_NAME
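One common way to support both "create everything" and "create one feature by name" is a small class registry. The sketch below assumes that pattern; it is not taken from this repository, and `Feature`, `REGISTRY`, `build_features`, and the column names are all hypothetical.

```python
from abc import ABC, abstractmethod

REGISTRY = {}  # feature name -> feature class (hypothetical registry)


class Feature(ABC):
    def __init_subclass__(cls, **kwargs):
        # Register every subclass under its class name.
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls

    @abstractmethod
    def create(self, rows):
        """Return one feature value per input row."""


class FamilySize(Feature):
    def create(self, rows):
        return [r["sibsp"] + r["parch"] + 1 for r in rows]


class IsAlone(Feature):
    def create(self, rows):
        return [int(r["sibsp"] + r["parch"] == 0) for r in rows]


def build_features(rows, name=None):
    """Build one named feature, or all registered features if name is None."""
    names = [name] if name else sorted(REGISTRY)
    return {n: REGISTRY[n]().create(rows) for n in names}


# Illustrative rows; the column names are made up for this example.
rows = [{"sibsp": 1, "parch": 0}, {"sibsp": 0, "parch": 0}]
all_features = build_features(rows)
one_feature = build_features(rows, "IsAlone")
```

Because each feature lives in its own class, adding a new one does not require touching the code that builds the others.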
Step6. Cross Validation
make cv CONFIG_NAME
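Conceptually, cross validation trains on all folds but one and scores on the held-out fold, repeating once per fold. The sketch below shows that loop with a trivial mean predictor standing in for the real model; `cross_validate` and its arguments are hypothetical, and the metric (mean absolute error) is chosen only for illustration.

```python
from statistics import mean


def cross_validate(targets, folds, n_splits):
    """Return one validation score (mean absolute error) per fold."""
    scores = []
    for k in range(n_splits):
        train = [y for y, f in zip(targets, folds) if f != k]
        valid = [y for y, f in zip(targets, folds) if f == k]
        prediction = mean(train)  # stand-in for fitting a real model
        scores.append(mean(abs(y - prediction) for y in valid))
    return scores


# Toy data: four targets, with fold indices assigned in advance.
targets = [1.0, 2.0, 3.0, 4.0]
folds = [0, 1, 0, 1]
scores = cross_validate(targets, folds, n_splits=2)
```

Averaging `scores` gives a single number for comparing configs before running the final training.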
Step7. Create stats for each table
make stats
Step8. Train and Predict
make train-and-predict CONFIG_NAME
Step9. Submit
- Then submit your output file!🙆
./output/submission_xxx.csv
Commands
Format code (isort, black)
make format
Lint and type-check (flake8, mypy)
make check
Reset DB
make reset-db
Execute scripts
Recommended:
make shell
python xxx.py
or
make run python xxx.py
References
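1. データ分析コンペにおいて 特徴量管理に疲弊している全人類に伝えたい想い (A message to everyone worn out by feature management in data analysis competitions)
I found this slide deck right when I was worn out by feature management myself. It is very easy to follow.
2. Kaggleで使えるFeather形式を利用した特徴量管理法 (A feature management method for Kaggle using the Feather format)
The way the classes are written is a useful reference.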