Collection of smaller data science projects.
Slightly edited and condensed version of one of five projects for the EPFL course «Data Science: Applied Machine Learning».
- Main techniques: EDA, data preparation and cleaning, outlier removal, regression modelling with various regressors, scikit-learn pipelines.
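As an illustration of the pipeline approach named above, here is a minimal sketch rather than the project's actual code. The data is synthetic, and the column names (`area`, `rooms`, `zone`) and the choice of `Ridge` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(100, 20, n),
    "rooms": rng.integers(1, 6, n).astype(float),
    "zone": rng.choice(["A", "B", "C"], n),
})
# Target is built before missing values are injected.
y = 3.0 * df["area"] + 10.0 * df["rooms"] + rng.normal(0, 5, n)
df.loc[::25, "area"] = np.nan  # simulate missing values

# Numeric columns: impute missing values, then standardise.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encode, ignoring unseen categories.
preprocess = ColumnTransformer([
    ("num", numeric, ["area", "rooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["zone"]),
])
model = Pipeline([("prep", preprocess), ("reg", Ridge(alpha=1.0))])

# Preprocessing is refit inside each fold, which avoids data leakage.
scores = cross_val_score(model, df, y, cv=5, scoring="r2")
print(scores.mean())
```

Keeping the preprocessing inside the pipeline means the same steps are applied consistently at training and prediction time.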
Final project for the Advanced Machine Learning course at FHNW.
- Students were pointed to the Kaggle competition and had to analyse the data, train models and submit predictions.
- Main techniques: Regression modelling with various regressors, scikit-learn pipelines, LightGBM, hyperparameter tuning with GridSearch/RandomSearch, Hyperopt and scikit-optimize, category encoding, data aggregation with featuretools, analysing permutation feature importances, creating interaction features, oversampling with imbalanced-learn.
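Permutation feature importance, one of the techniques listed above, can be sketched as follows. This is an illustrative example on synthetic data, not the project's code. The model choice (`RandomForestRegressor`) and the three made-up features are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: feature 0 is strongly informative, feature 1 weakly,
# feature 2 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in score.
result = permutation_importance(
    model, X_val, y_val, n_repeats=10, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]  # most important first
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}")
```

Computing the importances on a validation split rather than the training data gives a more honest picture of which features the model relies on for generalisation.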
Quick EDA and some modelling test runs for Kaggle's Spaceship Titanic challenge.
- Main techniques: EDA, data preparation and cleaning, classification modelling with various classifiers, scikit-learn pipelines.
Project for the course «Data Science Project Competence» at FHNW.
- Only the data was provided. Students were asked to analyse the data, present insights and propose appropriate data products. In addition, I built a working data dashboard with Streamlit.
Quick EDA and regression modelling for a Machine Learning Lab during Data Science studies at FHNW.
- Only the real estate data was provided.
- The project was set up as a closed Kaggle competition. Students had to compete against and beat the teachers' models.
Quick examination of podcast lengths to help podcast producers quantify creative choices. I analysed ~225k episodes of ~1.8k iTunes podcasts and ~37k episodes of ~800 Spotify podcasts.
Findings:
- A typical podcast episode is around 40 minutes long.
- 90% of all podcast episodes have a length between 20 and 60 minutes.
- Typical lengths vary between genres, with median values between 15 and 65 minutes.
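Summary statistics of the kind listed above can be computed along these lines. This is a minimal sketch on a tiny made-up table, not the real data. The column names (`genre`, `minutes`) and the use of the 5th and 95th percentiles for the central 90% range are assumptions.

```python
import pandas as pd

# Tiny made-up sample; the real analysis used several hundred
# thousand episodes.
episodes = pd.DataFrame({
    "genre": ["news", "news", "news", "comedy", "comedy", "comedy"],
    "minutes": [10, 15, 20, 50, 65, 80],
})

# Overall typical length.
overall_median = episodes["minutes"].median()

# Range covering the central 90% of episodes (assumed definition).
low, high = episodes["minutes"].quantile([0.05, 0.95])

# Typical length per genre.
median_by_genre = episodes.groupby("genre")["minutes"].median()

print(overall_median, low, high)
print(median_by_genre)
```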