rnckp / Data-Science-Projects

A collection of smaller data science projects

Data Science Projects

Collection of smaller data science projects.

Slightly edited and condensed version of one of five projects for the EPFL course «Data Science: Applied Machine Learning».

  • Main techniques: EDA, data preparation and cleaning, outlier removal, regression modelling with various estimators, scikit-learn pipelines.
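As a rough illustration of how outlier removal and a scikit-learn pipeline fit together, here is a minimal sketch. It is not taken from the project notebooks: the data is synthetic, and the 3-standard-deviation cutoff, the median imputation and the `Ridge` regressor are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the project's real dataset is not used here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=500)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing feature values
y[:5] += 50.0                           # simulate a few extreme target outliers

# Outlier removal: drop rows whose target lies 3 or more standard deviations from the mean.
z_scores = np.abs((y - y.mean()) / y.std())
keep = z_scores < 3
X_clean, y_clean = X[keep], y[keep]

# Pipeline: impute missing values, scale features, then fit a regressor.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    Ridge(alpha=1.0),
)

X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out split
print(f"rows kept: {len(y_clean)}, test R^2: {r2:.3f}")
```

Keeping imputation and scaling inside the pipeline means they are fitted on the training split only, which avoids leaking information from the test split.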

Final project for the Advanced Machine Learning course at FHNW.

  • Students were pointed to the Kaggle competition and had to analyze the data, train models and submit predictions.
  • Main techniques: regression modelling with various estimators, scikit-learn pipelines, LightGBM, hyperparameter tuning with grid search, random search, Hyperopt and scikit-optimize, category encoding, data aggregation with featuretools, permutation feature importance, interaction features, oversampling with imbalanced-learn.

Quick EDA and some modelling test runs for Kaggle's Spaceship Titanic challenge.

  • Main techniques: EDA, data preparation and cleaning, classification modelling with various classifiers, scikit-learn pipelines.

Project for the course «Data Science Project Competence» at FHNW.

Quick EDA and regression modelling for a Machine Learning Lab during Data Science studies at FHNW.

  • Only the real estate data was provided.
  • The project was set up as a closed Kaggle competition. Students had to compete against and beat the teachers' models.

Quick examination of podcast episode lengths to help podcast producers quantify their creative choices. I analysed ~225k episodes of ~1.8k iTunes podcasts and ~37k episodes of ~800 Spotify podcasts.

Findings:

  • A prototypical length of a podcast episode is around 40 minutes.
  • 90% of all podcast episodes have a length between 20 and 60 minutes.
  • Typical lengths vary between the different genres – with median values between 15 and 65 minutes.
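Figures like the ones above come down to a median, two percentiles and a share within a range. The sketch below shows one way to compute them with Python's standard library. The episode lengths are randomly generated (an assumption for illustration only) and are not the iTunes or Spotify data, so the printed numbers do not reproduce the findings.

```python
import random
import statistics

# Synthetic episode lengths in minutes (assumption); NOT the analysed podcast data.
random.seed(42)
lengths = [random.lognormvariate(3.6, 0.35) for _ in range(10_000)]

# "Prototypical" length: the median is robust against a few very long episodes.
median = statistics.median(lengths)

# quantiles(n=20) returns 19 cut points; the first and last are the
# 5th and 95th percentiles, which bracket the middle 90% of episodes.
cuts = statistics.quantiles(lengths, n=20)
p5, p95 = cuts[0], cuts[-1]

# Share of episodes falling inside a fixed range, here 20 to 60 minutes.
share_20_60 = sum(20 <= x <= 60 for x in lengths) / len(lengths)

print(f"median: {median:.1f} min")
print(f"middle 90%: {p5:.1f} to {p95:.1f} min")
print(f"share between 20 and 60 min: {share_20_60:.1%}")
```

Grouping the lengths by genre first and taking the median per group would give the per-genre comparison.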


Languages

Jupyter Notebook 100.0%, Rich Text Format 0.0%