gathuruM / WorldQuant-Data-Science-Projects


⭐Applied Data Science Lab [2023]


The Applied Data Science Lab, offered by WorldQuant University, is an immersive online program that equipped me with practical skills for tackling complex, real-world problems.
Throughout the program I completed a series of end-to-end data science projects, building hands-on proficiency in data wrangling, analysis, model building and effective communication of results.


  • Imported multiple CSV files from a private repository into a pandas DataFrame using for loops
  • Created preliminary and exploratory histograms, scatter plots, box-and-whisker plots and bar charts
  • Examined the relationship between variables by assessing Pearson correlation coefficients
  • Cleaned and wrangled the raw data with a custom wrangle function (see the sketch after this list)
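
The snippet below is a minimal sketch of that import-and-wrangle step; the directory name, column names and cleaning rules are hypothetical stand-ins for the private data used in the project.

```python
import glob

import pandas as pd


def wrangle(filepath):
    """Read one CSV file and apply basic cleaning (hypothetical columns)."""
    df = pd.read_csv(filepath)
    # Drop rows with a missing target and trim extreme outliers
    df = df.dropna(subset=["price"])
    df = df[df["price"] < df["price"].quantile(0.99)]
    return df


# Loop over every CSV in the (private) data directory and stack the results
frames = []
for path in glob.glob("data/*.csv"):
    frames.append(wrangle(path))
df = pd.concat(frames, ignore_index=True)

# Quick exploratory checks: a histogram and pairwise Pearson correlations
df["price"].hist(bins=30)
print(df.select_dtypes("number").corr()["price"].sort_values())
```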

  • Built ML pipelines with scikit-learn's Ridge, OneHotEncoder, SimpleImputer, LinearRegression and make_pipeline (see the sketch after this list)
  • Applied L2 regularization (Ridge) to prevent overfitting in Linear Regression models
  • Created an interactive dashboard with the ipywidgets library to generate predictions from different input features (a sketch follows the snapshot below)
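
As a rough illustration of the first two bullets, here is a small pipeline of that shape fit on made-up data; the column names, the use of category_encoders' OneHotEncoder (which encodes only the categorical columns of a DataFrame), and the alpha value are assumptions rather than the project's exact settings.

```python
import pandas as pd
from category_encoders import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical wrangled data: numeric and categorical features, continuous target
df = pd.DataFrame({
    "surface_covered_in_m2": [55, 70, 120, 45, 95, 80, 60, 105],
    "neighborhood": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "price": [110_000, 150_000, 260_000, 90_000, 210_000, 175_000, 120_000, 220_000],
})
X, y = df.drop(columns="price"), df["price"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

# Encode categoricals, fill missing values, then fit an L2-regularized (Ridge) regression
model = make_pipeline(
    OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy="mean"),
    Ridge(alpha=1.0),
)
model.fit(X_train, y_train)
print("Training MAE:", mean_absolute_error(y_train, model.predict(X_train)))
print("Validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```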

Snapshot of the interactive dashboard
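
A rough sketch of how a dashboard like the one pictured above can be wired up with ipywidgets.interact; it assumes a fitted pipeline named model (such as the one sketched earlier), and the widget names and ranges are illustrative.

```python
import pandas as pd
from ipywidgets import Dropdown, FloatSlider, interact


def make_prediction(surface_covered_in_m2, neighborhood):
    """Build a one-row DataFrame from the widget values and run it through the fitted pipeline."""
    data = pd.DataFrame({
        "surface_covered_in_m2": [surface_covered_in_m2],
        "neighborhood": [neighborhood],
    })
    prediction = model.predict(data)[0]
    return f"Predicted price: {prediction:,.2f}"


# Each widget change re-runs make_prediction and refreshes the displayed prediction
interact(
    make_prediction,
    surface_covered_in_m2=FloatSlider(min=30, max=200, step=5, value=80),
    neighborhood=Dropdown(options=["A", "B", "C"]),
)
```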

  • Connected to a MongoDB server with the pymongo library to locate and extract the required data (ETL).
  • Applied rolling average, autocorrelation and lag operations to time series variables.
  • Utilized Train Test Split procedures to create proper train and test datasets for a Linear Regression model.
  • Built, explored and interpreted autocorrelation (ACF) and partial autocorrelation (PACF) plots.
  • Using statsmodels, constructed autoregressive (AR) and ARMA models and validated them via walk-forward validation (see the sketch after this list).
  • Tuned the number of lagged observations and the moving-average window size via GridSearchCV.
  • Identified an optimal balance between model performance and computational cost.
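
A condensed sketch of that time-series workflow, from a pymongo extraction to a walk-forward-validated AutoReg model; the connection string, database, collection and field names are hypothetical, and the fixed lag order shown would in practice be tuned as described above.

```python
import numpy as np
import pandas as pd
from pymongo import MongoClient
from sklearn.metrics import mean_absolute_error
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import AutoReg

# Extract the readings from MongoDB into a time-indexed Series (hypothetical names)
client = MongoClient("mongodb://localhost:27017")
collection = client["air_quality"]["readings"]
records = collection.find({}, {"timestamp": 1, "value": 1, "_id": 0})
y = (
    pd.DataFrame(records)
    .set_index("timestamp")["value"]
    .resample("1H")
    .mean()
    .ffill()
)

# Exploratory transforms and diagnostics: rolling mean, lag feature, ACF/PACF plots
rolling_mean = y.rolling(window=24).mean()
lag_1 = y.shift(1)
plot_acf(y)
plot_pacf(y)

# Chronological train/test split (no shuffling for time series)
cutoff = int(len(y) * 0.9)
y_train, y_test = y.iloc[:cutoff], y.iloc[cutoff:]

# Walk-forward validation: refit on the growing history, forecast one step ahead
history = y_train.copy()
predictions = []
for timestamp in y_test.index:
    model = AutoReg(history, lags=24).fit()
    predictions.append(np.asarray(model.forecast(steps=1))[0])
    history = pd.concat([history, y_test.loc[[timestamp]]])

print("Walk-forward MAE:", mean_absolute_error(y_test, predictions))
```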

  • Connected to a SQL database and wrangled data using SQL magic commands and the sqlite3 library
  • Executed a randomized train-test split to create proper training, validation and test datasets
  • Built ML pipelines using scikit-learn's OrdinalEncoder, DecisionTreeClassifier, LogisticRegression and make_pipeline (see the sketch after this list)
  • In addition to computing and evaluating training and validation accuracy scores:
    • For decision tree models, tuned the tree's depth and explained its predictions by assessing the Gini importance of its features
    • For logistic regression models, evaluated odds ratios to explain its predictions
  • Reviewed the ethical, environmental and social impacts that machine learning models can have as a result of data bias
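
To make the modeling bullets above concrete, here is a compact sketch of both classifiers on a toy dataset; the feature and target names are invented, the encoder comes from category_encoders (which maps a DataFrame's categorical columns to integers and leaves numeric columns untouched), and the depth and other hyperparameters are placeholders.

```python
import numpy as np
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical tabular data: predict whether a building suffered severe damage
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "foundation_type": rng.choice(["mud", "cement", "stone"], size=200),
    "age_building": rng.integers(1, 80, size=200),
    "severe_damage": rng.integers(0, 2, size=200),
})
X, y = df.drop(columns="severe_damage"), df["severe_damage"]
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Decision tree pipeline: set the depth, then inspect Gini feature importances
tree_model = make_pipeline(OrdinalEncoder(), DecisionTreeClassifier(max_depth=4, random_state=42))
tree_model.fit(X_train, y_train)
print("Tree validation accuracy:", tree_model.score(X_val, y_val))
importances = tree_model.named_steps["decisiontreeclassifier"].feature_importances_
print(pd.Series(importances, index=X_train.columns).sort_values())

# Logistic regression pipeline: odds ratios are exp(coefficients)
logreg_model = make_pipeline(OrdinalEncoder(), LogisticRegression(max_iter=1000))
logreg_model.fit(X_train, y_train)
print("LogReg validation accuracy:", logreg_model.score(X_val, y_val))
odds_ratios = np.exp(logreg_model.named_steps["logisticregression"].coef_[0])
print(pd.Series(odds_ratios, index=X_train.columns).sort_values())
```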
