bg-mohamed / RFS677-Y

Personal GitHub to host and share my academic mini-projects related to my master's degree.


Hi there 🖖, I'm m0ham3d

Geologist / Petroleum engineer / Data Enthusiast

I recently retrained after a career in the Oil & Gas field, and I had never really coded before 04/2021. The last time I wrote a few lines of code was with Turbo Pascal 👴 (I know, I'm old) in my final year of high school in 2002, and with C & Matlab in the preparatory classes for engineering school in 2003.

I started this GitHub to host and share the academic mini-projects related to my master's degree.

My very first project was a way to get familiar with Python coding: a PWVAP calculation across 10 bitcoin exchanges.

📃 Requirements

💰 PWVAP


The second project covers data management, data processing & cleaning, data viz, and an optional prediction section using logistic regression.

📃 Requirements

📊 Data Management


Then came the statistics project at the end of the Statistics-1 module. It includes a descriptive statistical analysis plus an application of CAPM and the Fama & French 3- and 5-factor models to 10 Dow Jones stocks.

📃 Requirements

📈 Stats project
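
As a rough illustration of the CAPM part of the stats project, here is a minimal sketch that estimates alpha and beta by ordinary least squares on excess returns. The function name `capm_alpha_beta`, the constant `risk_free_rate` and the return series are all assumptions made up for this example; they are not taken from the project itself, and the real analysis may well have used a statistics library instead.

```python
def capm_alpha_beta(stock_returns, market_returns, risk_free_rate=0.0):
    """OLS fit of (stock - rf) = alpha + beta * (market - rf).

    Hypothetical helper: both inputs are plain lists of periodic returns,
    and the risk-free rate is assumed constant over the sample.
    """
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two return series of equal length (>= 2)")

    # Work with excess returns over the risk-free rate.
    y = [r - risk_free_rate for r in stock_returns]
    x = [r - risk_free_rate for r in market_returns]

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # beta = cov(x, y) / var(x); the 1/n factors cancel out.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("market excess returns have zero variance")

    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up daily returns, for illustration only.
market = [0.010, -0.020, 0.015, 0.005, -0.010]
# Built so that the true beta is 1.5 and the true alpha is 0.001.
stock = [0.001 + 1.5 * m for m in market]

alpha, beta = capm_alpha_beta(stock, market)
print(f"alpha={alpha:.4f}, beta={beta:.2f}")
```

The Fama & French 3- and 5-factor models follow the same idea with more regressors (size, value, profitability and investment factors), which is where a multiple-regression routine becomes more convenient than this hand-written fit.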


The text mining project was also a cool one. The topic was scraping tweets with the keywords "European super League", followed by text processing, data viz and text classification using unsupervised algorithms: K-Means and topic modeling with Latent Dirichlet Allocation (LDA) & NMF.

📃 Requirements

📋 Text mining


The Time Series project tackled COVID-19 case and confirmed-death statistics for European countries. The main parts of the project were EDA and outlier treatment; the outliers were mainly due to double testing (PCR/antigen) in the case counts and to misclassified causes of death. A data viz part highlighted a couple of statistical rates that explain different aspects of how countries reacted to the pandemic, followed by a modeling part with random walk, ARMA, SARIMAX and an XGBoost regressor applied to the time series.

📃 Requirements

⏳ Time Series
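
To give an idea of the simplest model in the time series project, here is a minimal sketch of a random walk baseline: the one-step-ahead forecast for each day is simply the previous day's value. The helper names and the `daily_cases` numbers are invented for this example and do not come from the project data. A baseline like this is mostly useful as a reference error that ARMA, SARIMAX or XGBoost should beat.

```python
def random_walk_forecast(series):
    """One-step-ahead naive forecast: the prediction for time t is the value at t-1.

    Returns one prediction for every observation except the first.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return list(series[:-1])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up daily case counts, for illustration only.
daily_cases = [120, 135, 150, 149, 170, 190, 185]

predictions = random_walk_forecast(daily_cases)
actuals = daily_cases[1:]  # the values the forecasts are compared against

baseline_mae = mean_absolute_error(actuals, predictions)
print(f"random walk MAE: {baseline_mae:.2f}")
```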


The Machine Learning project used an open source dataset from Kaggle, "Are you Gonna be Hired?", which was shuffled and modified before being handed out for the project. The main objective was a binary classification of the target: Hired = 1 / Not Hired = 0. One of the conditions of this project was to keep the test dataset complete, without dropping NaNs or reshuffling. The principle I followed was to focus on EDA and on dealing with NaNs & outliers, then, after some feature engineering, to choose a couple of classification algorithms and select the best model based on the ROC AUC score. Models used: Logistic Regression, KNN, Random Forest, Gradient Boosting, XGBoost, SVC. Feature selection was also applied after tuning the hyperparameters.

📃 Requirements

🤖 Machine Learning
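
The model selection step of the machine learning project can be sketched as follows: compute the ROC AUC of each candidate on a validation set and keep the one with the highest score. This is only an illustration under assumptions: the `roc_auc` function is a small pure-Python version written for this example (the project itself most likely relied on a library implementation), and the labels and predicted probabilities are made up.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as 0.5)."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in y_true")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up validation labels (Hired = 1 / Not Hired = 0) and predicted
# probabilities from two hypothetical fitted models.
y_valid = [1, 0, 1, 1, 0, 0]
candidate_scores = {
    "logistic_regression": [0.9, 0.4, 0.6, 0.3, 0.5, 0.2],
    "random_forest": [0.8, 0.3, 0.7, 0.6, 0.4, 0.1],
}

auc_by_model = {
    name: roc_auc(y_valid, scores) for name, scores in candidate_scores.items()
}
best_model = max(auc_by_model, key=auc_by_model.get)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
print("selected:", best_model)
```

The pairwise comparison is quadratic in the number of samples, which is fine for a toy example but not for a full dataset, where a rank-based implementation is the usual choice.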


The last project in this degree is a computer vision multiclass classification based on a TF Keras backend, using the open source "Fashion MNIST" dataset from Zalando. I used a DNN, a CNN and transfer learning models (Inception V3 & VGG19), tuned the hyperparameters with Keras Tuner & Hyperband, and added an optional XAI part with the SamplingExplainer of the SHAP package, although the methodology can be improved.

📃 Requirements

🧠 Deep Learning


  

Languages and Tools:

azure django html5 linux matlab mssql photoshop python pytorch scikit_learn tensorflow

Contacts:

github linkedin YouTube kaggle
