Gizachew29 / Data-Science-Portofolio

Data Science Portfolio

  • Used statistical methods to identify significant differences in the respiratory modulation of the brain's cardiovascular pulse between controls and Alzheimer’s cases, helping neurological researchers better understand Alzheimer’s disease. The differences found were novel and strongly significant (P < 0.01).
  • Preprocessed 0.25 TB of complex brain imaging data and extracted features with a 3D multiresolution optical flow method in Python.
  • Publication: Youssef Hosni, Ahmed Elabasy et al., Respiration modulates cardiovascular brain impulse pathology in Alzheimer’s disease. Submitted to Journal of Cerebral Blood Flow and Metabolism.
  • Publication: Ahmed Elabasy, Youssef Hosni et al., Optical Flow Analysis of Propagating Respiratory Brain Pulsations. Submitted to IEEE Transactions on Medical Imaging.

Machine Learning:

Regression

  • Automobile price prediction: Used Python to implement an end-to-end data science pipeline that predicts the price of used automobiles from the given features.

Classification

  • Sensor Activity Recognition: Classified the output of eight sensors into five activities and studied the effect of changing window sizes and axis combinations.
  • Alzheimer’s CV-BOLD Classification: Used Python to develop supervised machine learning models for classifying imbalanced Alzheimer’s CV-BOLD data, improving classification performance by 10%.
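The window-size study in the sensor activity project can be pictured with a small sketch. This is not the repository's code: the function names, the majority-vote labelling, and the per-channel mean feature are illustrative assumptions, chosen only to show how window size and channel (axis) selection enter the pipeline.

```python
from collections import Counter


def sliding_windows(samples, labels, window_size, step):
    """Split a multichannel signal into fixed-size windows.

    samples: list of per-timestep readings (each a list of channel values).
    labels:  list of per-timestep activity labels, same length as samples.
    Each window is labelled with its most frequent activity (an assumption).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    windows, window_labels = [], []
    for start in range(0, len(samples) - window_size + 1, step):
        end = start + window_size
        windows.append(samples[start:end])
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        window_labels.append(majority)
    return windows, window_labels


def window_features(window, channels):
    """Mean of each selected channel in one window (hypothetical feature)."""
    n = len(window)
    return [sum(row[c] for row in window) / n for c in channels]
```

Changing `window_size` alters how much signal each training example sees, and changing `channels` selects which sensor axes contribute features, which are the two knobs the project description mentions.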

Clustering


Deep Learning


Computer Vision


Natural Language Processing

  • Sentiment Analysis web app: Web application that classifies reviews using a deep learning model implemented in PyTorch and deployed on Amazon SageMaker.
  • Plagiarism Detector web app: Plagiarism detector trained on LCS (longest common subsequence) and containment features and deployed on AWS SageMaker.
  • Data Science Resume Selector: Selects the resumes that are eligible for data scientist positions. The dataset contains 125 resumes, stored in the resumetext column, which were queried from Indeed.
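The two feature families named for the plagiarism detector can be sketched as follows. This is a minimal illustration under assumptions, not the project's implementation: whitespace tokenization, lowercasing, and normalizing by the answer length are choices made here for clarity, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def containment(answer, source, n=1):
    """Fraction of the answer's n-grams that also occur in the source."""
    a = ngram_counts(answer.lower().split(), n)
    s = ngram_counts(source.lower().split(), n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, s[gram]) for gram, count in a.items())
    return overlap / total


def lcs_norm(answer, source):
    """Word-level longest common subsequence, divided by the answer length."""
    a = answer.lower().split()
    s = source.lower().split()
    if not a:
        return 0.0
    prev = [0] * (len(s) + 1)  # LCS lengths for the previous answer word
    for word_a in a:
        curr = [0]
        for j, word_s in enumerate(s, start=1):
            if word_a == word_s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)
```

Both scores lie between 0 and 1, so they can be fed directly to a classifier as features; higher values suggest more text shared with the source.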

Time Series Analysis


Data Analysis


Data Visualization:


Spark


Data Modeling

  • Songs App User Activity Data Modeling: Modeled user activity data for a music streaming app called Sparkify to optimize queries about what songs users are listening to, by creating a Postgres relational database and an ETL pipeline that builds fact and dimension tables and inserts the data into them.
  • Songs App Data Modeling Using Apache Cassandra: Created an Apache Cassandra database that can run queries on song play data to answer analysis questions.
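The fact and dimension layout from the relational project can be illustrated with a small sketch. To stay self-contained it uses Python's built-in sqlite3 module as a stand-in for Postgres, and every table name, column name, and event field below is a hypothetical choice rather than the project's actual schema.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, level TEXT);
        CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist TEXT);
        CREATE TABLE songplays (
            songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
            start_time  TEXT NOT NULL,
            user_id     INTEGER NOT NULL REFERENCES users(user_id),
            song_id     TEXT REFERENCES songs(song_id)
        );
    """)


def load_events(conn, events):
    """Minimal ETL step: upsert the dimensions, then append to the fact table."""
    for e in events:
        conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?, ?)",
                     (e["user_id"], e["name"], e["level"]))
        conn.execute("INSERT OR IGNORE INTO songs VALUES (?, ?, ?)",
                     (e["song_id"], e["title"], e["artist"]))
        conn.execute(
            "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
            (e["ts"], e["user_id"], e["song_id"]))
    conn.commit()


def top_songs(conn):
    """Analytical query: play count per song, most played first."""
    return conn.execute("""
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.song_id
        ORDER BY plays DESC, s.title
    """).fetchall()
```

Keeping descriptive attributes in the dimension tables and only keys plus event data in the fact table is what lets a question like "which songs are played most" reduce to a single join and aggregate.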

Certificates


Course Work

About


Languages

Jupyter Notebook 96.8%, Python 3.0%, MATLAB 0.1%, HTML 0.0%, M 0.0%