This repository contains a portfolio of data science projects I completed for academic, self-learning, and hobby purposes, presented as Jupyter Notebooks.
For more content, visit https://www.christianhaller.me/
-
Equinor Volve LogML: Predicting missing geophysical logs from an open real-world dataset.
Modules: SciKit-Learn. -
Detecting Parkinson’s Disease: Classifying speech recordings of Parkinson's disease (PD) patients and healthy individuals.
Modules: XGBoost. -
SEG Facies Classification: Training a model to predict sedimentary facies in a Kansas gas field for the classic SEG competition.
Modules: XGBoost, SciKit-Optimize. -
House Sales and Price Prediction in King County (Seattle): The project explores house-sale features and regression modelling techniques to optimize price prediction.
Modules: SciKit-Learn. -
Permeability Prediction from Thin Sections: Cross-validated evaluation of various machine learning and deep learning algorithms trained on reservoir-rock thin sections, with cloud deployment for inference.
Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn. -
Sonar (chirp) Data Classification of Underwater Mines and Rocks: Neural networks are trained on sonar data to distinguish two classes: rock and mine (i.e., metal surface). Several network designs are tested, with a grid search on each design to find an optimal model.
Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib. -
Time Series Modeling - Sunspot Activity: This project examines time-series prediction using LSTM and other deep learning networks. Sunspots are dark, relatively cool regions on the Sun's surface that have been recorded scientifically since the 1700s.
Modules: TensorFlow, Keras, NumPy, Pandas, Matplotlib. -
Brent Crude Oil Price Prediction with LSTM: Price time-series modeling with LSTM and a comparison of performance using Mean Absolute Error and Mean Squared Error.
Modules: TensorFlow, Keras.
-
Three-Way Sentiment Analysis for Twitter Tweets: A sentiment classification model (positive, negative, neutral) for tweets, built without NLTK's sentiment analysis engine.
Modules: NLTK, SciKit-Learn. -
Two-Way IMDB Film Database Sentiment Analysis: Classify 25,000 IMDB movie reviews as positive (1) or negative (0) sentiment with a relatively simple LSTM (recurrent neural network).
Modules: TensorFlow, Keras. -
Medical Chatbot with NLTK: Ingest conversation patterns and responses to train an NLP model, then implement a GUI to make inferences, interact with the chatbot, and get responses.
Modules: NLTK, TensorFlow, Keras, NumPy, tkinter.
-
Smart AirBnB booking in Berlin (dataset 2020-08-30): Analysis of price variability in a large dataset of Berlin's AirBnB listings scraped in August 2020. Which district, which amenities, and what time of the year offer the best value?
Modules: Pandas, Matplotlib, Seaborn, SciKit-Learn. -
Exploring US Economic Data with a Dashboard: This project intends to visualize simple time-series data in a dashboard and make it permanently available in an S3 bucket.
Modules: Bokeh. -
Toronto Neighborhoods Analysis: The project explores Wikipedia data on Toronto (Canada) neighbourhoods whose postal codes begin with M and creates labelled, interactive maps.
Modules: Pandas, BeautifulSoup, Folium. -
Shopping Mall development in Charlotte, North Carolina, U.S.A.: City and Foursquare data on shopping malls are compared and kNN-clustered by shopping-mall density per neighborhood to offer insight into where new shopping malls may be a good fit.
Modules: Pandas, BeautifulSoup, SciKit-Learn, Foursquare-API, ESRI geocoding, Folium. -
Market Analysis for Tech Stocks: Ingest and visualize price data for several technology stocks (Apple, Google, Microsoft, Amazon), evaluate their risk, and run Monte Carlo price simulations.
Modules: Pandas, NumPy, Matplotlib, Seaborn. -
911 Calls Exploration (dataset 2020-07-29): This exploration analyzes the Kaggle emergency call (911) dataset of Fire, Traffic, and Emergency Medical Services (EMS) incidents in Montgomery County, Pennsylvania.
Modules: Pandas, Matplotlib, Seaborn.
-
A simple model for Dogs vs. Cats Convolutional Neural Network (CNN) Part I: A CNN trained for two-class classification on a 2,000-image Kaggle dataset of cat and dog pictures.
Modules: TensorFlow, Keras. -
A simple model for Dogs vs. Cats Convolutional Neural Network (CNN) Part II: This project expands on Part I with transfer learning from the publicly available Inception V3 network.
Modules: TensorFlow, Keras. -
Drowsiness Detection in Real Time: Detect closed eyes in real-time video footage, such as a webcam feed.
Modules: OpenCV. -
Image Multiclass Classification in TensorFlow: Rock - Paper - Scissors: A deep learning model, trained on computer-generated images, predicts whether you are showing it a rock, paper, or scissors hand.
Modules: TensorFlow, Keras, Matplotlib.