ChristianHallerX / Analytics_Projects

Data Science Projects

Repository containing a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes. The projects are presented as Jupyter Notebooks.

For more content, visit https://www.christianhaller.me/

Contents

Machine Learning

  • Equinor Volve LogML: Predicting missing geophysical logs from an open real-world dataset.
    Modules: Scikit-Learn.

  • Detecting Parkinson’s Disease: Classifying speech recordings of PD patients and healthy candidates.
    Modules: XGBoost.

  • SEG Facies Classification: Training a model to predict sedimentary facies in a Kansas gas field for the classic SEG competition.
    Modules: XGBoost, SciKit-Optimize.

  • House Sales and Price Prediction in King County (Seattle): The project explores different house sales features and regression modelling techniques for optimizing price prediction.
    Modules: SciKit-Learn.

  • Permeability Prediction from Thin Sections: Evaluation of various machine learning and deep learning algorithms, compared using cross-validation, trained on reservoir-rock thin sections. Deployment in the cloud for inference.
    Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn.

  • Sonar (chirp) Data Classification of Underwater Mines and Rocks: Train neural networks on sonar data to distinguish two classes: rock and mine (i.e., metal surface). Several neural network designs are tried, with a grid search on each design to find an optimal model.
    Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib.

  • Time Series Modeling - Sunspot Activity: The Sunspot Activity project examines time series prediction using LSTM and other deep learning networks. Sunspots are dark spots on the sun associated with lower temperature; they have been recorded scientifically since the 1700s.
    Modules: TensorFlow, Keras, NumPy, Pandas, Matplotlib.

  • Brent Crude Oil price prediction with LSTM: Price time-series modeling with LSTM and comparison of performance of Mean Absolute Error and Mean Square Error.
    Modules: TensorFlow, Keras.
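
The Brent Crude Oil entry above compares Mean Absolute Error and Mean Square Error. As a minimal, library-free sketch of how the two metrics differ (this is not the notebook's code, and the price values are invented for illustration), one large miss affects the squared error far more than the absolute error:

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences; large errors are penalized more strongly."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices, invented for this illustration only.
actual = [60.0, 61.0, 62.0, 63.0]
predicted = [61.0, 60.0, 63.0, 67.0]  # the last prediction misses by 4

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)

print(f"MAE: {mae}")  # MAE: 1.75
print(f"MSE: {mse}")  # MSE: 4.75
```

Three of the four errors have magnitude 1, so the single miss of 4 accounts for most of the MSE while contributing proportionally to the MAE.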

Natural Language Processing

  • Three-Way Sentiment Analysis for Twitter Tweets: A classification model for tweet sentiment (positive, negative, neutral), built without using NLTK's sentiment analysis engine.
    Modules: NLTK, SciKit-Learn.

  • Two-Way IMDB Film Database Sentiment Analysis: Classify 25,000 IMDB movie reviews as positive (1) or negative (0) sentiment with a relatively simple LSTM (Recurrent Neural Network).
    Modules: TensorFlow, Keras.

  • Medical Chatbot with NLTK: Ingest communication and responses to train an NLP model. Then implement a GUI to run inference, interact with the chatbot, and get responses.
    Modules: NLTK, TensorFlow, Keras, Numpy, tkinter.

Data Analysis and Visualisation

  • Smart AirBnB booking in Berlin (dataset 2020-08-30): Analysis of the price variability in Berlin's AirBnB listings, a large data set scraped in August 2020. Which district, which amenities, and what time of the year offer the best value?
    Modules: Pandas, Matplotlib, Seaborn, Scikit-Learn.

  • Exploring US Economic Data with a Dashboard: This project intends to visualize simple time-series data in a dashboard and make it permanently available in an S3 bucket.
    Modules: Bokeh.

  • Toronto Neighborhoods Analysis: The project explores the Wikipedia data on Toronto (Canada) neighbourhoods with postal codes starting with M and creates labelled, interactive maps.
    Modules: Pandas, BeautifulSoup, Folium.

  • Shopping Mall development in Charlotte, North Carolina, U.S.A.: City and Foursquare data on shopping malls are compared and knn-clustered by shopping-mall density per neighborhood to offer insight into where new shopping malls may be a good fit.
    Modules: Pandas, BeautifulSoup, SciKit-Learn, Foursquare-API, ESRI geocoding, Folium.

  • Market Analysis for Tech Stocks: Ingest, visualize, evaluate risk, and Monte-Carlo simulate prices for some technology stocks: Apple, Google, Microsoft, Amazon.
    Modules: Pandas, Numpy, Matplotlib, Seaborn.

  • 911 Calls Exploration (dataset 2020-07-29): This exploration will analyze the emergency call (911) dataset from Kaggle containing Fire, Traffic, Emergency Medical Services (EMS) incidents for Montgomery County, Pennsylvania.
    Modules: Pandas, Matplotlib, Seaborn.
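
The Market Analysis entry above mentions Monte-Carlo simulation of stock prices. A minimal sketch of the general idea follows, using only the Python standard library. It assumes a geometric Brownian motion price model, which may differ from the notebook's approach; the function names and all parameter values (start price, drift, volatility) are hypothetical.

```python
import math
import random


def simulate_price_path(start_price, mu, sigma, days, rng):
    """Simulate one daily price path under geometric Brownian motion (assumed model).

    mu and sigma are the annualized drift and volatility; 252 trading days per year.
    """
    dt = 1.0 / 252
    prices = [start_price]
    for _ in range(days):
        shock = rng.gauss(0.0, 1.0)  # standard normal random draw
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock)
        prices.append(prices[-1] * growth)
    return prices


def monte_carlo_final_prices(start_price, mu, sigma, days, runs, seed=0):
    """Run many simulations and collect the final price of each path."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    return [simulate_price_path(start_price, mu, sigma, days, rng)[-1] for _ in range(runs)]


# Hypothetical inputs: start price 100, 5% drift, 20% volatility, one trading year.
finals = monte_carlo_final_prices(100.0, mu=0.05, sigma=0.2, days=252, runs=1000)

# The 1% empirical quantile of final prices gives a simple risk estimate.
worst_case = sorted(finals)[int(0.01 * len(finals))]
print(f"Mean final price: {sum(finals) / len(finals):.2f}")
print(f"1% quantile of final prices: {worst_case:.2f}")
```

The gap between the start price and the 1% quantile can be read as a rough value-at-risk figure under the assumed model.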

Computer Vision
