kaggle-dataset data-analysis data-visualization pandas seaborn keras-tf librosa pytorch decision-trees xg self-supervised-learning variational-autoencoder exploratory-data-analysis transformer unsupervised-learning timeseries lstm-neural-networks cnn-pytorch knn-classification stock-price-prediction

Volkan Sonmez's Machine Learning Projects

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.

Each project (usually) corresponds to one of the posts on my website.

You are free to:

Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material

Under the following terms:

Attribution: You must give appropriate credit (mentioning that your work is derived from work that is © Volkan Sonmez and, where practical, linking to http://www.pythonicfool.com/) and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

List of Exploratory Data Analysis and Machine Learning Projects:

1- Anomaly Detection - EDA & ML

This is a time-series dataset of hourly temperature values over one year. K-means++ is implemented from scratch to cluster the data, and ADTK is used for anomaly detection. The dataset can be obtained at: https://www.kaggle.com/boltzmannbrain/nab
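As an illustration only (not the notebook's code), here is a minimal NumPy sketch of k-means++ seeding followed by standard Lloyd iterations. It assumes the data is a 2-D array with one row per sample (for the hourly temperatures, something like `temps.reshape(-1, 1)`, where `temps` is a hypothetical array of readings) and that there are at least `k` distinct points; the function names are hypothetical.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C, shape (n, k)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids with the k-means++ seeding rule."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]  # first centroid: uniformly at random
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = _sq_dists(X, np.array(centroids)).min(axis=1)
        # Sample the next centroid with probability proportional to d2,
        # so far-away points are favoured (assumes d2.sum() > 0).
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm started from k-means++ centroids; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = _sq_dists(X, centroids).argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final labels are consistent with the returned centroids.
    return centroids, _sq_dists(X, centroids).argmin(axis=1)
```

The seeding step is what distinguishes k-means++ from plain k-means: spreading the initial centroids out tends to reduce the chance of a poor local optimum compared with uniform random initialization.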

2- Audio Dataset - EDA & ML

Several laughter recordings in .wav format are analyzed with the Librosa and Matplotlib libraries, and convolutional neural networks (CNNs) are used to make predictions. The dataset can be found in the 'laugh' and 'laugh_test' folders. There are 22 laughter files in total; some sound sincere and some sound fake. The CNNs are trained and tested on grayscale mel spectrogram images of the laughter audio files.
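To show what a grayscale mel spectrogram image is, here is a NumPy-only sketch of the computation (in practice Librosa's `librosa.feature.melspectrogram` and `librosa.power_to_db` do this). It assumes a mono float signal `y` with at least `n_fft` samples, an HTK-style mel scale, an 80 dB display range, and min-max scaling to 0..255; the FFT size, hop length, number of mel bands, and function names are illustrative choices, not the notebook's settings.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def grayscale_mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=64):
    """Return a uint8 image (n_mels x frames) of the log-mel power spectrogram."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T      # (n_mels, frames)
    db = 10.0 * np.log10(np.maximum(mel, 1e-10))           # power -> decibels
    db = np.maximum(db, db.max() - 80.0)                   # keep an 80 dB range
    # Min-max scale to 0..255 so the result can be saved as a grayscale image.
    span = max(db.max() - db.min(), 1e-10)
    return np.round(255.0 * (db - db.min()) / span).astype(np.uint8)
```

Each row of the resulting image is one mel band (low frequencies at row 0) and each column is one time frame, which is the 2-D input a CNN can treat like an ordinary single-channel picture.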

3- Breast Cancer Dataset - EDA & ML

The Breast Cancer Wisconsin (Diagnostic) dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. Decision tree and XGBoost models are trained and reach 95% and 97% accuracy, respectively. The dataset can be obtained at: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29
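A minimal scikit-learn sketch of the decision-tree part, assuming scikit-learn's bundled copy of the same UCI diagnostic dataset (`load_breast_cancer`) instead of the downloaded file; the split ratio, `max_depth`, and seeds are illustrative and are not expected to reproduce the accuracies quoted above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 569 samples x 30 numeric features; target 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Stratified hold-out split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# A shallow tree is less prone to overfitting than a fully grown one.
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision tree test accuracy: {accuracy:.3f}")
```

An XGBoost model can be dropped into the same script via `xgboost.XGBClassifier`, which follows the scikit-learn `fit`/`predict` interface.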

4- Deep Graph Library (DGL) - ML

DGL tutorials are simplified with examples. The Deep Graph Library (DGL) is a great tool for node classification, edge classification, and graph classification, and it ships with its own tutorial datasets. This notebook contains a detailed analysis of the CoraDataSet and MiniGCDataset datasets using the dgl.nn module. https://www.dgl.ai/
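Graph convolution layers such as those in `dgl.nn` update each node by aggregating its neighbours' features. As a framework-free illustration (an assumption, not DGL code), this NumPy sketch implements the symmetric-normalized rule ReLU(D^-1/2 (A + I) D^-1/2 H W) from Kipf and Welling, which is roughly what `dgl.nn.GraphConv` computes with its default `norm='both'` once self-loops are added to the graph. `A` is a dense adjacency matrix, `H` the node features, and `W` a weight matrix; all names are hypothetical.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops so a node keeps its own features
    deg = A_hat.sum(axis=1)                    # node degrees (all >= 1 after self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale entry (i, j) by 1 / sqrt(deg_i * deg_j).
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)     # aggregate neighbours, project, ReLU
```

Stacking two such layers and ending with a softmax over classes gives the classic node-classification model used on citation graphs like Cora; for graph classification (as with MiniGCDataset), node outputs are additionally pooled into one vector per graph.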

5- Framingham Dataset - EDA & ML

The Framingham dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. K-nearest neighbors (KNN), logistic regression, and a one-layer neural network classifier are applied to the dataset. The raw framingham.csv dataset is downloaded from Kaggle.
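The KNN classifier is simple enough to write out directly. Here is a minimal NumPy sketch (a hypothetical helper, not the notebook's code), assuming numeric feature arrays and non-negative integer class labels such as a 0/1 outcome.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each query row by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest training rows
    votes = y_train[nearest]                     # their labels, shape (n_query, k)
    # Majority vote per query; argmax returns the first maximum, so ties go to the smaller label.
    return np.array([np.bincount(row).argmax() for row in votes])
```

Because KNN relies on raw distances, features measured on very different scales (for example, cholesterol versus age) should be standardized first, otherwise the largest-valued columns dominate the vote.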

6- MNIST Dataset - EDA & ML

The well-known MNIST dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. The PyTorch and TF-Keras libraries are used to build models with fully connected layers (FCL) and CNNs. The dataset can be downloaded from https://www.kaggle.com/oddrationale/mnist-in-csv or loaded via tf.keras.datasets.mnist or torchvision.datasets.MNIST.
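For reference, a small PyTorch CNN for 28x28 grayscale digits might look like the sketch below. The layer sizes and the class name `SmallCNN` are illustrative assumptions, not the notebook's architecture; the random tensor only stands in for a real batch (rows from the CSV version would be reshaped from 784 pixels to `(-1, 1, 28, 28)`).

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Two conv blocks followed by a fully connected classifier for 1x28x28 images."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),  # raw logits; CrossEntropyLoss applies softmax internally
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SmallCNN()
batch = torch.randn(8, 1, 28, 28)          # stand-in for 8 normalized MNIST images
targets = torch.randint(0, 10, (8,))       # stand-in digit labels
logits = model(batch)
loss = nn.CrossEntropyLoss()(logits, targets)
```

The fully connected (FCL) variant would simply replace `features` with `nn.Flatten()` and feed all 784 pixels into the linear layers, discarding the spatial structure that the convolutions exploit.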

7- Time Series Stock Dataset - EDA & ML

Stock prices are analyzed with the Pandas, Seaborn, and Matplotlib libraries. FBProphet, ARIMA, and LSTM (Keras TF) models are used to make predictions. The dataset can be obtained at: https://finance.yahoo.com/chart/AAPL/
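The LSTM models need the price series cut into fixed-length input windows. A minimal sketch of that step, assuming a 1-D array of closing prices longer than the window and one-step-ahead targets; the function name `make_windows` and the window length are illustrative, not taken from the notebook.

```python
import numpy as np


def make_windows(series, window):
    """Split a 1-D series into (inputs, targets) for one-step-ahead forecasting.

    inputs has shape (n_samples, window, 1), the (samples, timesteps, features)
    layout Keras LSTM layers expect; targets[i] is the value right after inputs[i].
    """
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - window                      # number of complete windows
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return X[..., None], y                        # add a trailing feature axis
```

Splitting train and test chronologically, and fitting any scaler on the training portion only, avoids leaking future prices into training.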

8- Transformer - ML

A Transformer encoder is coded from scratch with PyTorch and then trained to perform sentiment analysis on the torchtext.datasets.IMDB dataset.
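The core of any Transformer encoder is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. A minimal NumPy sketch of that single operation follows (not the notebook's PyTorch code); the optional `pad_mask`, where True marks padding tokens to ignore, is an assumed convention for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, pad_mask=None):
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights).

    Q: (..., n_q, d_k), K: (..., n_k, d_k), V: (..., n_k, d_v).
    pad_mask: optional boolean array broadcastable to (..., n_q, n_k);
    True marks key positions (e.g. padding tokens) that get zero attention.
    """
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)   # similarity of every query to every key
    if pad_mask is not None:
        scores = np.where(pad_mask, -1e9, scores)        # effectively -inf before the softmax
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ V, weights
```

For sentiment analysis, reviews of different lengths are padded to a common length, so masking the padding keys matters. PyTorch 2.x also provides `torch.nn.functional.scaled_dot_product_attention`, but note that its boolean `attn_mask` uses the opposite convention (True means the position may be attended to).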

9- Truck Backer Upper - EDA & ML

A truck learns how to park backwards: it creates its own training data with an emulator and does its steering with a controller. This notebook is an enhanced version of the one from the NYU Deep Learning course (2020). The trained weights are stored in the emulator.txt and controller.txt files.

10- VAE with Yale Database - ML

A variational autoencoder (VAE) is created and trained on the Yale Face Database to extract the average facial features of the dataset. The dataset can be found here: https://www.kaggle.com/kerneler/starter-yale-face-database-c5f3978b-5
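Two pieces make a VAE different from a plain autoencoder: the reparameterization trick for sampling the latent code, and the KL-divergence term that keeps the encoder's Gaussian close to a standard normal prior. A NumPy sketch of both, assuming a diagonal Gaussian encoder that outputs `mu` and `log_var` (function names are hypothetical; in an autodiff framework such as PyTorch the same formulas let gradients flow through `mu` and `log_var`):

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so z is a deterministic function of mu and
    log_var; in an autodiff framework this is what makes them trainable.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps   # sigma = exp(log_var / 2)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
```

The training loss is the reconstruction error of the decoded face plus this KL term, averaged over the batch.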

11- Bitcoin Price Analysis and Prediction - EDA & ML

The Bitcoin price is analyzed with the Pandas and Matplotlib libraries. ARIMA (statistical) and LSTM (machine learning) models are used to make predictions. The dataset can be obtained with the yfinance module.
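To make the statistical side concrete, here is a sketch of the simplest member of the ARIMA family, an AR(1) model (ARIMA(1,0,0)), fitted by ordinary least squares with NumPy. This is a swapped-in illustration, not the notebook's method: libraries such as `statsmodels.tsa.arima.model.ARIMA` fit general (p, d, q) orders by maximum likelihood. For prices, the series would typically be differenced first (e.g. `np.diff(close)`, where `close` is a hypothetical array of closing prices), which corresponds to d = 1.

```python
import numpy as np


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise; returns (c, phi)."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])   # design matrix rows: [1, x[t-1]]
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return c, phi


def forecast_ar1(last, c, phi, steps):
    """Iterate the fitted recursion from the last observed value, `steps` values ahead."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return np.array(out)
```

Iterated forecasts from a stationary AR(1) (|phi| < 1) decay toward the series mean c / (1 - phi), which is one reason such models give flat long-horizon price forecasts.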

About

Several datasets are manipulated, visualized, and analyzed with well-known ML algorithms for prediction, clustering, and classification.


MIT License

Languages

Jupyter Notebook: 100.0%