This repository contains an implementation of machine learning algorithms in C++ using the MNIST Handwriting Dataset. The implemented algorithms include K-Nearest Neighbors (KNN), K-Means, and a basic Neural Network. Additionally, the Neural Network has been tested on the Iris Dataset.
- Introduction
- Project Structure
- Dependencies
- MNIST Handwriting Dataset
- Iris Dataset
- Usage
- Contributing
- License
This project demonstrates the implementation of fundamental machine learning algorithms in C++. The algorithms implemented include K-Nearest Neighbors (KNN), K-Means, and a simple Neural Network. The main goal is to showcase the usage of these algorithms on real-world datasets such as the MNIST Handwriting Dataset and the Iris Dataset.
The project is organized as follows:
- KNN/: K-Nearest Neighbors implementation.
  - src/: Contains the source code (src/knn.cpp).
  - include/: Header files for the KNN module (include/knn.hpp).
  - Makefile: Makefile for compiling the KNN module.
- KMEANS/: K-Means implementation.
  - src/: Contains the source code (src/kmeans.cpp).
  - include/: Header files for the K-Means module (include/kmeans.hpp).
  - Makefile: Makefile for compiling the K-Means module.
- NEURAL_NETWORK/: Neural Network implementation.
  - src/: Contains the source code (src/layer.cpp, src/network.cpp, src/neuron.cpp).
  - include/: Header files for the Neural Network module (include/layer.hpp, include/network.hpp, include/neuron.hpp).
  - Makefile: Makefile for compiling the Neural Network module.
- Dataset/: Contains the MNIST Handwriting Dataset and Iris Dataset.
- include/: Common header files (include/common.hpp, include/data.hpp, include/data_handler.hpp).
- lib/: Contains the shared library (lib/libdata.so) used by multiple modules.
- obj/: Object files generated during compilation.
- src/: Common source files (src/common.cpp, src/data.cpp, src/data_handler.cpp).
- test: Executable file to test the data handling.
- Makefile: Top-level Makefile for compiling the entire project.
- model.sh: Script for creating new modules.
The project has the following dependencies:
- C++ Compiler (C++11 or later)
- Eigen
Make sure to install Eigen before running the code.
The MNIST Handwriting Dataset is used for training and testing the algorithms. You can find the dataset in the Dataset/ directory. The ./src/data_handler.cpp file provides a simple utility to read and load the dataset.
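As an illustration of what loading MNIST involves, here is a minimal sketch that parses the 16-byte header of an image file. It assumes the files in Dataset/ use the standard IDX layout, where every header field is a 32-bit big-endian integer and the image-file magic number is 2051. The names read_be_u32, IdxImageHeader, and parse_image_header are hypothetical and are not taken from data_handler.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: combine four bytes, most significant first,
// into one 32-bit unsigned integer (big-endian decoding).
std::uint32_t read_be_u32(const unsigned char* bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}

// Header fields of an IDX image file, in the order they appear on disk.
struct IdxImageHeader {
    std::uint32_t magic;  // 2051 for image files
    std::uint32_t count;  // number of images
    std::uint32_t rows;   // pixels per column (28 for MNIST)
    std::uint32_t cols;   // pixels per row (28 for MNIST)
};

// Parse the first 16 bytes of an image file that is already in memory.
IdxImageHeader parse_image_header(const std::vector<unsigned char>& buf) {
    assert(buf.size() >= 16);
    IdxImageHeader header;
    header.magic = read_be_u32(&buf[0]);
    header.count = read_be_u32(&buf[4]);
    header.rows  = read_be_u32(&buf[8]);
    header.cols  = read_be_u32(&buf[12]);
    return header;
}
```

Decoding byte by byte keeps the result independent of the host machine's endianness, which matters because most desktop CPUs are little-endian while the IDX header is not.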
The Iris Dataset is used specifically for testing the Neural Network implementation. You can find the dataset in the Dataset/ directory.
To get started, clone this repository to your local machine. Open a terminal and run the following commands:
git clone https://github.com/Sinister-00/Machine_Learning.git
Navigate to the project directory.
cd Machine_Learning
To run the Makefile of each module, you first need to export the path of the working directory.
Run this command in the terminal:
export MLINCPP_ROOT=$PWD
To check that the data handling works, follow these steps:
- Inside ./src, open data_handler.cpp.
- At the very bottom of the file, uncomment the driver code.
- Save the file.
- In the terminal, run make.
- A new file named test will be created.
- Run ./test in the terminal.
To test the implemented K-Means algorithm, follow these steps:
- Navigate to the K-Means directory: cd KMEANS/
- Compile the code using the provided Makefile: make
- Run the compiled executable, main, from within the KMEANS directory.
Warning
This iterates through all values of k. Set a maximum value for k; otherwise k grows up to the number of data points in the dataset, at which point each point is treated as its own cluster.
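For readers new to the algorithm, the following is a minimal sketch of a single K-Means iteration for a fixed k: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The names Point, nearest_centroid, and kmeans_step are hypothetical and do not come from src/kmeans.cpp; the repository's implementation may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Point;

// Index of the centroid with the smallest squared Euclidean distance to p.
std::size_t nearest_centroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        assert(centroids[c].size() == p.size());
        double dist = 0.0;
        for (std::size_t d = 0; d < p.size(); ++d) {
            const double diff = p[d] - centroids[c][d];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}

// One iteration: assignment step followed by centroid update step.
// A centroid with no assigned points keeps its previous position.
std::vector<Point> kmeans_step(const std::vector<Point>& points,
                               const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    const std::size_t dim = centroids[0].size();
    std::vector<Point> sums(centroids.size(), Point(dim, 0.0));
    std::vector<std::size_t> counts(centroids.size(), 0);

    for (std::size_t i = 0; i < points.size(); ++i) {
        const std::size_t c = nearest_centroid(points[i], centroids);
        for (std::size_t d = 0; d < dim; ++d) {
            sums[c][d] += points[i][d];
        }
        ++counts[c];
    }

    std::vector<Point> updated = centroids;
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        if (counts[c] == 0) continue;
        for (std::size_t d = 0; d < dim; ++d) {
            updated[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }
    return updated;
}
```

Calling kmeans_step repeatedly until the centroids stop moving gives the usual clustering loop for one value of k.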
- Navigate to the K-Means directory: cd KMEANS/
- Inside src/kmeans.cpp, comment out the traditional method and uncomment the part that uses the WCSS method.
- Save the file.
- Recompile the code using the Makefile: make
- Run the compiled executable, main, from within the KMEANS directory.
Tip
This uses the elbow method to choose the best value of k.
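The elbow method is based on the within-cluster sum of squares (WCSS): the sum, over all points, of the squared distance to the nearest centroid. Below is a minimal sketch of that quantity. The names Point and wcss are hypothetical and are not taken from src/kmeans.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Point;

// Within-cluster sum of squares: for each point, add the squared
// Euclidean distance to its nearest centroid.
double wcss(const std::vector<Point>& points, const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            assert(centroids[c].size() == points[i].size());
            double dist = 0.0;
            for (std::size_t d = 0; d < points[i].size(); ++d) {
                const double diff = points[i][d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < best) best = dist;
        }
        total += best;
    }
    return total;
}
```

With well-fitted centroids, WCSS shrinks as k grows and reaches zero when every point is its own centroid, so the smallest WCSS is not a useful selection rule on its own. The elbow method instead looks for the value of k after which further increases reduce WCSS only slightly.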
To test the implemented KNN algorithm, follow these steps:
- Navigate to the KNN directory: cd KNN/
- Compile the code using the provided Makefile: make
- Run the compiled executable, main, from within the KNN directory.
Note
This will use Euclidean distance by default.
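As a rough illustration of the technique, here is a minimal sketch of KNN classification: rank the training samples by distance to the query, keep the k closest, and return the most common label among them. The names Sample, squared_euclidean, and knn_predict are hypothetical and are not taken from src/knn.cpp. The sketch compares squared distances, which produces the same ranking as Euclidean distance without the square root.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
double squared_euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Majority vote among the k training samples closest to the query.
// Ties are resolved in favor of the smallest label value.
int knn_predict(const std::vector<Sample>& train,
                const std::vector<double>& query,
                std::size_t k) {
    assert(!train.empty() && k >= 1);
    k = std::min(k, train.size());

    std::vector<std::pair<double, int> > ranked;
    for (std::size_t i = 0; i < train.size(); ++i) {
        ranked.push_back(std::make_pair(
            squared_euclidean(train[i].features, query), train[i].label));
    }
    // Only the first k entries need to be in sorted order.
    std::partial_sort(ranked.begin(),
                      ranked.begin() + static_cast<std::ptrdiff_t>(k),
                      ranked.end());

    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) {
        ++votes[ranked[i].second];
    }

    int best_label = votes.begin()->first;
    int best_count = 0;
    for (std::map<int, int>::const_iterator it = votes.begin(); it != votes.end(); ++it) {
        if (it->second > best_count) {
            best_count = it->second;
            best_label = it->first;
        }
    }
    return best_label;
}
```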
To use Manhattan distance instead:
- Navigate to the KNN directory: cd KNN/
- Open the Makefile and find the line CFLAGS := -std=c++11 -DEUCLID
- Change it to CFLAGS := -std=c++11 -DMANHATTAN and save the file.
- Compile the code using the provided Makefile: make
- Run the compiled executable, main, from within the KNN directory.
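The -DEUCLID and -DMANHATTAN flags define preprocessor macros, so the distance metric is chosen when the code is compiled rather than when it runs. The sketch below shows one way such a switch can be written. The function name point_distance is hypothetical, and the actual structure of src/knn.cpp may differ; here Euclidean distance is the fallback whenever MANHATTAN is not defined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Distance between two feature vectors. The metric is fixed at compile
// time: build with -DMANHATTAN for Manhattan distance, otherwise the
// function computes Euclidean distance.
double point_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
#ifdef MANHATTAN
        sum += std::fabs(a[i] - b[i]);  // sum of absolute differences
#else
        const double diff = a[i] - b[i];
        sum += diff * diff;             // sum of squared differences
#endif
    }
#ifdef MANHATTAN
    return sum;
#else
    return std::sqrt(sum);
#endif
}
```

For the points (0, 0) and (3, 4), the Euclidean build returns 5 and the Manhattan build returns 7. Because the choice is baked into the binary, switching metrics requires recompiling, which is why the steps above end with make.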
To test the Neural Network implementation, follow these steps:
- Navigate to the NEURAL_NETWORK directory: cd NEURAL_NETWORK
- Compile the code using the provided Makefile: make
- Run the compiled executable, main, from within the NEURAL_NETWORK directory.
Note
This will use the Iris dataset by default.
- Navigate to the NEURAL_NETWORK directory: cd NEURAL_NETWORK
- Open the Makefile inside NEURAL_NETWORK.
- Find the line CFLAGS := -std=c++11 -g
- Change it to CFLAGS := -std=c++11 -g -DMLINX
- Save the file.
- Recompile with the updated Makefile by running make in the terminal.
- Run the compiled executable, main, from within the NEURAL_NETWORK directory.
Tip
You can choose the number of epochs used for training. At the very bottom of ./src/network.cpp, update net->train(17); where 17 is the number of epochs.
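To make the role of the epoch count concrete, here is a minimal sketch in which one epoch is one full pass over the training data. It uses a hypothetical one-weight model (y = weight * x) trained by per-sample gradient descent, not the repository's network class. TinyModel and its train signature are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a network: a single weight fitted so that
// weight * x approximates y.
struct TinyModel {
    double weight;

    TinyModel() : weight(0.0) {}

    // Run the given number of epochs. Each epoch visits every training
    // pair once and nudges the weight to reduce the squared error.
    void train(int epochs,
               const std::vector<double>& xs,
               const std::vector<double>& ys,
               double learning_rate) {
        assert(xs.size() == ys.size());
        for (int epoch = 0; epoch < epochs; ++epoch) {
            for (std::size_t i = 0; i < xs.size(); ++i) {
                const double error = ys[i] - weight * xs[i];
                weight += learning_rate * error * xs[i];
            }
        }
    }
};
```

On data generated from y = 2x, a single epoch leaves the weight visibly short of 2, while 17 epochs bring it very close. More epochs generally fit the training data better at the cost of longer training time, and on real datasets too many can lead to overfitting.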
Your contributions are valued! If you encounter any issues or have suggestions for new features, please report them. Before submitting pull requests, let's discuss the changes to ensure they align with the project goals.
If you find any issues, please create a detailed issue with a clear description of the problem and, if possible, steps to reproduce it.
If you have a new feature or improvement to propose, feel free to open a pull request. Ensure that your code follows the project's coding standards and practices.