
Banknote Authentication Decision Tree

This Python script uses the decision tree classifier from scikit-learn to authenticate banknotes, that is, to classify each note as genuine or forged. It measures how the train-test split ratio, and therefore the training set size, affects the test accuracy and the size (number of nodes) of the learned decision tree.

Dataset

The code uses the "BankNote_Authentication.csv" dataset, which contains four numeric features (variance, skewness, and curtosis of the wavelet-transformed banknote image, plus the image entropy) and a class attribute indicating whether a banknote is genuine or forged.
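A minimal sketch of loading the dataset and separating the features from the labels, assuming the CSV has a header row and that the class attribute is the last column. The function name `load_banknotes` is hypothetical and not taken from the original code; the two sample rows are placeholder values used only to show the expected layout.

```python
import io

import pandas as pd


def load_banknotes(path_or_buffer):
    """Hypothetical loader: return (X, Y) from the banknote CSV.

    Assumes a header row and that the last column is the class label.
    """
    data = pd.read_csv(path_or_buffer)
    X = data.iloc[:, :-1].to_numpy()  # the four feature columns
    Y = data.iloc[:, -1].to_numpy()   # the class attribute (last column)
    return X, Y


# Placeholder rows in the same column layout; in practice pass
# "BankNote_Authentication.csv" instead of this in-memory buffer.
sample = io.StringIO(
    "variance,skewness,curtosis,entropy,class\n"
    "1.0,2.0,3.0,4.0,0\n"
    "5.0,6.0,7.0,8.0,1\n"
)
X, Y = load_banknotes(sample)
print(X.shape, Y.tolist())
```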

Requirements

The following libraries are imported in the code:

sklearn.tree: Provides the decision tree classifier.
pandas: Used for data manipulation and analysis.
sklearn.model_selection.train_test_split: Splits the data into training and testing sets.
numpy: Handles mathematical operations and array manipulation.
matplotlib.pyplot: Enables data visualization.

Functions

`measureAccuracy(y_pred, y_test)`

Calculates the accuracy of the predicted labels (y_pred) compared to the actual labels (y_test). Returns the accuracy as a floating-point value.
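A sketch of what such an accuracy function can look like; it is not necessarily the author's implementation, and it computes the same value as scikit-learn's `sklearn.metrics.accuracy_score(y_test, y_pred)`. The shape check is an added safeguard.

```python
import numpy as np


def measureAccuracy(y_pred, y_test):
    """Return the fraction of predictions that match the true labels."""
    y_pred = np.asarray(y_pred)
    y_test = np.asarray(y_test)
    if y_pred.shape != y_test.shape:
        raise ValueError("y_pred and y_test must have the same shape")
    # Element-wise comparison gives a boolean array; its mean is the accuracy.
    return float(np.mean(y_pred == y_test))


print(measureAccuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```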

`Experiment_Utility(X, Y, splitRatio)`

Performs an experiment with a specific train-test split ratio (splitRatio) using the decision tree algorithm. Splits the data into training and testing sets, fits the decision tree model, and predicts the labels for the testing set. Returns the accuracy and the number of nodes in the decision tree.
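A minimal sketch of one experiment run, assuming `splitRatio` is the fraction of samples used for training. The extra `random_state` parameter is a hypothetical addition for reproducibility. The node count comes from the fitted tree's `tree_.node_count` attribute, which counts internal nodes and leaves together. Synthetic data from `make_classification` stands in for the banknote CSV.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def Experiment_Utility(X, Y, splitRatio, random_state=None):
    """One run: split, fit a decision tree, and score it on the test set.

    Assumes splitRatio is the training fraction (random_state is hypothetical).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=splitRatio, random_state=random_state
    )
    model = DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = float((y_pred == y_test).mean())
    # node_count = internal (split) nodes + leaves.
    return accuracy, int(model.tree_.node_count)


# Synthetic stand-in data with four features, like the banknote dataset.
X, Y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
acc, size = Experiment_Utility(X, Y, 0.75, random_state=0)
print(f"accuracy={acc:.3f}, nodes={size}")
```

Because every split node in a scikit-learn tree has exactly two children, the node count is always odd, which matches the tree sizes reported in the experiments.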

`GetStats(array)`

Calculates the mean, maximum, and minimum values of an input array. Returns the statistics as a NumPy array.
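A sketch of this helper, assuming the statistics are returned in the order mean, maximum, minimum (the column order used in the Experiment 2 tables). Applied to the five tree sizes from Experiment 1 (25, 31, 39, 27, 31), it gives a mean of 30.6, a maximum of 39, and a minimum of 25.

```python
import numpy as np


def GetStats(array):
    """Return [mean, max, min] of the input as a NumPy array."""
    values = np.asarray(array, dtype=float)
    return np.array([values.mean(), values.max(), values.min()])


# Tree sizes from the five 75%-training runs of Experiment 1.
print(GetStats([25, 31, 39, 27, 31]))
```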

`Experiment(X, Y, splitRatio)`

Repeats the experiment five times with a fixed train-test split ratio (splitRatio), using a different random split of the data each time. Returns the accuracy and tree size of each run.
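A self-contained sketch of the repeated experiment, assuming each run uses the seed `run` (0 to 4) for both the split and the tree; the `n_runs` parameter is hypothetical. The loop body repeats the single-run logic so the block runs on its own, and the printed layout mirrors the Experiment 1 table (tree size, then accuracy).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def Experiment(X, Y, splitRatio, n_runs=5):
    """Repeat split/fit/score n_runs times; return (accuracies, tree_sizes)."""
    accuracies = np.empty(n_runs)
    tree_sizes = np.empty(n_runs)
    for run in range(n_runs):
        # A different seed per run gives a different random split.
        X_train, X_test, y_train, y_test = train_test_split(
            X, Y, train_size=splitRatio, random_state=run
        )
        model = DecisionTreeClassifier(random_state=run).fit(X_train, y_train)
        accuracies[run] = np.mean(model.predict(X_test) == y_test)
        tree_sizes[run] = model.tree_.node_count
    return accuracies, tree_sizes


# Synthetic stand-in for the banknote features and labels.
X, Y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
accuracies, tree_sizes = Experiment(X, Y, 0.75)
for size, acc in zip(tree_sizes, accuracies):
    print(f"{size:.1f}\t{acc:.4f}")
```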

`plotting(y_axis, fileName)`

Plots the y-axis values against the training set size. Saves the plot as an image file with the specified fileName.
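A sketch of such a plotting function, assuming `y_axis` is a 5×3 array with one [mean, max, min] row per training set size and that the x-axis values are the training percentages 30 to 70 (the hypothetical constant `TRAIN_SIZES`). The non-interactive `Agg` backend is selected so the figure can be saved without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to image files without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumed x-axis values: the training set sizes (%) used in Experiment 2.
TRAIN_SIZES = [30, 40, 50, 60, 70]


def plotting(y_axis, fileName):
    """Plot the mean/max/min columns of y_axis against training set size."""
    y_axis = np.asarray(y_axis)
    fig, ax = plt.subplots()
    for column, label in enumerate(["Mean", "Max", "Min"]):
        ax.plot(TRAIN_SIZES, y_axis[:, column], marker="o", label=label)
    ax.set_xlabel("Training set size (%)")
    ax.legend()
    fig.savefig(fileName)
    plt.close(fig)  # release the figure when plotting repeatedly


# Mean/max/min tree sizes reported under Experiment 2.
sizes = [[31.8, 37.0, 25.0], [37.4, 41.0, 35.0], [35.8, 45.0, 27.0],
         [41.0, 47.0, 35.0], [47.0, 51.0, 41.0]]
out = os.path.join(tempfile.gettempdir(), "tree_size.png")
plotting(sizes, out)
```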

`main()`

The main function reads the dataset, separates the features (X) and the labels (Y), and initializes matrices for accuracy and tree size statistics. It then runs two sets of experiments:

Experiment 1: Fixed train-test split ratio

The function runs the experiment five times with a 75% training ratio and records the accuracy and tree size of each run.
The tree size (number of nodes) and accuracy of each run are shown in the following table:

Tree Size (nodes)	Accuracy
25.0	0.9620991253644315
31.0	0.9630709426627794
39.0	0.956268221574344
27.0	0.967930029154519
31.0	0.9689018464528668

Experiment 2: Range of train-test split ratios

The function iterates over training set sizes from 30% to 70% in steps of 10% and, for each size, runs the experiment five times with different random splits.
For each training set size, it computes the mean, maximum, and minimum of the accuracy and of the tree size over the five runs.
The results are shown in the following tables:

Accuracy (five runs per training set size)

Training Set Size	Mean	Max	Min
30%	0.96774	0.97815	0.95421
40%	0.97282	0.97937	0.96723
50%	0.97376	0.98834	0.96064
60%	0.98069	0.98361	0.96903
70%	0.97961	0.99029	0.9733

Tree size in nodes (five runs per training set size)

Training Set Size	Mean	Max	Min
30%	31.8	37.0	25.0
40%	37.4	41.0	35.0
50%	35.8	45.0	27.0
60%	41.0	47.0	35.0
70%	47.0	51.0	41.0
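A self-contained sketch of the Experiment 2 sweep, assuming seeds 0 to 4 for the five runs. The function name `run_split_sweep` and the constants `TRAIN_SIZES` and `N_RUNS` are hypothetical; the two returned matrices have one [mean, max, min] row per training set size, matching the layout of the tables above. Synthetic data stands in for the banknote CSV, so the numbers it prints will differ from the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

TRAIN_SIZES = [0.3, 0.4, 0.5, 0.6, 0.7]  # training fractions to sweep
N_RUNS = 5                               # random splits per training size


def run_split_sweep(X, Y):
    """Return two (len(TRAIN_SIZES), 3) matrices of [mean, max, min] stats."""
    acc_stats = np.zeros((len(TRAIN_SIZES), 3))
    size_stats = np.zeros((len(TRAIN_SIZES), 3))
    for i, ratio in enumerate(TRAIN_SIZES):
        accs, sizes = [], []
        for seed in range(N_RUNS):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, Y, train_size=ratio, random_state=seed)
            model = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
            accs.append(np.mean(model.predict(X_te) == y_te))
            sizes.append(model.tree_.node_count)
        # Summarize the five runs for this training size.
        for stats, values in ((acc_stats, accs), (size_stats, sizes)):
            stats[i] = [np.mean(values), np.max(values), np.min(values)]
    return acc_stats, size_stats


X, Y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
acc_stats, size_stats = run_split_sweep(X, Y)
for ratio, (mean, hi, lo) in zip(TRAIN_SIZES, acc_stats):
    print(f"{ratio:.0%}\t{mean:.5f}\t{hi:.5f}\t{lo:.5f}")
```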

Usage

To run the code, follow these steps:

1. Install the required libraries: scikit-learn, pandas, NumPy, and Matplotlib.
2. Download the "BankNote_Authentication.csv" dataset and place it in the same directory as the script.
3. Run the script. The main function runs both experiments and produces the accuracy and tree size results.
4. The script also saves plots of accuracy and tree size against the training set size.

Conclusion

This script provides a practical implementation of banknote authentication with a decision tree. By varying the train-test split ratio, it shows how the training set size affects the accuracy and size of the learned tree: in the runs reported above, mean accuracy rises from 0.96774 with 30% training data to a peak of 0.98069 with 60%, while the mean tree size grows from 31.8 nodes to 47.0 nodes as the training set grows from 30% to 70%.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.

Team

License

This program is licensed under the MIT License.
