KhaledAshrafH / DT-Banknote-Authenticator

This Python code utilizes the decision tree algorithm from the scikit-learn library to perform banknote authentication. The code aims to analyze the impact of different train-test split ratios and training set sizes on the accuracy and size of the learned decision tree.

Banknote Authentication Decision Tree

Dataset

The code uses the "BankNote_Authentication.csv" dataset, which contains four features (variance, skew, curtosis, and entropy) and a class attribute indicating whether a banknote is real or forged.
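As a minimal sketch of how the file might be loaded and split into features and labels: the helper name `load_banknotes` is hypothetical, and the class attribute is assumed to be the last column of the CSV.

```python
import pandas as pd


def load_banknotes(csv_path):
    """Read the CSV and return (X, Y) as NumPy arrays.

    Assumption: the class label is the last column and every
    other column is a numeric feature.
    """
    data = pd.read_csv(csv_path)
    X = data.iloc[:, :-1].to_numpy()  # feature columns
    Y = data.iloc[:, -1].to_numpy()   # class column
    return X, Y
```

With the dataset in the working directory, this would be called as `X, Y = load_banknotes("BankNote_Authentication.csv")`.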

Requirements

The following libraries are imported in the code:

  • sklearn.tree: Provides the decision tree classifier.
  • pandas: Used for data manipulation and analysis.
  • sklearn.model_selection.train_test_split: Splits the data into training and testing sets.
  • numpy: Handles mathematical operations and array manipulation.
  • matplotlib.pyplot: Enables data visualization.

Functions

measureAccuracy(y_pred, y_test)

Calculates the accuracy of the predicted labels (y_pred) compared to the actual labels (y_test). Returns the accuracy as a floating-point value.
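One way such a function could be written is shown below. The body is an assumption based on the description above, not the repository's actual code: accuracy is taken as the fraction of predictions that equal the true labels.

```python
import numpy as np


def measureAccuracy(y_pred, y_test):
    """Return the fraction of predictions that match the true labels."""
    y_pred = np.asarray(y_pred)
    y_test = np.asarray(y_test)
    # Element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(y_pred == y_test))


# Three of the four predictions are correct.
print(measureAccuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```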

Experiment_Utility(X, Y, splitRatio)

Performs an experiment with a specific train-test split ratio (splitRatio) using the decision tree algorithm. Splits the data into training and testing sets, fits the decision tree model, and predicts the labels for the testing set. Returns the accuracy and the number of nodes in the decision tree.
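A single run of this kind can be sketched with scikit-learn as follows. Two points are assumptions: `splitRatio` is treated as the training fraction, and the `seed` parameter is a hypothetical addition that makes the split reproducible. The node count is read from the fitted tree's `tree_.node_count` attribute.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def Experiment_Utility(X, Y, splitRatio, seed=None):
    """Fit one decision tree and return (test accuracy, number of nodes)."""
    # Assumption: splitRatio is the fraction of samples used for training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=splitRatio, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # mean accuracy on the test set
    return accuracy, model.tree_.node_count
```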

GetStats(array)

Calculates the mean, maximum, and minimum values of an input array. Returns the statistics as a NumPy array.
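A possible implementation, assuming the statistics are returned in the order mean, maximum, minimum as listed above:

```python
import numpy as np


def GetStats(array):
    """Return np.array([mean, max, min]) of the input values."""
    array = np.asarray(array, dtype=float)
    return np.array([array.mean(), array.max(), array.min()])


# Tree sizes from the five Experiment 1 runs reported in this README.
stats = GetStats([25, 31, 39, 27, 31])
# stats holds mean 30.6, max 39.0, min 25.0
```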

Experiment(X, Y, splitRatio)

Performs multiple experiments with a fixed train-test split ratio (splitRatio). Reruns the experiment five times with different random splits of the data. Returns the accuracies and tree sizes for each experiment.
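The repeated runs might look roughly like this. The `runs` parameter and the use of the loop index as the random seed are assumptions made for illustration. The description only states that five different random splits are used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def Experiment(X, Y, splitRatio, runs=5):
    """Repeat the fit/evaluate cycle and collect accuracy and tree size."""
    accuracies = np.zeros(runs)
    sizes = np.zeros(runs)
    for i in range(runs):
        # Assumption: seeding with the loop index gives a different
        # random split on every iteration.
        X_train, X_test, y_train, y_test = train_test_split(
            X, Y, train_size=splitRatio, random_state=i
        )
        model = DecisionTreeClassifier(random_state=i).fit(X_train, y_train)
        accuracies[i] = model.score(X_test, y_test)
        sizes[i] = model.tree_.node_count
    return accuracies, sizes
```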

plotting(y_axis, fileName)

Plots the y-axis values against the training set size. Saves the plot as an image file with the specified fileName.
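A plotting helper along these lines could be written with matplotlib. The `train_sizes` parameter, its default values, and the axis label are assumptions. The non-interactive `Agg` backend is selected only so the sketch also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt


def plotting(y_axis, fileName, train_sizes=(30, 40, 50, 60, 70)):
    """Plot y_axis against the training set size and save the figure."""
    fig, ax = plt.subplots()
    ax.plot(train_sizes, y_axis, marker="o")
    ax.set_xlabel("Training set size (%)")  # assumed label
    fig.savefig(fileName)
    plt.close(fig)  # free the figure once it is saved
```

For example, `plotting([0.96774, 0.97282, 0.97376, 0.98069, 0.97961], "accuracy.png")` would plot the mean accuracies reported in this README. The file name here is illustrative.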

main()

The main function reads the dataset, separates the features (X) and the labels (Y), and initializes matrices for accuracy and tree size statistics. It then runs two sets of experiments:

Experiment 1: Fixed train-test split ratio

  • The function runs the experiment five times with 75% of the data used for training and the remaining 25% for testing, recording the accuracy and tree size for each iteration.
  • The tree size (number of nodes) and accuracy of each iteration are shown in the following table:

| Tree size (nodes) | Accuracy |
| --- | --- |
| 25.0 | 0.9620991253644315 |
| 31.0 | 0.9630709426627794 |
| 39.0 | 0.956268221574344 |
| 27.0 | 0.967930029154519 |
| 31.0 | 0.9689018464528668 |


Experiment 2: Range of train-test split ratios

  • The function iterates over training set sizes from 30% to 70% of the data, in 10% steps, and runs the experiment five times per size with different random seeds.
  • For each training set size, it calculates the mean, maximum, and minimum accuracy and tree size over the five iterations.
  • The resulting statistics for each training set size are shown in the following tables:

Accuracy by training set size

| Training set size | Mean | Max | Min |
| --- | --- | --- | --- |
| 30% | 0.96774 | 0.97815 | 0.95421 |
| 40% | 0.97282 | 0.97937 | 0.96723 |
| 50% | 0.97376 | 0.98834 | 0.96064 |
| 60% | 0.98069 | 0.98361 | 0.96903 |
| 70% | 0.97961 | 0.99029 | 0.9733 |

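Tree size (nodes) by training set size

| Training set size | Mean | Max | Min |
| --- | --- | --- | --- |
| 30% | 31.8 | 37.0 | 25.0 |
| 40% | 37.4 | 41.0 | 35.0 |
| 50% | 35.8 | 45.0 | 27.0 |
| 60% | 41.0 | 47.0 | 35.0 |
| 70% | 47.0 | 51.0 | 41.0 |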
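The sweep behind statistics like those in the two tables above can be sketched as follows. The names `run_once` and `sweep` are hypothetical, the seeds are assumed to be 0 to 4, and each statistics row is stored in the order mean, max, min.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_once(X, Y, train_fraction, seed):
    """Fit one tree and return (test accuracy, node count)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=train_fraction, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    return model.score(X_test, y_test), model.tree_.node_count


def sweep(X, Y, fractions=(0.3, 0.4, 0.5, 0.6, 0.7), runs=5):
    """Return two (len(fractions), 3) arrays of mean/max/min statistics."""
    acc_stats = np.zeros((len(fractions), 3))
    size_stats = np.zeros((len(fractions), 3))
    for row, fraction in enumerate(fractions):
        # One (accuracy, size) pair per seed, stacked into shape (runs, 2).
        results = np.array([run_once(X, Y, fraction, s) for s in range(runs)])
        acc, size = results[:, 0], results[:, 1]
        acc_stats[row] = acc.mean(), acc.max(), acc.min()
        size_stats[row] = size.mean(), size.max(), size.min()
    return acc_stats, size_stats
```

Each row of the returned arrays corresponds to one training set size, so the arrays map directly onto the rows of the tables.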


Usage

To run the code, follow these steps:

  1. Install the required libraries: scikit-learn, pandas, numpy, and matplotlib.
  2. Download the "BankNote_Authentication.csv" dataset and place it in the same directory as the code file.
  3. Run the code. The main function will execute the experiments and generate the accuracy and tree size results.
  4. The code will also generate plots showing the accuracy and tree size against the training set size.

Conclusion

This Python code applies a scikit-learn decision tree to banknote authentication and measures how the train-test split ratio affects the learned model. In the runs reported above, mean accuracy rose from about 0.968 with 30% of the data used for training to about 0.98 with 60% to 70%, while the mean tree size grew from 31.8 nodes to 47.0 nodes.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.

License

This program is licensed under the MIT License.
