classification-algorithms correlation-analysis data-science detection-model dimensionality-reduction encoding feature-engineering feature-scaling fraud-detection fraudulent-transactions label-encoding outliers pca smote standardscaler tsne-visualization non-fraudulent-transactions kears max-pooling nerual-network

Credit Card Fraud Detection

Focused on advancing credit card fraud detection, this project employs machine learning algorithms, including neural networks and decision trees, to enhance fraud prevention in the banking sector using the Fraud Dataset. It serves as the final project for a Data Science course at the University of Ottawa in 2023.

Required libraries: scikit-learn, pandas, matplotlib.
Execute cells in a Jupyter Notebook environment.
The uploaded code has been executed and tested successfully within the Google Colab environment.

Binary classification problem

The task is to classify whether a credit card transaction is fraudulent (1) or not (0) based on various features, enhancing fraud detection in financial transactions.

Independent variables: transaction details, credit card information, merchant data, billing address, cardholder demographics, occupation, and transaction timestamps.

Target variable:

'is_fraud': Binary variable indicating whether the transaction is fraudulent (1) or not (0).

Key Tasks Undertaken

Set-up

Loaded and displayed the dataset using Pandas.
Generated a profile report using the Pandas Profiling library.
Checked the dataset's basic information using info() and unique values using nunique().
Checked the class balance of the target variable.
Compared the average values of different features for fraudulent and non-fraudulent transactions.
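
The set-up checks above can be sketched with pandas as follows. This is a minimal illustration on a tiny synthetic table, not the project's notebook code: every column name except `is_fraud` is a hypothetical placeholder, and the profiling report step is left out.

```python
import pandas as pd

# Tiny synthetic stand-in for the fraud dataset; all column names except
# 'is_fraud' are hypothetical.
df = pd.DataFrame({
    "amt": [12.5, 980.0, 43.1, 7.9, 1500.0, 23.4],
    "city_pop": [1200, 560000, 8400, 1200, 98000, 8400],
    "category": ["grocery", "shopping", "gas", "grocery", "shopping", "gas"],
    "is_fraud": [0, 1, 0, 0, 1, 0],
})

df.info()                     # column dtypes and non-null counts
unique_counts = df.nunique()  # number of distinct values per column

# Class balance: share of each class in the target column.
balance = df["is_fraud"].value_counts(normalize=True)

# Average of each numeric feature, split by class.
class_means = df.groupby("is_fraud").mean(numeric_only=True)

print(unique_counts, balance, class_means, sep="\n\n")
```

On real fraud data the positive class is usually a very small share, which is why the balance check comes before any modeling.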

Data Pre-processing

Missing and Duplicate Data: Checked for missing values and duplicate rows.
Feature Engineering: Calculated the age of credit card holders based on transaction and birth dates and visualized the age distribution in fraudulent and non-fraudulent transactions.
Feature Selection: Dropped unnecessary columns, calculated the correlation matrix, and visualized it using a heatmap.
Dealing with Outliers.
Encoding Categorical Variables: using Label Encoder after identifying categorical and numerical features.
Feature Scaling: using Standard Scaler.
Dealing with Imbalanced Data: using SMOTE.
Dimensionality Reduction and Data Visualization: using PCA & t-SNE for dimensionality reduction and visualization (not applied in the modeling phase).
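
A few of the pre-processing steps above (age calculation, label encoding, and standard scaling) might look roughly like the sketch below. The column names `trans_date`, `dob`, `category`, and `amt` are assumptions made for illustration, and the outlier, resampling, and PCA/t-SNE steps are omitted.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical columns, for illustration only.
df = pd.DataFrame({
    "trans_date": pd.to_datetime(["2020-06-21", "2020-06-22", "2020-06-23", "2020-06-24"]),
    "dob": pd.to_datetime(["1980-06-21", "1995-01-10", "1962-11-30", "2000-06-25"]),
    "category": ["grocery", "gas", "grocery", "travel"],
    "amt": [10.0, 20.0, 30.0, 40.0],
})

# Feature engineering: age in whole years at the time of the transaction.
had_birthday = (
    (df["trans_date"].dt.month > df["dob"].dt.month)
    | ((df["trans_date"].dt.month == df["dob"].dt.month)
       & (df["trans_date"].dt.day >= df["dob"].dt.day))
)
df["age"] = df["trans_date"].dt.year - df["dob"].dt.year - (~had_birthday).astype(int)
df = df.drop(columns=["trans_date", "dob"])

# Identify categorical and numerical features by dtype.
numerical_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
categorical_cols = [c for c in df.columns if c not in numerical_cols]

# Label-encode each categorical column (classes are numbered in sorted order).
encoded = df[categorical_cols].apply(lambda col: LabelEncoder().fit_transform(col))

# Standardize numerical columns to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[numerical_cols]),
    columns=numerical_cols,
    index=df.index,
)

X = pd.concat([encoded, scaled], axis=1)
print(X)
```

In a full pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.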

Data Modeling: A diverse set of classifiers, including SVM, Random Forest, Naive Bayes variants, KNN, XGBoost, SGD, Logistic Regression, Decision Tree, AdaBoost, and CatBoost, is employed to predict fraud.
Evaluation
- Used cross-validation and confusion matrices for each classifier on the training and test sets.
- Calculated accuracy, precision, recall, and F1 score for each classifier on the training and test sets.
- Compared the results with and without PCA dimensionality reduction.
  - With PCA dimensionality reduction.
  - Without PCA dimensionality reduction (the rest of the work was completed without PCA).
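
As a rough illustration of the comparison described above, the following sketch computes the same metrics for one classifier with and without a PCA step. It runs on synthetic data from `make_classification`; the choice of logistic regression, the number of PCA components, and the helper name `evaluate` are assumptions, and only test-set metrics are shown.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the fraud dataset.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def evaluate(model):
    """Fit the model and return test-set metrics plus the confusion matrix."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }


without_pca = evaluate(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)))
with_pca = evaluate(
    make_pipeline(StandardScaler(), PCA(n_components=5),
                  LogisticRegression(max_iter=1000)))

for name, result in [("without PCA", without_pca), ("with PCA", with_pca)]:
    scores = {k: round(float(v), 3) for k, v in result.items()
              if k != "confusion_matrix"}
    print(name, scores)
    print(result["confusion_matrix"])
```

Wrapping the scaler and PCA in a pipeline keeps both fitted on the training split only, which makes the two runs directly comparable.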
Champion Model: XGBoost (Extreme Gradient Boosting)

   Cross-validation accuracy for XGBoost (Extreme Gradient Boosting): [0.99937487 0.99874974 0.99916649 0.99749948 0.99854136 0.99833299 0.99812461 0.99874974 0.99895812 0.99874974]
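
The ten values above suggest 10-fold cross-validation. A minimal sketch of that procedure is shown below, assuming stratified folds and using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that the example depends only on scikit-learn. The data is synthetic, so the printed scores will not match the reported ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data standing in for the fraud dataset.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

# Stand-in for XGBoost; swap in the project's own classifier as needed.
model = GradientBoostingClassifier(n_estimators=50, random_state=42)

# Stratified folds keep the fraud/non-fraud ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores)
print(f"mean={scores.mean():.4f}, std={scores.std():.4f}")
```

Because accuracy can look high on imbalanced data even when few frauds are caught, the same call can be repeated with `scoring="recall"` or `scoring="f1"` for a fuller picture.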

Supervised Deep Learning Algorithms:

Neural Network (NN) Model
- Data Splitting: Split the training data into training and validation sets.
- Model Architecture: Created a Sequential neural network model with three hidden layers and an output layer.
- Compilation: Compiled the model using the RMSprop optimizer and binary crossentropy loss function.
- Training: Trained the NN model for 100 epochs on the training data.
- Evaluation and Results
  - Predictions: Obtained predictions on the validation set.
  - Classification Report: Generated a classification report, showing precision, recall, and F1-score.
  - Precision Analysis: Examined precision, which turned out to be 1, indicating a strong positive predictive value.
  - Model Evaluation: Evaluated the model on the test set, reshaping predictions and calculating accuracy.
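
The evaluation steps above (reshaping predictions, then computing precision and accuracy) can be illustrated without training a network. In this sketch the probabilities are made-up values shaped like the `(n_samples, 1)` output of a sigmoid layer, and the 0.5 threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, precision_score

# Hypothetical sigmoid outputs with shape (n_samples, 1); values are made up.
probs = np.array([[0.02], [0.91], [0.40], [0.75], [0.10], [0.08]])
y_true = np.array([0, 1, 1, 1, 0, 0])

# Flatten to 1D and threshold at 0.5 to get hard class labels.
y_pred = (probs.reshape(-1) >= 0.5).astype(int)

print(classification_report(y_true, y_pred,
                            target_names=["not fraud", "fraud"],
                            zero_division=0))

precision = precision_score(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.2f}, accuracy={accuracy:.2f}")
```

In this toy case precision is 1 even though one fraudulent transaction is missed, which shows why precision is best read together with recall.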
Convolutional Neural Network (CNN) Model
- Data Reshaping: Reshaped the data into 3D, as the CNN requires 3D input.
- CNN Architecture: Developed a CNN model with convolutional and pooling layers, aiming to capture spatial features.
- Model Compilation: Compiled the CNN model using the Adam optimizer and binary crossentropy loss.
- Training: Trained the CNN model for 45 epochs on the training data.
- Evaluation and Visualization
  - Learning Curve: Plotted the learning curve to visualize the training and validation accuracy/loss over epochs.
  - Max Pooling Enhancement: Modified the CNN architecture by introducing max pooling layers to improve efficiency.
  - Final Evaluation: Evaluated the final CNN model on the test set and visualized the confusion matrix.
  - Classification Report: Displayed a comprehensive classification report with F1-score for each class.
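
The reshaping and max pooling steps can be illustrated with NumPy alone. The sketch below assumes 1D convolutions over the feature axis, with input laid out as (samples, steps, channels), and a pooling window of 2. The array values are placeholders, and the pooling is done by hand rather than with a Keras layer.

```python
import numpy as np

# Hypothetical tabular feature matrix: 4 transactions, 4 features each.
X = np.arange(16, dtype="float32").reshape(4, 4)

# 1D convolution layers typically expect (samples, steps, channels),
# so each feature becomes one step with a single channel.
X_3d = X.reshape(X.shape[0], X.shape[1], 1)

# Max pooling with window 2 and stride 2 along the step axis:
# group the steps in pairs and keep the larger value of each pair.
pooled = X_3d.reshape(X_3d.shape[0], X_3d.shape[1] // 2, 2, 1).max(axis=2)

print(X.shape, "->", X_3d.shape, "->", pooled.shape)
print(pooled[0, :, 0])
```

Pooling halves the step dimension here, which is the efficiency gain that max pooling layers are meant to provide.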

MIT License
