vaidehii203 / PRODIGY_DS_03

Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data. Use a dataset such as the Bank Marketing dataset from the UCI Machine Learning Repository.

PRODIGY_DS_03

Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data, using a dataset such as the Bank Marketing dataset from the UCI Machine Learning Repository. Additionally, the model's performance is visualized in Power BI.

Dataset link: https://archive.ics.uci.edu/dataset/222/bank+marketing

Table of Contents

  1. Files
  2. Requirements
  3. Setup Instructions
  4. Usage
  5. Jupyter Notebook Code
  6. Contact

1. Files

  • decision_tree_classifier.ipynb: Jupyter Notebook containing the code for data preprocessing, model training, evaluation, and result visualization.
  • predictions.csv: CSV file containing the actual and predicted values for the test dataset.
  • results.csv: CSV file containing the evaluation metrics of the model.

2. Requirements

  • Python 3.x
  • pandas
  • scikit-learn
  • Jupyter Notebook

3. Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/vaidehii203/PRODIGY_DS_03.git
    
  2. Install Dependencies:

    pip install pandas scikit-learn notebook

4. Usage

Data Loading and Preprocessing
  • The notebook loads the Bank Marketing dataset with pandas: bank-additional-full.csv for training and bank-additional.csv for testing. Both files are semicolon-delimited.
  • Categorical variables in the dataset are encoded using LabelEncoder.
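
The encoding step can be illustrated on a toy example. This is a minimal sketch, assuming two tiny hand-made frames in place of the real CSV files; the column values and the `encoders` and `job_mapping` names are illustrative only. It shows that each encoder is fitted on the training column and then reused on the test column, so both sets share one label-to-integer mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frames standing in for the real train/test CSVs (values are illustrative)
train = pd.DataFrame({"job": ["admin.", "technician", "services"],
                      "y": ["no", "yes", "no"]})
test = pd.DataFrame({"job": ["services", "admin."],
                     "y": ["yes", "no"]})

encoders = {}
for column in ["job", "y"]:
    encoders[column] = LabelEncoder()
    # fit_transform learns the sorted set of labels and maps each to an integer
    train[column] = encoders[column].fit_transform(train[column])
    # transform reuses that mapping; it raises ValueError on labels it has not seen
    test[column] = encoders[column].transform(test[column])

# Recover the learned mapping for inspection
job_mapping = {str(label): int(code)
               for code, label in enumerate(encoders["job"].classes_)}
print(job_mapping)
print(test)
```

Because LabelEncoder sorts the labels, the target values "no" and "yes" become 0 and 1. The integer codes for a feature such as job carry no real ordering, although a decision tree will still split on them as if they were ordered.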
Model Training and Evaluation
  • The Decision Tree Classifier from scikit-learn is used to build the model.
  • Evaluation metrics such as accuracy, precision, recall, F1 score, and confusion matrix are computed using scikit-learn's metrics functions.
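
To make the metrics concrete, here is a small sketch on hand-written labels (not results from this project). It assumes 1 marks a purchase and 0 marks no purchase, and shows how precision, recall, and F1 follow from the confusion matrix counts.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hand-made labels for illustration: 1 = purchased ("yes"), 0 = did not ("no")
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1)
```

Since purchasers are the minority class in the Bank Marketing data, accuracy alone can look high even when few purchasers are found, which is why precision and recall for the positive class are worth reading alongside it.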
Saving Results
  • Predictions and evaluation metrics are saved to CSV files (predictions.csv and results.csv).
Printing Results
  • The notebook prints the evaluation metrics and displays the predictions DataFrame to the console for verification.

5. Jupyter Notebook Code

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

# Load the dataset (semicolon-delimited); adjust these paths to your local copy
train_data = pd.read_csv("D:\\intership tasks\\bank-additional\\bank-additional-full.csv", delimiter=';')
test_data = pd.read_csv("D:\\intership tasks\\bank-additional\\bank-additional.csv", delimiter=';')

# Display the first few rows of the dataset
train_data.head()

# Encode categorical variables in the training dataset
label_encoders = {}
for column in train_data.select_dtypes(include=['object']).columns:
    label_encoders[column] = LabelEncoder()
    train_data[column] = label_encoders[column].fit_transform(train_data[column])

# Encode categorical variables in the testing dataset using the same encoders
for column in test_data.select_dtypes(include=['object']).columns:
    if column in label_encoders:
        test_data[column] = label_encoders[column].transform(test_data[column])

# Separate features and target (the two CSV files already serve as the train and test sets)
x_train = train_data.drop(columns=['y'])
y_train = train_data['y']

x_test = test_data.drop(columns=['y'])
y_test = test_data['y']

# Build and train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)

# Predict
y_pred = clf.predict(x_test)

# Evaluation metrics (LabelEncoder maps 'no' to 0 and 'yes' to 1, so pos_label=1 scores the purchase class)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, pos_label=1, average='binary')
recall = recall_score(y_test, y_pred, pos_label=1, average='binary')
f1 = f1_score(y_test, y_pred, pos_label=1, average='binary')
conf_matrix = confusion_matrix(y_test, y_pred)

# Save predictions and evaluation metrics to CSV
predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
predictions_df.to_csv('predictions.csv', index=False)

results = pd.DataFrame([{
    'accuracy': accuracy,
    'precision': precision,
    'recall': recall,
    'f1_score': f1
}])
results.to_csv('results.csv', index=False)

# Print evaluation metrics
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
print("Confusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nPredictions DataFrame:")
print(predictions_df)
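
One caveat about the evaluation above: according to the dataset description on the UCI page, bank-additional.csv is a 10% random sample drawn from bank-additional-full.csv, so the test rows also appear in the training data and the reported scores are likely optimistic. A possible alternative, sketched here on a small synthetic frame rather than the real file, is to hold out part of the full dataset with train_test_split. The column values, the 20% test size, and the `features` and `target` names are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for bank-additional-full.csv after label encoding
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44, 35, 53],
    "duration": [100, 250, 80, 400, 120, 300, 90, 500, 60, 210],
    "y": [0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
})

features = data.drop(columns=["y"])
target = data["y"]

# stratify keeps the class ratio similar in both parts;
# random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, stratify=target, random_state=42
)

print(len(x_train), len(x_test))
```

With a split like this, no row is used for both fitting and scoring, so the metrics describe performance on unseen customers.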

6. Contact

For more information or to get in touch, please visit my LinkedIn profile: https://www.linkedin.com/in/vaidehi-kale-b635b7264/



Languages

Language:Jupyter Notebook 100.0%