sufianadnan / Linear-Regression-Lab

Linear Regression Lab Overview

This repository contains the code and analysis for the Linear Regression lab, which focuses on predicting data breach sizes using Python. Details for each task can be found in the accompanying Jupyter Notebook.

Tasks Completed

Task 1: Data Understanding

Objective: Perform exploratory data analysis (EDA) on the dataset to understand the nature of the data.

  • Actions Taken:
    • Read and visualized the dataset using various plots (bar graphs, histograms, scatter plots).
    • Described findings based on the visualizations.
    • Prepared the data for analysis by encoding nominal (categorical) values as integers.
    • Split the dataset into training and testing samples (70/30 ratio).
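A minimal sketch of the two preparation steps above, assuming the data are available as plain Python lists and using NumPy only. The column values, the toy rows, and the helper names encode_nominal and train_test_split_70_30 are hypothetical illustrations, not code from the lab notebook.

```python
import numpy as np

def encode_nominal(values):
    """Map each distinct nominal value to an integer code, in order of first appearance."""
    codes = {}
    # len(codes) is evaluated before insertion, so new values get 0, 1, 2, ...
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return np.array(encoded), codes

def train_test_split_70_30(X, y, test_fraction=0.3, seed=0):
    """Shuffle row indices, then hold out test_fraction of the rows for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical toy rows: sector, breach method, and records lost (the target)
sectors = ["web", "retail", "web", "finance", "retail",
           "web", "finance", "health", "web", "retail"]
methods = ["hacked", "lost device", "hacked", "inside job", "hacked",
           "poor security", "hacked", "lost device", "hacked", "inside job"]
records = np.array([1.2e6, 3.0e4, 5.5e6, 2.0e5, 8.0e4,
                    9.0e6, 4.0e5, 1.5e4, 7.7e6, 6.0e4])

sector_codes, sector_map = encode_nominal(sectors)
method_codes, method_map = encode_nominal(methods)
X = np.column_stack([sector_codes, method_codes])

X_train, X_test, y_train, y_test = train_test_split_70_30(X, records)
print(sector_map)                    # {'web': 0, 'retail': 1, 'finance': 2, 'health': 3}
print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)
```

Integer codes impose an arbitrary order on the categories, which a linear model treats as meaningful; one-hot encoding avoids this at the cost of extra columns. In practice, pandas.factorize and scikit-learn's train_test_split(X, y, test_size=0.3, random_state=...) cover the same two steps.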

Task 2: Simple Linear Regression

Objective: Develop a simple linear regression model to predict data breach size.

  • Actions Taken:
    • Selected a relevant column for predicting the breach size.
    • Created a simple linear regression model.
    • Calculated the coefficient of determination (R²), root mean square error (RMSE), intercept, and slope.
    • Predicted values for training and test datasets.
    • Plotted actual vs. predicted values.
    • Analyzed model performance and compared accuracy across different predictor columns.
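The quantities listed for Task 2 can all be computed in closed form. The sketch below uses hypothetical toy data rather than the breach dataset, and the function names are illustrative; it fits y ≈ intercept + slope · x by ordinary least squares and reports R² and RMSE on both splits.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y ≈ intercept + slope * x (closed form)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical toy data: one predictor column and the target (breach size)
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x_test = np.array([6.0, 7.0])
y_test = np.array([12.2, 13.8])

intercept, slope = fit_simple_linear_regression(x_train, y_train)
train_pred = intercept + slope * x_train
test_pred = intercept + slope * x_test

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"train R2={r2_score(y_train, train_pred):.3f}, RMSE={rmse(y_train, train_pred):.3f}")
print(f"test  R2={r2_score(y_test, test_pred):.3f}, RMSE={rmse(y_test, test_pred):.3f}")
```

scikit-learn's LinearRegression exposes the same fitted values as intercept_ and coef_, and sklearn.metrics provides r2_score and mean_squared_error. Scattering y_test against test_pred (for example with matplotlib) gives the actual vs. predicted plot.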

Task 3: Multiple Linear Regression

Objective: Create multiple linear regression models using selected columns for prediction.

  • Actions Taken:
    • Chose two and then three important columns for predicting the breach size.
    • Developed multiple linear regression models.
    • Evaluated model accuracy by calculating the coefficient of determination (R²), root mean square error (RMSE), intercept, and slopes (one per predictor).
    • Predicted values for training and test datasets.
    • Plotted actual vs. predicted values.
    • Compared different degrees of polynomial regression to identify the optimal order based on accuracy metrics.
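A sketch of the Task 3 workflow on synthetic data, assuming three already-encoded numeric predictor columns. The data, column indices, and helper names are hypothetical. Multiple regression is solved with NumPy's least-squares routine on a design matrix with an intercept column. For brevity, the polynomial comparison uses a single predictor with np.polyfit; the notebook may instead expand several columns (for example with scikit-learn's PolynomialFeatures).

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """OLS for y ≈ b0 + b1*x1 + ... + bk*xk; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(42)
# Hypothetical synthetic data: three encoded predictors and a noisy linear target
X = rng.uniform(0, 10, size=(100, 3))
y = 5.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 1.0, size=100)
# Rows are generated in random order, so slicing gives a random 70/30 split
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

fits = {}
for cols in ((0, 1), (0, 1, 2)):  # two, then three predictor columns
    cols_idx = list(cols)
    b0, slopes = fit_multiple_linear_regression(X_train[:, cols_idx], y_train)
    test_pred = b0 + X_test[:, cols_idx] @ slopes
    fits[cols] = (b0, slopes)
    print(cols, f"intercept={b0:.2f}", "slopes=", np.round(slopes, 2),
          f"test R2={r2_score(y_test, test_pred):.3f}", f"RMSE={rmse(y_test, test_pred):.3f}")

# Compare polynomial orders on one predictor, scoring each on the held-out split
poly_rmse = {}
for degree in range(1, 6):
    coeffs = np.polyfit(X_train[:, 0], y_train, degree)  # coefficients, highest power first
    pred = np.polyval(coeffs, X_test[:, 0])
    poly_rmse[degree] = rmse(y_test, pred)
    print(f"degree {degree}: test R2={r2_score(y_test, pred):.3f}, RMSE={poly_rmse[degree]:.3f}")

best_degree = min(poly_rmse, key=poly_rmse.get)
print("lowest test RMSE at degree", best_degree)
```

Scoring each degree on the held-out split, rather than on the training split, matters here: training error can only stay the same or fall as the degree rises, so it cannot identify the optimal order on its own.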

Task 4: Overfitting versus Underfitting

Objective: Analyze model performance with respect to overfitting and underfitting.

  • Actions Taken:
    • Adjusted the size of the training dataset and assessed model performance.
    • Evaluated models using the coefficient of determination (R²) and root mean square error (RMSE).
    • Discussed which metric, R² or RMSE, is preferable for assessing accuracy.
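A sketch of the Task 4 experiment under stated assumptions: synthetic data with a quadratic trend, a deliberately flexible degree-4 polynomial model, and a fixed held-out test set while the training set grows. None of these values come from the lab. The final lines illustrate one point relevant to the R² vs. RMSE discussion: R² is scale-free, while RMSE is expressed in the units of the target.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination (unitless; 1.0 is a perfect fit)."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def rmse(y_true, y_pred):
    """Root mean square error (same units as y)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
# Hypothetical synthetic data: quadratic trend plus Gaussian noise (sd = 1)
x = rng.uniform(-3, 3, size=200)
y = 1.5 * x + 0.5 * x ** 2 + rng.normal(0, 1.0, size=200)
x_test, y_test = x[150:], y[150:]   # fixed 50-row held-out set
x_pool, y_pool = x[:150], y[:150]   # training rows are taken from this pool

DEGREE = 4  # deliberately flexible, so small training sets tend to overfit
results = {}
for n_train in (8, 15, 30, 60, 150):
    coeffs = np.polyfit(x_pool[:n_train], y_pool[:n_train], DEGREE)
    train_pred = np.polyval(coeffs, x_pool[:n_train])
    test_pred = np.polyval(coeffs, x_test)
    results[n_train] = (rmse(y_pool[:n_train], train_pred), rmse(y_test, test_pred))
    print(f"n_train={n_train:3d}  "
          f"train R2={r2_score(y_pool[:n_train], train_pred):.3f}  "
          f"test R2={r2_score(y_test, test_pred):.3f}  "
          f"train RMSE={results[n_train][0]:.3f}  test RMSE={results[n_train][1]:.3f}")

# Rescaling the target (e.g. records -> thousandths of records) multiplies
# RMSE by the same factor but leaves R2 unchanged.
scaled_test = y_test * 1000.0
scaled_pred = test_pred * 1000.0
print(np.isclose(r2_score(scaled_test, scaled_pred), r2_score(y_test, test_pred)))   # True
print(np.isclose(rmse(scaled_test, scaled_pred), 1000.0 * rmse(y_test, test_pred)))  # True
```

A large gap between training and test error at small training sizes signals overfitting; high error on both splits signals underfitting. RMSE is directly interpretable in the target's units (for example, records breached), while R² is easier to compare across targets with different scales.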

Repository Structure

The repository includes a Jupyter Notebook file containing detailed code, analysis, and conclusions for each task. Refer to the notebook for a comprehensive overview and implementation details.

Languages

Jupyter Notebook: 100.0%