20Harsha / Internship_Tasks

This repository consists of the tasks I completed while working as an intern at The Sparks Foundation.

The Sparks Foundation Internship tasks

Task 1: Prediction using Supervised Machine Learning

  1. Predict the percentage score of a student based on the number of study hours. This is a Simple Linear Regression task as it involves only 2 variables.
  2. Data can be found at http://bit.ly/w-data
  3. You can use R, Python, SAS Enterprise Miner or any other tool.
  4. What will be the predicted score if a student studies for 9.25 hrs/day?

Results:

  • The dataset consists of students' study hours and their percentage scores.

  • There are 25 rows and 2 columns.

  • Students' percentage scores range from 17 to 95, where 17 is the minimum score, 95 is the maximum score and the median score is 47.

  • Students' study hours range from 1.1 hrs to 9.2 hrs, where 1.1 hrs is the minimum, 9.2 hrs is the maximum and the median number of hours studied is 4.8.

  • The correlation between Hours and Scores is 0.9761. Hence there is a strong positive linear relationship between hours studied and students' percentage score.

  • Problem Statement: Predict the score of a student who studies for 9.5 hrs/day.

    Simple linear regression was used to predict the student's percentage score. The linear regression model predicts a percentage score of 96.169 for a student who studies 9.5 hrs/day.

  • The Simple Linear Regression model performs well, as the R-squared value is 0.945.

  • YouTube link: https://youtu.be/vakaYX7j9Ts

  • LinkedIn Post : https://www.linkedin.com/posts/harshakumavat2000_task1-gripjan22-gripjanuary22-activity-6884138891713101824-Cyp1?utm_source=share&utm_medium=member_desktop

Task 2: Prediction using Unsupervised Machine Learning

  1. Predict the optimum number of clusters from the iris dataset and represent it visually.
  2. You can use R or Python to perform this task.

Results:

  • The dataset consists of Id, Sepal length, Sepal width, Petal length, Petal width and Species.

  • There are 150 rows and 6 columns.

  • Problem Statement: Predict the optimum number of clusters from the iris dataset and represent it visually.
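
One common way to choose the number of clusters is the elbow method: run k-means for several values of k and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. The source does not state which method the notebook uses, so the sketch below is an assumption. It uses a small hand-written k-means and made-up 2-D points instead of the iris measurements; the names `sq_dist` and `kmeans_wcss` are hypothetical helpers.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, rng, iterations=50):
    """Run Lloyd's k-means once and return the within-cluster sum of squares."""
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data: three synthetic 2-D blobs, NOT the iris dataset.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]

# Keep the best of 5 random restarts for each k to reduce the effect of bad starts.
wcss = {k: min(kmeans_wcss(points, k, rng) for _ in range(5)) for k in range(1, 7)}

for k, value in wcss.items():
    print(f"k={k}: WCSS={value:.2f}")
```

Plotting `wcss` against k gives the elbow curve; the k at which the curve flattens is taken as the number of clusters.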

Task 3: Exploratory Data Analysis - Retail

  1. Perform 'Exploratory Data Analysis' on dataset 'SampleSuperstore'.
  2. As a business manager, try to find out the weak areas where you can work to make more profit.
  3. What business problems can you derive by exploring the data?
  4. Data can be found at https://bit.ly/3i4rbWl
  5. You can use (Python/R/Tableau/PowerBI/Excel/SAS/SAP)

Results:

Problem Statement: Find the weak areas where you can work to increase profit, and identify the business problems that can be derived by exploring the data.

  • In Ship Mode, Standard Class recorded the highest profit and Same Day recorded the lowest profit.
  • Products are sold to 3 segments: Consumer, Corporate & Home Office. The Consumer segment recorded the maximum profit, followed by Corporate, whereas Home Office recorded the minimum profit.
  • Products are sold in the United States, where the West region recorded the maximum profit, followed by East, with the lowest profit recorded in the Central region.
  • The top 5 most-sold Sub-Categories are Phones, Chairs, Storage, Tables & Binders.
  • The 5 least-sold Sub-Categories are Fasteners, Labels, Envelopes, Art & Supplies.
  • When the discount given on a product exceeds 20%, the company makes a loss instead of a profit.
  • Maximum profit comes from Copiers, Phones, Accessories, Paper and Binders, whereas Tables recorded the maximum loss, followed by Bookcases & Supplies. Hence the discount given on these loss-making products can be reduced to increase profit.
  • Maximum sales are from California and New York; minimum sales are from North Dakota and West Virginia.
  • California and New York recorded the maximum profit, whereas products sold in Texas, Ohio and Pennsylvania incurred a loss. So the discount given in these states can be reduced to increase profit.
  • As maximum sales are in California and New York, sales can be increased further in these states to gain profit. The company also benefits in the Technology category, so increasing sales in this category can increase profit.
  • YouTube link: https://youtu.be/KGFVlLMektQ

    ![image](https://user-images.githubusercontent.com/87359806/149663331-94d33b4e-5bb7-4cc1-bbaf-289ef9403935.png)
  • LinkedIn Post : https://www.linkedin.com/posts/harshakumavat2000_task3-gripjan22-gripjanuary22-activity-6888484236651839488-QwYq?utm_source=share&utm_medium=member_desktop
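
Findings like the profit by Sub-Category and the effect of discounts above 20% come from grouping the orders and summing profit per group. The sketch below shows that kind of aggregation in plain Python. The rows in `orders` are invented for illustration and are not taken from SampleSuperstore, and the code is not the notebook's own.

```python
from collections import defaultdict

# Hypothetical rows: (sub_category, discount, profit).
# These values are invented and NOT taken from the SampleSuperstore data.
orders = [
    ("Phones", 0.0, 120.0),
    ("Phones", 0.2, 35.0),
    ("Tables", 0.3, -80.0),
    ("Tables", 0.4, -150.0),
    ("Binders", 0.0, 25.0),
    ("Binders", 0.7, -40.0),
    ("Copiers", 0.1, 400.0),
]

# Total profit per sub-category.
profit_by_sub_category = defaultdict(float)
for sub_category, _discount, profit in orders:
    profit_by_sub_category[sub_category] += profit

# Rank sub-categories from most to least profitable.
ranked = sorted(profit_by_sub_category.items(), key=lambda item: item[1], reverse=True)

# Total profit for orders at or below the 20% discount mark versus above it.
DISCOUNT_THRESHOLD = 0.20
profit_by_band = {"<= 20%": 0.0, "> 20%": 0.0}
for _sub_category, discount, profit in orders:
    band = "> 20%" if discount > DISCOUNT_THRESHOLD else "<= 20%"
    profit_by_band[band] += profit

for sub_category, total in ranked:
    print(f"{sub_category}: {total:.2f}")
for band, total in profit_by_band.items():
    print(f"discount {band}: {total:.2f}")
```

The same grouping can be repeated with Ship Mode, Segment, Region or State as the key to reproduce the other comparisons listed in the results.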

Languages

Language:Jupyter Notebook 100.0%