Dutta-SD / SPARKS

This repository stores code related to an internship at The SPARKS Foundation

The SPARKS Foundation DSBA Repository (September, 2020 - October, 2020)

This repo stores the code and documents of the user Sandip Dutta related to his internship at The SPARKS Foundation. During the internship, the following tasks were performed:

Data Science Mentor (October - December, 2020)

  • Answered fellow interns' questions about their tasks
  • Received Letter of Recommendation for exceptional work

LOR_SPARKS_SANDIP_DUTTA

Data Science Intern (September 2020)

  • Business Analytics (TASK - 4) - In this final task, we analysed a business dataset.

    • Data was analysed using pandas.
    • EDA was performed using Seaborn and Matplotlib.
    • Applied hypothesis tests such as chi2_contingency and kendalltau from scipy.stats.
    • Fitted a ridge regression model (Ridge from sklearn).
    • The final score came to about 0.995.
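As a rough illustration of the steps above, here is a minimal sketch on synthetic data. The column names (`region`, `segment`, `sales`, `discount`, `profit`), the generated values, and `alpha=1.0` are all assumptions made up for this example; they are not taken from the internship dataset or the notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, kendalltau
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the business data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "region": rng.choice(["East", "West", "South"], size=n),
    "segment": rng.choice(["Consumer", "Corporate"], size=n),
    "sales": rng.uniform(10, 1000, size=n),
    "discount": rng.uniform(0, 0.5, size=n),
})
df["profit"] = 0.3 * df["sales"] - 200 * df["discount"] + rng.normal(0, 5, size=n)

# Chi-square test of independence between two categorical columns.
table = pd.crosstab(df["region"], df["segment"])
chi2, p_chi2, dof, expected = chi2_contingency(table)

# Kendall's tau rank correlation between two numeric columns.
tau, p_tau = kendalltau(df["sales"], df["profit"])

# Ridge regression, scored with R^2 on held-out data.
X = df[["sales", "discount"]]
y = df["profit"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = Ridge(alpha=1.0).fit(X_train, y_train)
r2 = model.score(X_test, y_test)

print(f"chi2 p-value: {p_chi2:.3f}, Kendall tau: {tau:.3f}, R^2: {r2:.3f}")
```

Note that `Ridge.score` returns the coefficient of determination (R^2), which is a regression score and not a classification accuracy.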
  • Decision Tree (TASK - 3) - In this task, we explored the decision tree algorithm using sklearn on the Iris dataset.

    • Split the data into training and validation sets.
    • Fitted a DecisionTreeClassifier on the dataset.
    • Visualised the tree using matplotlib.
    • Then we plotted decision surfaces for two features and checked the accuracy of the model.
    • The decision tree gave a good F1 score (close to 1.00).
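The workflow above could look roughly like the sketch below. The split ratio, `random_state`, and macro averaging for the F1 score are assumptions for illustration. The sketch prints the tree as text with `export_text` to stay dependency-light; this is a swapped-in alternative to the matplotlib plot mentioned above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Hold out part of the data for validation (30% is an assumed ratio).
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

# Fit the tree on the training part only.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Macro-averaged F1 over the three species on the validation part.
val_f1 = f1_score(y_val, clf.predict(X_val), average="macro")

# Text rendering of the learned splits.
tree_rules = export_text(clf, feature_names=list(iris.feature_names))

print(f"validation F1 (macro): {val_f1:.3f}")
print(tree_rules)
```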
  • Iris_Unsupervised (TASK - 2) - This folder contains the Iris data analysis using the KMeans and DBSCAN algorithms.

    • First, plots were generated and the features visualised.
    • DBSCAN was then applied, and it found 3 clusters.
    • We then switched to K-Means (after scaling the data).
    • We determined the ideal number of clusters using the elbow method, and it also came out to be 3.
    • Lastly, we plotted a confusion matrix to compare the cluster assignments with the actual classes.
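A minimal sketch of this clustering pipeline follows. The DBSCAN parameters (`eps=0.6`, `min_samples=5`), the range of `k` values, and applying DBSCAN to the scaled data are assumptions chosen for illustration; the number of clusters DBSCAN reports depends heavily on those parameters, so the sketch makes no claim about reproducing the result above.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Scale each feature to zero mean and unit variance.
X = StandardScaler().fit_transform(iris.data)

# DBSCAN discovers clusters by density; the label -1 marks noise points.
db_labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)
n_db_clusters = len(set(db_labels) - {-1})

# Elbow method: record the K-Means inertia for k = 1..8 and look for the bend.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 9)
]

# Final K-Means model with 3 clusters.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows are the true species, columns are cluster ids (their order is arbitrary).
cm = confusion_matrix(iris.target, km_labels)

print("DBSCAN clusters found:", n_db_clusters)
print("inertias:", [round(v, 1) for v in inertias])
print(cm)
```

Because cluster ids are arbitrary, the confusion matrix is read by matching each column to the species that dominates it, not by looking at the diagonal.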
  • Student_data (TASK - 1) - This folder contains data for some students.

    • The task is to predict whether the score increases as the number of hours of study increases.
    • We performed EDA and fitted a linear regression model to this data.
    • The model scored about 0.95 (95%) on the r2_score metric.
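The regression step can be sketched as below. The data here is synthetic: the hours, the scores, and the linear relationship that generates them are made up for this example and are not the student data in this folder.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic hours/scores data (hypothetical relationship plus noise).
rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=50)
scores = 10 * hours + 5 + rng.normal(0, 5, size=50)

# sklearn expects a 2D feature matrix, so reshape the single feature.
X = hours.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.2, random_state=0
)

reg = LinearRegression().fit(X_train, y_train)

# A positive slope means the predicted score rises with hours studied.
slope = reg.coef_[0]

# R^2 on the held-out data.
r2 = r2_score(y_test, reg.predict(X_test))

print(f"slope: {slope:.2f}, R^2: {r2:.3f}")
```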

    Internship Completion Certificate at SPARKS Foundation

Languages

Language: Jupyter Notebook 100.0%