s-hub404 / Journey-to-Data-Science

Journey-to-Data-Science

Day 1 -- 3rd JAN 2020 : I am starting my journey with GitHub and with learning and mastering Data Science. This repo is mainly a way to remind myself to keep improving continuously and to track my learning progress. I hope this small repo can help budding Data Science enthusiasts in the future, and that I can be of some small help to them.

Thanks to Krish Naik's Complete Machine Learning playlist on YouTube: https://www.youtube.com/playlist?list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe. It has been a few months since I started learning SQL, Excel and Python on my own.

Starting with Python for Data Science, today I finished all the Python basics. It was mostly a revision.

Day 2 -- 4th JAN 2020 : Continuing the commitment, I moved on with Python. Following Krish Naik's video series mentioned above, I got an overview of Python's data structures: lists, dictionaries, sets and tuples. The aim is to master the basics and then get started with real work. Prior programming experience made the concepts relatively easy to pick up. Python is easy to learn, and its huge collection of libraries makes it very effective for learning Data Science.
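A minimal sketch of the four built-in data structures side by side. The variable names and values (including the name "Asha") are made up for illustration; the point is how each structure behaves: lists are ordered and mutable, tuples are ordered and immutable, sets keep only unique elements, and dictionaries map keys to values.

```python
# List: ordered and mutable; duplicates are allowed
scores = [88, 92, 75, 92]
scores.append(60)              # add an element to the end

# Tuple: ordered but immutable; handy for fixed records
point = (3, 4)
x, y = point                   # tuple unpacking

# Set: unordered collection of unique elements
unique_scores = set(scores)    # the duplicate 92 is kept only once

# Dictionary: key-value mapping
student = {"name": "Asha", "score": 92}
student["passed"] = student["score"] >= 40   # add a new key

print(scores)                  # [88, 92, 75, 92, 60]
print(x + y)                   # 7
print(sorted(unique_scores))   # [60, 75, 88, 92]
print(student)                 # {'name': 'Asha', 'score': 92, 'passed': True}
```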

Day 3 -- 5th JAN 2020 : From Python's huge collection of libraries, I learned about NumPy and Pandas. NumPy is used for numerical and scientific computation: it provides an array type and supports a wide range of mathematical operations on whole arrays at once. Pandas is Python's data analysis library: it reads data from CSV, Excel, HTML and JSON into Series and DataFrame data structures, lets us clean and manipulate the data and analyze it, and can write the results back to a file.
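A small sketch of the workflow described above, assuming a tiny made-up CSV held in memory (via `io.StringIO`) in place of a real file. It shows NumPy's element-wise array arithmetic, then Pandas reading CSV into a DataFrame, cleaning a missing value, adding a derived column, and writing CSV back out.

```python
import io

import numpy as np
import pandas as pd

# NumPy: element-wise (vectorized) arithmetic on arrays, no explicit loops
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
total = a + b          # array([11., 22., 33., 44.])
dot = a @ b            # 1*10 + 2*20 + 3*30 + 4*40 = 300.0

# Pandas: read CSV text into a DataFrame (made-up data standing in for a file)
csv_text = """city,temp_c
Delhi,31
Mumbai,
Pune,27
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each column is a Series; clean it by filling the gap with the column mean
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Analyse: add a derived column
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Write back out as CSV text (pass a file path instead to write a real file)
out = df.to_csv(index=False)
print(out)
```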

I had already completed the NumPy and Pandas modules on Dphi, so I have a basic understanding of working with these libraries.

Day 4 -- 6th JAN 2020 : Continuing, I learned about Python's data visualization libraries Matplotlib and Seaborn. Matplotlib lets us turn our data into graphs such as scatter plots, bar graphs, histograms, box plots and pie charts, each with its own purpose. A bar graph compares values across categories, while a histogram shows how numerical data are distributed over a range by grouping them into bins.
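A minimal sketch contrasting the two charts just described, using made-up category counts and randomly generated values. It assumes the non-interactive `Agg` backend so it runs without a display, and saves the figure to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")                 # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

# Bar graph: compare a value across categories (made-up counts)
categories = ["A", "B", "C"]
counts = [12, 7, 15]

# Histogram: distribution of a numerical variable, grouped into bins
rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=500)

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

bars = ax_bar.bar(categories, counts)
ax_bar.set_title("Bar graph: categorical comparison")
ax_bar.set_ylabel("Count")

# hist returns the count per bin, the bin edges, and the drawn patches
bin_counts, bin_edges, _ = ax_hist.hist(values, bins=10)
ax_hist.set_title("Histogram: distribution in 10 bins")
ax_hist.set_xlabel("Value")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")        # use a file path to save to disk instead
plt.close(fig)
```

Note that the bar heights are supplied directly, whereas the histogram computes its bar heights by counting how many values fall into each bin.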

Seaborn is built on top of Matplotlib and provides a higher-level interface for the statistical analysis and visualization of data. Knowledge of statistics is essential to use Seaborn well.

This repository is my motivation not to give up and to continue the journey to Data Science. Next, I will be learning EDA (Exploratory Data Analysis) and Statistics, since Maths and Statistics are crucial for Data Science. This is just the start.

Day 5 -- 7th JAN 2020 : The first priority is to extract usable data from the raw data set. Preprocessing the data, that is, finding the missing values, analyzing them and filling them in appropriately, is the basic first step of EDA. It is the most important and most time-consuming task before modelling the data for Machine Learning or deriving insights from it. EDA is all about practice: the more we expose ourselves to it, the better we become. Before going further, the next step is to work hands-on with a real data set.
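A minimal sketch of the missing-value step, on a small made-up DataFrame. Filling numerical columns with the median and categorical columns with the mode is one common choice, assumed here for illustration; the right strategy always depends on the data.

```python
import numpy as np
import pandas as pd

# Small made-up data set with gaps, standing in for raw data
df = pd.DataFrame({
    "age":    [25, np.nan, 40, 35, np.nan],
    "salary": [50000, 62000, np.nan, 58000, 61000],
    "dept":   ["HR", "IT", None, "IT", "IT"],
})

# 1. Find the missing values: count and percentage per column
missing = df.isnull().sum()
missing_pct = df.isnull().mean() * 100
print(pd.DataFrame({"missing": missing, "percent": missing_pct}))

# 2. Fill numerical columns with the median (less sensitive to outliers than the mean)
for col in ["age", "salary"]:
    df[col] = df[col].fillna(df[col].median())

# 3. Fill the categorical column with its most frequent value (the mode)
df["dept"] = df["dept"].fillna(df["dept"].mode()[0])

print(df)
```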

Day 6 and 7 -- 9th JAN 2020 : Started the Statistics playlist and learned the basics of Statistics: population vs. sample, the Gaussian and log-normal distributions, and the measures of central tendency (mean, median and mode), their significance and when to use each.
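A short sketch of these basics using only Python's standard `statistics` module; the numbers are made up for illustration. It shows why the median can be preferred over the mean for skewed data, how sample variance differs from population variance (dividing by n - 1 instead of n), and the defining link between the two distributions: the log of a log-normal variable is Gaussian.

```python
import math
import random
import statistics

# Measures of central tendency on a small, skewed sample
data = [2, 3, 3, 5, 100]
mean = statistics.mean(data)      # 22.6, pulled up by the outlier 100
median = statistics.median(data)  # 3, robust to the outlier
mode = statistics.mode(data)      # 3, the most frequent value

# Population vs. sample variance: the sample version divides by n - 1
values = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = statistics.pvariance(values)   # 32 / 8 = 4
samp_var = statistics.variance(values)   # 32 / 7, about 4.571

# Log-normal: if X is log-normal, log(X) follows a Gaussian distribution
random.seed(0)
lognormal = [random.lognormvariate(0, 1) for _ in range(10_000)]
logs = [math.log(v) for v in lognormal]
print(statistics.mean(logs), statistics.stdev(logs))  # close to 0 and 1
```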

Since it was the weekend, I had time to work with data. I tried an EDA of the Titanic data set from Kaggle: reading the data, getting basic details about it, identifying the missing values and filling them in. This is the pre-processing of our raw data.
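A hedged sketch of these Titanic pre-processing steps. The rows below are made up; only a subset of the column names follows Kaggle's Titanic `train.csv`, and with the real file you would call `pd.read_csv("train.csv")` (path assumed). Filling `Age` with the median of each passenger class is one common approach, not necessarily the one used in this repo's notebook.

```python
import io

import pandas as pd

# Made-up rows using a subset of the Titanic column names
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.28,C85,C
3,1,3,female,,7.93,,S
4,1,1,female,35,53.10,C123,S
5,0,3,male,,8.05,,
6,0,1,male,54,51.86,E46,S
"""
df = pd.read_csv(io.StringIO(csv_text))

# Basic details: shape, column types and non-null counts, summary statistics
print(df.shape)
df.info()
print(df.describe())

# Missing values per column
print(df.isnull().sum())

# Cabin is mostly empty in the real data set, so drop it rather than guess
df = df.drop(columns=["Cabin"])

# Fill Age with the median age of the passenger's class (Pclass)
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("median"))

# Fill the missing Embarked value with the most common port
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
```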

Day 8 -- 10th JAN 2020 : Studied statistical concepts such as the Central Limit Theorem, Chebyshev's inequality, covariance and the correlation coefficients built on it (Pearson's correlation coefficient and Spearman's rank correlation coefficient), and their properties.
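A NumPy sketch of these concepts using simulated data, assuming an exponential distribution (mean 1, standard deviation 1) as an example of a skewed population. It checks the Central Limit Theorem's sigma / sqrt(n) spread of sample means, Chebyshev's bound P(|X - mu| >= k*sigma) <= 1/k**2, and the difference between Pearson's r (linear) and Spearman's rank correlation (monotonic). The `ranks` helper is a simple illustration that assumes no tied values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Central Limit Theorem: means of many samples from a skewed population
# are approximately normal with standard deviation sigma / sqrt(n)
n = 50
sample_means = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
print(sample_means.mean(), sample_means.std(), 1 / np.sqrt(n))

# Chebyshev's inequality holds for any distribution:
# P(|X - mu| >= k * sigma) <= 1 / k**2
x_skewed = rng.exponential(scale=1.0, size=10_000)
mu, sigma = x_skewed.mean(), x_skewed.std()
k = 2
frac_outside = np.mean(np.abs(x_skewed - mu) >= k * sigma)
print(frac_outside, "<=", 1 / k**2)

# Covariance shows the direction of a linear relationship but depends on units;
# Pearson's r divides by both standard deviations, so it lies in [-1, 1]
x = np.arange(1, 11, dtype=float)
y = x ** 3                                   # monotonic but non-linear
cov_xy = np.cov(x, y)[0, 1]                  # sample covariance (ddof=1)
pearson = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Spearman's rank correlation is Pearson's r computed on the ranks,
# so it equals 1 for any strictly increasing relationship
def ranks(a):
    """0-based ranks via double argsort (assumes no ties)."""
    return np.argsort(np.argsort(a)).astype(float)

spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]
print(pearson, spearman)                     # Pearson below 1, Spearman 1
```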

22 March 2021 : I have given myself time to study and prepare the concepts, and have worked hard to learn Python, Statistics and Tableau. For the next month, I will commit myself to committing my changes to Git and to keep working on my projects. Yes, there is still a lot to learn, but it is equally important to put this knowledge into action.
