Introduction to Basic Operations in Data Science
In this task, you will be given a dataset (Titanic Survivors), and your job is to analyse it using the relevant functions in Pandas, NumPy and Matplotlib.
Tasks to Perform on this Dataset
A few things I want you to do:
- Download the dataset and load it using Pandas.
- Remove all NULL (missing) values from the dataset.
- Remove the 'Name' and 'PassengerId' columns.
- Split the dataset in an 80:20 ratio using only .loc and .iloc.
- Plot histograms for the 'Fare' and 'Age' columns.
- Plot a bar chart for each binary column (such as 'Survived').
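The loading and cleaning steps in the list above (load, remove NULLs, drop 'Name' and 'PassengerId') could be sketched as follows. This is a minimal sketch, not a reference solution: the function names `clean` and `load_titanic` are hypothetical, and the file name `train.csv` assumes the training file from the Kaggle download. Note that `dropna()` removes every row with at least one missing value; in the Kaggle training file the 'Cabin' column is mostly empty, so this keeps only a small fraction of the rows.

```python
import pandas as pd

# Columns the task asks us to remove (the Kaggle CSV spells the ID column 'PassengerId').
COLUMNS_TO_DROP = ["Name", "PassengerId"]


def clean(df):
    """Drop rows containing missing values, then drop the Name/PassengerId columns."""
    # dropna() with its default how="any" removes every row that has at least one NaN.
    df = df.dropna()
    # drop() raises a KeyError if a column is missing, which flags a naming mismatch early.
    df = df.drop(columns=COLUMNS_TO_DROP)
    # Renumber the remaining rows 0..n-1 so later positional and label indexing agree.
    return df.reset_index(drop=True)


def load_titanic(path="train.csv"):
    """Hypothetical helper: read the CSV (assumed file name) and clean it."""
    return clean(pd.read_csv(path))
```

Usage, assuming the CSV sits in the working directory: `df = load_titanic("train.csv")`.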
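For the 80:20 split, one possible approach is sketched below under the "only .loc and .iloc" constraint. `.iloc` slices by row position, while `.loc` selects by index label; the `.loc` variant assumes unique index labels, which holds for the default index or after `reset_index(drop=True)`. The function names and the optional `seed` shuffle (a NumPy permutation of row positions, still applied through `.iloc`) are illustrative additions, not part of the task.

```python
import numpy as np
import pandas as pd


def split_80_20(df, train_frac=0.8, seed=None):
    """Split df into two parts (80:20 by default) using .iloc for row selection."""
    if seed is not None:
        # Optional shuffle: reorder rows by a random permutation of their positions.
        order = np.random.default_rng(seed).permutation(len(df))
        df = df.iloc[order]
    # Number of rows in the first part; int() rounds down.
    cut = int(len(df) * train_frac)
    # .iloc slices by position, so this works regardless of the index labels.
    return df.iloc[:cut], df.iloc[cut:]


def split_80_20_loc(df, train_frac=0.8):
    """The same split with .loc, which selects rows by index label.

    Assumes the index labels are unique (true for the default RangeIndex).
    """
    cut = int(len(df) * train_frac)
    # df.index[:cut] holds the labels of the first `cut` rows; .loc looks those labels up.
    return df.loc[df.index[:cut]], df.loc[df.index[cut:]]
```

Without a seed, the split simply takes the first 80% of rows in file order; passing a seed makes the shuffled split reproducible.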
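The two plotting steps could look roughly like this with Matplotlib. It is a sketch with hypothetical function names, and it assumes that a "binary column" means any column with exactly two distinct non-null values (in the Titanic data that would include 'Survived' and 'Sex').

```python
import matplotlib.pyplot as plt
import pandas as pd


def binary_columns(df):
    """Columns with exactly two distinct non-null values, e.g. 'Survived' (0/1)."""
    return [c for c in df.columns if df[c].nunique(dropna=True) == 2]


def plot_histograms(df, columns=("Fare", "Age"), bins=20):
    """One histogram per requested numeric column, side by side."""
    # squeeze=False always returns a 2-D array of Axes, even for a single column.
    fig, axes = plt.subplots(1, len(columns), figsize=(5 * len(columns), 4), squeeze=False)
    for ax, col in zip(axes[0], columns):
        ax.hist(df[col].dropna(), bins=bins, edgecolor="black")
        ax.set_title(f"Histogram of {col}")
        ax.set_xlabel(col)
        ax.set_ylabel("Count")
    fig.tight_layout()
    return fig


def plot_binary_bars(df):
    """One bar chart of value counts for each binary column."""
    cols = binary_columns(df)
    if not cols:
        raise ValueError("no binary columns found")
    fig, axes = plt.subplots(1, len(cols), figsize=(4 * len(cols), 4), squeeze=False)
    for ax, col in zip(axes[0], cols):
        counts = df[col].value_counts().sort_index()
        # Convert labels to strings so 0/1 and text values are drawn as categories.
        ax.bar(list(counts.index.astype(str)), counts.values)
        ax.set_title(f"{col} counts")
        ax.set_ylabel("Count")
    fig.tight_layout()
    return fig
```

With a cleaned DataFrame `df`, calling `plot_histograms(df)` and `plot_binary_bars(df)` and then `plt.show()` displays both figures.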
Explore more commands and features on your own; hopefully this will give you a good start!
References
- Dataset - https://www.kaggle.com/c/titanic/
- Pandas, NumPy cheat sheet - http://www.cheat-sheets.org/saved-copy/NumPy_SciPy_Pandas_Quandl_Cheat_Sheet.pdf
- Pandas Tutorial - https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python
- Data Analysis with Python - https://www.youtube.com/watch?v=r-uOLxNrNk8