tristar82 / lab-advanced-data-cleaning

Lab | Data Cleaning

Introduction

A commonly repeated claim is that 80% of a data scientist's work is data cleaning. We have no idea whether this number is accurate, but data scientists do spend a lot of time and effort collecting, cleaning, and preparing data for analysis, because datasets are usually messy and complex. Being able to refine and restructure a dataset into a usable state is therefore an essential skill before moving on to the analysis stage.

Challenge

Try to tidy the weather data included in Ironhack's database (db: weather, table: temperatures). This dataset is a subset of the Global Historical Climatology Network dataset. It contains the daily weather records for one weather station (MX17004) in Mexico for five months in 2010. The goal of this additional challenge is to produce the tidiest dataset you can.

Hint: variables are stored in both rows and columns.

To accomplish this challenge, you will need to do some research on tidy data and on the melt and pivot operations. Feel free to reference any resources you consider appropriate.
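
As a starting point for that research, here is a minimal sketch of melt followed by pivot in pandas. It does not connect to the Ironhack database. The toy table, its column names (id, year, month, element, d1, d2), its values, and the helper name tidy_weather are all assumptions made up for illustration, so check them against the real temperatures table before reusing anything.

```python
import pandas as pd

# Hypothetical toy data in an ASSUMED wide layout: one row per
# (id, year, month, element) and one column per day of the month.
# The values are invented for illustration only.
wide = pd.DataFrame({
    "id": ["MX17004"] * 4,
    "year": [2010] * 4,
    "month": [1, 1, 2, 2],
    "element": ["tmax", "tmin", "tmax", "tmin"],
    "d1": [None, None, 27.3, 14.4],
    "d2": [27.8, 14.5, None, None],
})


def tidy_weather(df):
    """Reshape the assumed wide layout into one row per station and date."""
    # Day columns are assumed to be named "d" followed by the day number.
    day_cols = [c for c in df.columns if c.startswith("d") and c[1:].isdigit()]

    # Step 1 (melt): move the day columns into rows, then drop the
    # day/element combinations that have no measurement.
    long = (
        df.melt(
            id_vars=["id", "year", "month", "element"],
            value_vars=day_cols,
            var_name="day",
            value_name="value",
        )
        .dropna(subset=["value"])
        .assign(day=lambda d: d["day"].str[1:].astype(int))
    )

    # Step 2 (pivot): spread the element names (tmax, tmin) into columns,
    # so that each variable gets its own column.
    tidy = long.pivot_table(
        index=["id", "year", "month", "day"],
        columns="element",
        values="value",
        aggfunc="first",
    ).reset_index()
    tidy.columns.name = None  # drop the leftover "element" axis label
    return tidy


tidy = tidy_weather(wide)
print(tidy)
```

The melt step handles the variables stored in column names (the day), and the pivot step handles the variables stored in rows (the element), which is the situation the hint describes.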
