This project loads and cleans life expectancy data from a provided TSV file. The script processes the data, converts it into a more usable format, and saves the cleaned data as a CSV file. The project is structured as follows:
life_expectancy
├── data
├── tests
├── cleaning.py
├── pyproject.toml
└── README.md
The cleaning.py script focuses on the data for a specific region.
Life expectancy data often comes in messy formats, so it needs to be preprocessed and cleaned before any analysis. This script loads, cleans, and saves life expectancy data for a specific region. It handles missing values, converts data types, and reshapes the data for easier analysis.
To use the script, follow these steps:
- Make sure you have the necessary requirements installed (see Requirements).
- Download the raw life expectancy data file (eu_life_expectancy_raw.tsv) and place it in the data directory.
- Open a terminal or command prompt.
- Navigate to the project directory.
- Run the script using the following command, replacing REGION_NAME with the desired region's name (e.g., "PT" for Portugal):
python cleaning.py --region REGION_NAME
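As a rough illustration of how the --region option shown above could be read inside cleaning.py, here is a minimal sketch using argparse from the standard library. The function name parse_args and the default value "PT" are assumptions made for this example; only the --region flag itself comes from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical helper, not confirmed by the project)."""
    parser = argparse.ArgumentParser(
        description="Clean life expectancy data for one region."
    )
    parser.add_argument(
        "--region",
        default="PT",  # assumed default; the README only shows the flag being passed
        help='Region code to keep, e.g. "PT" for Portugal.',
    )
    # argv=None makes argparse read sys.argv[1:], as on a normal command line
    return parser.parse_args(argv)


# Passing a list instead of reading sys.argv keeps the example easy to try out
args = parse_args(["--region", "PT"])
print(args.region)
```

Accepting an explicit argument list makes the parser easy to exercise from tests without touching the real command line.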
The script performs the following tasks:
- Loading Data: The script loads the raw life expectancy data from the provided TSV file (eu_life_expectancy_raw.tsv).
- Data Cleaning: It preprocesses the data by splitting a combined column and cleaning column names.
- NaN Handling and Conversion: The script identifies NaN-like values in specified columns, converts the values in those columns to floats, and removes any rows with missing data.
- Data Reshaping: The script melts the DataFrame to turn all year columns into a single year column for easier analysis.
- Region Selection: It filters the data to retain only the rows corresponding to the specified region.
- Saving Data: The cleaned and reshaped data is saved as pt_life_expectancy.csv in the data directory.
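The tasks above could be sketched with pandas roughly as follows. This is not the project's actual code: the function name clean_data, the layout of the combined first column (unit,sex,age,geo\time), the output column names, and the ":" and "e" markers in the sample are all assumptions chosen for illustration. The steps follow the order listed above.

```python
import io

import pandas as pd

# Tiny stand-in for the raw TSV file; the real column layout is an assumption.
RAW_SAMPLE = (
    "unit,sex,age,geo\\time\t2020 \t2021 \n"
    "YR,F,Y65,PT\t22.1 \t21.9 e\n"
    "YR,F,Y65,ES\t: \t23.0 \n"
)

ID_NAMES = ["unit", "sex", "age", "region"]  # assumed names for the split fields


def clean_data(raw: pd.DataFrame, region: str = "PT") -> pd.DataFrame:
    """Split, convert, reshape, and filter a raw table (hypothetical helper)."""
    combined = raw.columns[0]  # the single column that packs several fields

    # Data cleaning: split the combined column and tidy the year column names
    ids = raw[combined].str.split(",", expand=True)
    ids.columns = ID_NAMES
    years = raw.drop(columns=combined)
    years.columns = years.columns.str.strip()

    # NaN handling and conversion: keep the leading number in each cell,
    # so flags are dropped and markers with no number become NaN
    years = years.apply(
        lambda col: pd.to_numeric(
            col.str.extract(r"(\d+(?:\.\d+)?)", expand=False), errors="coerce"
        )
    )
    wide = pd.concat([ids, years], axis=1).dropna()

    # Data reshaping: one row per (identifiers, year) pair
    tidy = wide.melt(id_vars=ID_NAMES, var_name="year", value_name="value")
    tidy["year"] = tidy["year"].astype(int)

    # Region selection
    return tidy[tidy["region"] == region].reset_index(drop=True)


# dtype=str keeps every cell as text so the string operations above apply
raw = pd.read_csv(io.StringIO(RAW_SAMPLE), sep="\t", dtype=str)
cleaned = clean_data(raw, region="PT")
print(cleaned)
# Saving would then be a call such as:
# cleaned.to_csv("data/pt_life_expectancy.csv", index=False)
```

In this sketch, rows with any missing value are dropped before melting, which matches the order of the list above but discards the whole row for a region even if only one year is missing. Dropping after the melt instead would keep the remaining years.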
- Python 3.x
- pandas
- numpy
- Clone or download this repository.
git clone https://github.com/sofiapessoaamorim/life_expectancy.git