Nour-Sadek / Data-Analysis-for-Hospitals

This project provided practice using the pandas and pyplot libraries.

Home Page: https://hyperskill.org/projects/152?track=28

Data-Analysis-for-Hospitals

About

You know the story. Data is everywhere: texts, images, news, and spreadsheets. It affects our habits and defines our future. The amount of data is growing by the second. How can one stay afloat in this great sea of data? Data analysis is required in any line of business. In this project, you will conduct a comprehensive study with pandas. You will upload datasets, deal with data omissions and incorrect data filling, find the main statistical characteristics, and visualize your data.

Learning Outcomes of the Project:

Conduct a comprehensive data study using the pandas library: from uploading data and correcting errors in the CSV files to simple data visualization.

Learning Outcomes of Each Stage of the Project:

Stage 1: Load data from CSV files into the program.

Stage 2: Make a single dataset from several CSV files.

Stage 3: Improve the dataset, which may be inconsistent and contain errors.

Stage 4: Use pandas statistics tools to gain insights from the data.

Stage 5: Use pandas visualization tools to present the data succinctly.
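
The stages above map onto a short pandas workflow. What follows is a minimal sketch of stages 1 to 4, not the code in code.py: the CSV contents, the gender spellings, and the rule of filling a missing gender with 'f' are made-up assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-ins for general.csv, prenatal.csv, and sports.csv.
# The values are invented; only the file and column names come from this README.
GENERAL_CSV = """,hospital,gender,age,diagnosis
0,general,man,33,cold
1,general,woman,48,stomach
2,general,man,56,cold
"""
PRENATAL_CSV = """,hospital,gender,age,diagnosis
0,prenatal,f,27,pregnancy
1,prenatal,,30,pregnancy
"""
SPORTS_CSV = """,hospital,gender,age,diagnosis
0,sports,male,20,dislocation
1,sports,female,24,sprain
"""

# Stage 1: load each CSV into its own DataFrame.
frames = [pd.read_csv(io.StringIO(text))
          for text in (GENERAL_CSV, PRENATAL_CSV, SPORTS_CSV)]

# Stage 2: stack the frames into one dataset and drop the saved index column.
merged = pd.concat(frames, ignore_index=True)
merged = merged.drop(columns=["Unnamed: 0"])

# Stage 3: unify inconsistent labels and fill gaps (assumed cleaning rules).
merged["gender"] = merged["gender"].replace(
    {"man": "m", "male": "m", "woman": "f", "female": "f"}
)
merged["gender"] = merged["gender"].fillna("f")

# Stage 4: simple statistics on the cleaned data.
top_hospital = merged["hospital"].value_counts().idxmax()
median_age = merged.groupby("hospital")["age"].median()

print(merged)
print("Hospital with the most patients:", top_hospital)
print(median_age)
```

Stage 5 would then build on the same `merged` frame, for example through the pandas plotting methods backed by pyplot.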

General Info

To learn more about this project, please visit the HyperSkill website: Data Analysis for Hospitals (https://hyperskill.org/projects/152?track=28).

HyperSkill labels this project's difficulty as Hard. It describes its four difficulty levels as follows:

  • Easy Projects - if you're just starting
  • Medium Projects - to build upon the basics
  • Hard Projects - to practice all the basic concepts and learn new ones
  • Challenging Projects - to perfect your knowledge with challenging tasks

This Repository contains one .py file and one folder:

code.py - Contains the code used to complete the data analysis requirements

Data repository - Contains the three .csv files that contain the data: general.csv, prenatal.csv, and sports.csv
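
As an illustration of how the three files might be read in one step, here is a small sketch. The helper name `load_datasets` and its `data_dir` argument are hypothetical and do not come from code.py; only the three file names are taken from this README.

```python
from pathlib import Path

import pandas as pd

# File names as listed in this README.
FILE_NAMES = ("general.csv", "prenatal.csv", "sports.csv")


def load_datasets(data_dir):
    """Read the three CSV files from data_dir.

    Returns a dict that maps each file stem ('general', 'prenatal',
    'sports') to its DataFrame. Hypothetical helper for illustration.
    """
    data_dir = Path(data_dir)
    return {Path(name).stem: pd.read_csv(data_dir / name)
            for name in FILE_NAMES}
```

Calling `load_datasets("path/to/data/folder")` with the folder's actual location would return the three data frames keyed by name.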

The project was built using Python version 3.11.3.

Description of Data Sets

All three datasets contain the following 15 columns:

  • Unnamed: 0 - Contains the indexes of the tables
  • hospital
  • gender
  • age
  • height
  • weight
  • bmi
  • diagnosis - Includes values such as 'pregnancy', 'cold', 'dislocation', etc.
  • blood_test
  • ecg
  • ultrasound
  • mri
  • xray
  • children
  • months
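
The `Unnamed: 0` column appears because pandas assigns that name to a CSV column whose header cell is empty, which is what a saved index looks like. The sketch below uses a made-up three-column sample (not the real data) to show the default behaviour next to `index_col=0`, which reads that column as the index instead.

```python
import io

import pandas as pd

# Made-up sample whose first header cell is empty, like a saved index column.
SAMPLE = """,hospital,gender,age
0,general,m,33
1,general,f,48
"""

# Default: the unnamed first column becomes a regular column "Unnamed: 0".
default = pd.read_csv(io.StringIO(SAMPLE))

# With index_col=0 the same column is used as the row index.
indexed = pd.read_csv(io.StringIO(SAMPLE), index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```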

How to Run

Download the files to your local machine, open the project in your IDE of choice, and run code.py. The data frames and the answers to each stage's questions are printed to the console, and the required plots are displayed. The requirements for each stage are stated in that stage's docstring, so please read the docstrings first.

Languages

Language: Python 100.0%