ramkumarpj / pandas-challenge

Pandas Challenge

PyCity Schools Analysis

Two datasets are provided: school data and student data. The school data contains the 'School Name', 'School Type' and 'Budget' of every school in a school district. The student data contains the 'School Name', 'Grade', 'Reading Score' and 'Math Score' of each student.
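
As a minimal sketch of how two such datasets can be combined, the snippet below merges a tiny hand-made student table with a school table on the shared school-name key. The column names (`school_name`, `type`, `budget`, `grade`, `reading_score`, `math_score`) and the sample rows are assumptions for illustration only; the real CSV headers and the notebook's actual code may differ.

```python
import pandas as pd

# Hypothetical stand-ins for schools_complete.csv and students_complete.csv.
# Column names and values are assumed for illustration, not taken from the files.
schools = pd.DataFrame({
    "school_name": ["Alpha High", "Beta High"],
    "type": ["Charter", "District"],
    "budget": [500_000, 1_200_000],
})
students = pd.DataFrame({
    "school_name": ["Alpha High", "Alpha High", "Beta High"],
    "grade": ["9th", "10th", "9th"],
    "reading_score": [85, 72, 64],
    "math_score": [90, 68, 75],
})

# A left join keeps one row per student and attaches that student's school details.
merged = pd.merge(students, schools, how="left", on="school_name")
print(merged)
```

With real data, the two frames would typically be loaded with `pd.read_csv` from the files listed under Files.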

The following analyses were performed after joining the two datasets on the 'School Name' field:

  1. District Summary - The overall stats for the schools in the district are calculated.

  2. School Summary - The data is summarized here for each school.

  3. Highest Performing School Summary - The 5 schools with the highest 'Overall Passing' percentage are shown here.

  4. Bottom Performing School Summary - The 5 schools with the lowest 'Overall Passing' percentage are shown here.

  5. Math Scores By Grade - The average math score of 9th, 10th, 11th & 12th graders of each school is shown here.

  6. Reading Scores By Grade - The average reading score of 9th, 10th, 11th & 12th graders of each school is shown here.

  7. Scores by School Spending - The scores are broken down by 'Spending Ranges (Per Student)' here.

  8. Scores by School Size - A new column that categorizes the schools by total number of students is added here. The buckets used for this categorization are "Small (<1000)", "Medium (1000-2000)" and "Large (2000-5000)".

  9. Scores by School Type - The overall stats are broken down by the 2 school types, Charter and District.
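
The per-school summary, the top-performer ranking and the size buckets listed above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the column names, the sample rows and the passing threshold of 70 are all assumptions, and the bucket edges rely on the default behaviour of `pd.cut`, where each bin includes its right edge.

```python
import pandas as pd

PASSING_SCORE = 70  # assumed threshold; the README does not state it

# Hypothetical merged student/school rows (names and values are made up).
merged = pd.DataFrame({
    "school_name": ["Alpha High", "Alpha High", "Beta High", "Beta High"],
    "size": [900, 900, 2500, 2500],
    "reading_score": [85, 72, 64, 80],
    "math_score": [90, 78, 75, 71],
})

# A student counts toward "overall passing" only if both scores pass.
merged["passed_both"] = (
    (merged["math_score"] >= PASSING_SCORE)
    & (merged["reading_score"] >= PASSING_SCORE)
)

# One row per school: size, average scores and overall passing percentage.
per_school = merged.groupby("school_name").agg(
    size=("size", "first"),
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    overall_passing_pct=("passed_both", lambda s: s.mean() * 100),
)

# Bucket schools by size using the labels from the README.
# With pd.cut's default right=True, a school of exactly 1000 lands in "Small".
per_school["size_bucket"] = pd.cut(
    per_school["size"],
    bins=[0, 1000, 2000, 5000],
    labels=["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"],
)

# Top 5 schools by overall passing percentage (use ascending=True for the bottom 5).
top = per_school.sort_values("overall_passing_pct", ascending=False).head(5)
print(top)
```

The same `pd.cut` pattern would apply to per-student spending ranges, with the bin edges chosen to suit the budget data.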

Conclusions:

  1. The overall passing percentage was higher for schools that spend the least per student.

  2. The charter schools had a higher overall passing percentage than the district schools.

  3. Large schools with 2000-5000 students had the lowest overall passing percentage.

  4. Cabrera High School had the highest overall passing percentage (91.33%).

  5. Rodriguez High School had the lowest overall passing percentage (52.99%).

  6. The percentage of students who passed reading is higher than the percentage who passed math.

Files

  • Source Code: PyCitySchools/PyCitySchools_starter.ipynb
  • Datasets: PyCitySchools/Resources/students_complete.csv, PyCitySchools/Resources/schools_complete.csv

Run Instructions

  • Open a terminal
  • Confirm conda version
    conda --version
  • Confirm jupyter version
    jupyter --version
  • Activate conda environment
    conda activate dev
  • Launch Jupyter Notebook
    jupyter notebook
  • Jupyter Notebook is opened in a browser
  • Open "PyCitySchools/PyCitySchools_starter.ipynb" file using Jupyter Notebook
  • Click on 'Cell > Run All' to run

Disclaimer

This repo was published for educational purposes only. Copyright 2023 edX Boot Camps LLC. All rights reserved.
