SF_Employee_Comp_dash

Objectives:

Explore the compensation details and high-level trends in the historical San Francisco Controller's Office employee Total Compensation (TC) datasets over the past 4 years.
Provide visual insights to help job-seekers learn more about how the SF Controller's Office pays its employees.

Screenshot

Process and Output:

ETL Jupyter Notebook: sf_comp_jupyter_analytics.ipynb
ETL Python script: data_clean.py
Application script: sf_comp_api_data.py
Deploy to web (TBC)

Limitation and Next Steps:

Due to time and scope limitations, this project focuses on the job market and information from the Controller's Office since 2018, and the dashboard currently focuses on aggregate information. Besides the challenges in the data itself, the large size of the dataset led to long page load times when I deployed. As next steps, I plan to:

Introduce more datasets and merge them to draw more useful insights for job-seekers, for example:
- comparing with other cities' controller compensation over the same period,
- or comparing with the non-government job market.
Increase page load speed by adding a cache or dcc.Store to temporarily store and share data
Improve the visual styling
Deploy to Heroku
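
As a rough illustration of the caching idea, the sketch below memoizes an expensive load with the standard library's functools.lru_cache so that repeated requests reuse one in-memory copy. This is a server-side stand-in, not dcc.Store itself (which keeps data in the browser), and load_clean_data, load_calls, and the sample rows are hypothetical names and values, not taken from the project scripts.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) expensive load actually runs.
load_calls = 0


@lru_cache(maxsize=1)
def load_clean_data():
    """Stand-in for an expensive load; returns an immutable tuple of rows."""
    global load_calls
    load_calls += 1
    # Made-up rows: (job, year, total_compensation).
    return (
        ("Transit Operator", 2018, 100000.0),
        ("Custodian", 2019, 80000.0),
    )


first = load_clean_data()
second = load_clean_data()  # served from the cache, the body does not run again
```

Returning an immutable value matters here: every caller shares the same cached object, so a mutable result could be changed by one callback and silently affect the others.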

Data Resources:

The San Francisco Controller's Office maintains a database to record the salary and benefits paid to City employees since fiscal year 2013. SF Controller's Office Employee Salary Data origin_data_website
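
Since the dashboard filters at the API step to keep the download small, a query URL might be assembled roughly as follows. This sketch assumes the portal accepts SoQL-style `$where` and `$limit` parameters, that year_type uses the value 'Calendar', and that year is numeric; BASE_URL is a hypothetical placeholder, not the real endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder: the real domain and dataset ID are not given here.
BASE_URL = "https://example.org/resource/DATASET_ID.json"


def build_query_url(start_year, end_year, limit=200000):
    """Build a URL that asks the server to filter rows before download."""
    params = {
        # Assumed column values: 'Calendar' year type and a numeric year.
        "$where": (
            f"year_type = 'Calendar' AND year >= {start_year} "
            f"AND year <= {end_year}"
        ),
        "$limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(2018, 2021)
# Decode the query string again to inspect what would be sent.
query = parse_qs(urlsplit(url).query)
```

No request is sent here; the URL would then be passed to whichever HTTP client the application script uses.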

ANALYTICS Part

The analysis process and data insights are in the ETL Jupyter Notebook sf_comp_jupyter_analytics.ipynb

Scope and Assumption

This analysis considers only the 4 calendar years from 2018 to 2021, which reduces the dataset from about 759K rows to about 168K rows:

(758604, 22) ===> (168437, 22)
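
A minimal pandas sketch of this kind of scoping is shown below. It assumes the filter uses the year_type and year columns and that calendar rows are labelled 'Calendar'; the tiny frame is made up for illustration, and the actual notebook may filter differently.

```python
import pandas as pd

# Tiny made-up frame that only mimics the relevant columns.
raw = pd.DataFrame(
    {
        "year_type": ["Calendar", "Fiscal", "Calendar", "Calendar"],
        "year": [2017, 2019, 2018, 2021],
        "total_compensation": [1.0, 2.0, 3.0, 4.0],
    }
)

# Keep calendar-year rows from 2018 through 2021 (both ends inclusive).
mask = (raw["year_type"] == "Calendar") & raw["year"].between(2018, 2021)
scoped = raw.loc[mask].reset_index(drop=True)

print(raw.shape, "===>", scoped.shape)
```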

Column info:

Categorical Features: 'organization_group_code', 'job_family_code', 'job_code', 'year_type', 'year', 'organization_group', 'department_code', 'department', 'union_code', 'union', 'job_family', 'job', 'employee_identifier'

Numerical Features: 'salaries', 'overtime', 'other_salaries', 'total_salary', 'retirement', 'health_and_dental', 'other_benefits', 'total_benefits', 'total_compensation'

Additional Details

Number of unique values in selected categorical columns:

organization_group (6)
department (51)
job_family (56)
job (1111)
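
Counts like these can be produced with DataFrame.nunique. The sketch below uses a made-up three-row frame, so its numbers do not match the real dataset.

```python
import pandas as pd

# Made-up sample with the same column names as the dataset.
sample = pd.DataFrame(
    {
        "organization_group": ["A", "A", "B"],
        "department": ["D1", "D2", "D2"],
        "job_family": ["F1", "F1", "F1"],
        "job": ["J1", "J2", "J3"],
    }
)

columns = ["organization_group", "department", "job_family", "job"]
# One distinct-value count per column.
unique_counts = sample[columns].nunique()
print(unique_counts)
```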

Top 5 most common jobs:

Transit Operator 10805
Special Nurse 6364
Registered Nurse 5960
Custodian 3438
Firefighter 3077
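
A ranking like this is typically a row count per job title. The sketch below assumes that interpretation and uses made-up titles; Series.value_counts sorts by count in descending order, so head(n) gives the most common values.

```python
import pandas as pd

# Made-up job column; the real data has 1111 distinct titles.
jobs = pd.Series(
    ["Nurse", "Clerk", "Nurse", "Clerk", "Nurse", "Driver"], name="job"
)

# Count rows per title and keep the most frequent ones.
top_jobs = jobs.value_counts().head(2)
print(top_jobs)
```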

Dashboard Part Highlights

API: no username or password is needed; the query is filtered at the API step to reduce the dataset size.
Interactive aggregate graph: users can choose whether to display the top 5 or the bottom 5 by rank.
Next steps: store cleaned data to save load time
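
The top 5 / bottom 5 toggle could be backed by a small helper like the one below. It is a sketch only: rank_groups is a hypothetical name, the mean is an assumed aggregate (the dashboard may use a different one), and the sample values are made up.

```python
import pandas as pd


def rank_groups(df, group_col, value_col, n=5, top=True):
    """Return the n groups with the highest (or lowest) mean value."""
    means = df.groupby(group_col)[value_col].mean()
    return means.nlargest(n) if top else means.nsmallest(n)


# Made-up data: department means are A=15, B=30, C=5.
df = pd.DataFrame(
    {
        "department": ["A", "A", "B", "C"],
        "total_compensation": [10.0, 20.0, 30.0, 5.0],
    }
)

top_two = rank_groups(df, "department", "total_compensation", n=2, top=True)
bottom_two = rank_groups(df, "department", "total_compensation", n=2, top=False)
```

In a Dash callback, the value of a radio button or dropdown would map onto the top argument, and the returned Series would feed the bar chart.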

My other dashboard projects

'Dirty' data clean ETL
Deployed plotly Dashboard
geoJSON Javascript project
Deployed Map URL:
- Traverse and retrieve GeoJSON data to populate an interactive geographical map of earthquakes and tectonic plates using JavaScript, the Leaflet.js library, and the Mapbox API.
Another Plotly JavaScript project

susiexia / SF_Employee_Comp_Dash