-
Explore and discover the compensation details and high-level trends from historical datasets of San Francisco Controller Employees Total Compensation (TC) during past 4 years.
-
Provide visualized insights to help job-seekers learn more about how does SF Controller's Office pay to their employees.
web_demo.mov
-
ETL Jupyter Notebook: sf_comp_jupyter_analytics.ipynb
-
ETL python script:[data_clean.py]
-
Application script: sf_comp_api_data.py
-
deploy to Web(TBC)
Since time and scope limitation, this project focused on the job market and information from controller office since 2018, and the dashboard currently focus on aggregate information. Besides challegence from data, the large-scale data means huge page_load time when I deployed. Next steps I plan to:
- Introduce more datasets, cross and merge to draw more useful insights to job-seeker:
- for example, comparing to other cities controller compensation in the same period,
- or compare to non-gov job market)
- Increase page load speed, will add cache or dcc.store to temparay store and share data
- Add pretty style
- Deploy to heroku
The San Francisco Controller's Office maintains a database to record the salary and benefits paid to City employees since fiscal year 2013. SF Controller's Office Employee Salary Data origin_data_website
The analysis process and data insights are in the ETL Jupyter Notebook sf_comp_jupyter_analytics.ipynb
This Analytics only considers 4 calendar years from 2018 to 2021. Rows reduction from 759K to 168K,
- (758604,22) ===> (168437, 22)
Categorical Features: 'organization_group_code', 'job_family_code', 'job_code', 'year_type', 'year', 'organization_group', 'department_code', 'department', 'union_code', 'union', 'job_family', 'job', 'employee_identifier',
Numerical Features: 'salaries', 'overtime', 'other_salaries', 'total_salary', 'retirement', 'health_and_dental', 'other_benefits', 'total_benefits', 'total_compensation'
Unique sample size in each categorical columns:
- Organization_group(6)
- department(51)
- job_family(56)
- job(1111)
Top 5 popular Job:
- Transit Operator 10805
- Special Nurse 6364
- Registered Nurse 5960
- Custodian 3438
- Firefighter 3077
-
API: no username or password needed, query in API step to reduce the dataset size.
-
Interactive Aggregate graph Clients are able to choose the rank top 5 or bottom 5 to display.
-
Next steps: tore cleaned data to save load time
- 'Dirty' data clean ETL
- Deployed plotly Dashboard
- geoJSON Javascript project
- Deployed Map URL:
- Traverse and retrieve GeoJSON data to populate an interactive geographical map about earthquakes and tectonic plates using JavaScript,leaflet.js libraries as well as Mapbox API.
- another plotly Javascript project