Review options for source data files

ruckeralex opened this issue

Our team (you!) has a choice about which data files to use to build the bi-monthly reporting tool. It may come down to how easily we can access the data we need.

I'm attaching all 5 files here.

The 3 spreadsheets that are posted every 2 months to the public HR2W website (https://www.waterboards.ca.gov/water_issues/programs/hr2w/#data) are:

(1) - 2019-04-03_hr2w_web_data_active.xlsx [AGENCIES WITH AT LEAST 1 ACTIVE VIOLATION]
(2) - 2019-04-03_hr2w_web_data_rtc.xlsx [RETURNED TO COMPLIANCE - historical data]
(3) - 2019-04-03_hr2w_web_data_summary.xls [ABBREVIATED; CONTAINS SUMMARY OF COMPLIANCE STATUS]

The 2 spreadsheets not on the website are variations of the "web_data_active" sheet, plus one new column. If our team wishes, the SWRCB agency can make these available via an FTP site every 2 months. They are:

(4) - Inventory_map_summary.xls [very similar to the "web_data_summary" posted to the website, except that it has one column (Default_message) that is not found on the website. This column may be a helpful check for validating our future summaries of historical data, since one of its values is: "NO VIOLATION OF THE PRIMARY DRINKING WATER STANDARDS HAVE BEEN RECORDED SINCE 1/1/2012"]
(5) - Inventory_map_details.xls [The Inventory_map_details spreadsheet is basically the "web_data_active" and "web_data_rtc" data combined. However, it does not include static characteristic details about the water systems, such as population served, # of service connections, etc.]
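If Inventory_map_details really is the active and rtc sheets combined, a simple key comparison could confirm it. The following is only a sketch: it assumes each sheet has already been loaded into a list of dicts (one per row), and the key column names (`system_id`, `violation_id`) are hypothetical placeholders, since the actual column headers are not listed here.

```python
from collections import Counter


def compare_keys(active_rows, rtc_rows, details_rows, key_fields):
    """Compare active + rtc rows against the details rows by key.

    key_fields holds hypothetical column names; replace them with
    whatever columns identify a row in the real spreadsheets.
    """
    def keys(rows):
        # Count each key tuple so duplicated rows are also detected.
        return Counter(tuple(row.get(f) for f in key_fields) for row in rows)

    public = keys(active_rows) + keys(rtc_rows)
    details = keys(details_rows)
    return {
        # Present in the public sheets but absent from the details sheet.
        "missing_from_details": sorted((public - details).elements(), key=repr),
        # Present in the details sheet but in neither public sheet.
        "only_in_details": sorted((details - public).elements(), key=repr),
    }
```

Two empty lists would suggest that the details sheet matches the two public sheets on those keys; anything else is a row worth inspecting by hand.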


My instinct is to build the tool to use the publicly available data sets (web_data_active and web_data_rtc) and use the other sources to help validate that the reporting tool is interpreting data correctly.
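One possible validation check using the Default_message column from Inventory_map_summary is sketched below. It assumes the rows are already loaded as lists of dicts, that the violation sheets cover the same period as the message (since 1/1/2012), and that a water-system identifier column exists; the name `system_id` is a hypothetical placeholder.

```python
NO_VIOLATION_MESSAGE = (
    "NO VIOLATION OF THE PRIMARY DRINKING WATER STANDARDS "
    "HAVE BEEN RECORDED SINCE 1/1/2012"
)


def find_conflicts(summary_rows, violation_rows, id_field,
                   message_field="Default_message"):
    """Return IDs flagged as violation-free that still have violation rows.

    id_field is a hypothetical column name; violation_rows would be the
    active and rtc rows combined.
    """
    clean_ids = {
        row[id_field]
        for row in summary_rows
        # Tolerate missing values, stray whitespace, and case differences.
        if (row.get(message_field) or "").strip().upper() == NO_VIOLATION_MESSAGE
    }
    violating_ids = {row[id_field] for row in violation_rows}
    return sorted(clean_ids & violating_ids)
```

An empty result would be consistent with the tool interpreting the public data correctly; any ID returned points to a system that the two sources describe differently.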

Will use the Web Data Summary and Details sheets