r-b-g-b / clean-water-tool

Reporting Tool to Support Safe Drinking Water in California’s Disadvantaged Communities


Create data visualization - exceedance of safe levels for analyte data (model shown)

ruckeralex opened this issue · comments

Here are 2 cool examples showing ways of visualizing the exceedance of levels for a given analyte for a public water system-- anyone up for using this as inspiration to create something like it using our current dataset?:

  1. along a spectrum, showing the water system's measured level (the "RESULT" field in our dataset), the state threshold (the "MCL" field in our dataset), and-- if possible-- the average result for the periods when the water system was in violation.

  2. visual summary of the quarters for the period of history (e.g., since 2012 or 2017, depending on the dataset), with a black dot representing each quarter where the water system was in violation of the state threshold.
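
The quarter-by-quarter summary in item 2 could be derived roughly as follows. This is a minimal sketch in plain Python, not the project's code: it assumes each sample is a (date, RESULT, MCL) tuple, where the date field and the tuple layout are hypothetical, and it treats "in violation" as RESULT > MCL, which may not match the regulatory definition.

```python
from datetime import date


def quarter_label(d):
    """Return a label such as '2017-Q3' for a date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def violation_quarters(samples):
    """Map each sampled quarter to True if any sample in it exceeded its MCL.

    `samples` is an iterable of (sample_date, result, mcl) tuples.
    The tuple layout is an assumption, not the dataset's schema.
    """
    flags = {}
    for sample_date, result, mcl in samples:
        label = quarter_label(sample_date)
        flags[label] = flags.get(label, False) or result > mcl
    # 'YYYY-Qn' labels sort chronologically as plain strings.
    return dict(sorted(flags.items()))


# Made-up readings for one analyte in one water system.
samples = [
    (date(2017, 2, 14), 8.0, 10.0),
    (date(2017, 5, 3), 12.5, 10.0),
    (date(2017, 6, 20), 9.0, 10.0),
    (date(2017, 11, 8), 11.0, 10.0),
]
quarters = violation_quarters(samples)
```

Each True entry would become a black dot. Quarters with no samples at all (2017-Q3 in the made-up data) are simply absent, so a real chart would need to decide how to draw them.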

image

From EWG site:
https://www.ewg.org/tapwater/system.php?pws=CA5010019

@nayanab565

Screen Shot 2019-09-03 at 4 34 58 PM

We are doing this now: clicking on an assembly member's name brings up an overlay containing a map of their location with all the analytes present. Clicking on an analyte button adds it to the set of 3D bars on the map representing the % exceedance over MCL.

I'd love to brainstorm more approaches to visualizing this data, we can choose the best one or include multiple if it makes sense

The above example doesn't address the history at all.

There are some interesting blips in the data. For example, one test in LA came up as a violation that affected 4 million people. It was quickly corrected, but it would be cool to create something that shows the magnitude of an event like this even though it was short-lived and is now resolved.

Here is one idea: A fullscreen map showing population affected by size of dot representing water system and we automatically animate through history so the dots grow, shrink, appear and disappear as time progresses. We could add the third dimension to show magnitude of exceedance here too.

Yes, I'd love to work on this issue!

  1. I'll start by calculating the average result that you suggested, @ruckeralex. I'm still getting familiar with the data (I joined two weeks ago!), so I think computing a statistic like this will also help me understand the data better!
  2. @aaronhans The map idea is awesome! I don't have much map visualization experience, so I think I'll start by creating something like what Rucker screenshotted. I think that model makes it easier to get a straightforward overview of the histories of all the water systems.

Just noticed that the UOMs are not consistent across all the measurements (this is the set of values in both the RESULT_UOM and MCL_UOM columns: {'MFL', 'MG/L', 'PCI/L', 'UG/L', nan}). I talked about it with Michael and he said it would be best to calculate the percentages, so I think I'll proceed with that for now.

Edit: Er, don't think it would make sense to use percentages in graph #1 above. I don't actually think it's a problem that the UOMs are not consistent since each water system will have their own graph #1.
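
One related check that may be worth running before computing any percentages: a percentage of MCL is only meaningful when RESULT and MCL are in the same unit within a row. A rough sketch, assuming rows are plain dicts keyed by column name. RESULT_UOM and MCL_UOM come from the dataset; the ANALYTE key and all values are made up for illustration, and None stands in for the nan seen in the real columns.

```python
def mismatched_units(rows):
    """Return rows whose RESULT_UOM is missing or differs from MCL_UOM."""
    flagged = []
    for row in rows:
        result_uom = row.get("RESULT_UOM")
        mcl_uom = row.get("MCL_UOM")
        if result_uom is None or mcl_uom is None or result_uom != mcl_uom:
            flagged.append(row)
    return flagged


# Made-up rows; "ANALYTE" is a hypothetical column name.
rows = [
    {"ANALYTE": "A", "RESULT": 12.0, "RESULT_UOM": "UG/L",
     "MCL": 10.0, "MCL_UOM": "UG/L"},
    {"ANALYTE": "B", "RESULT": 0.5, "RESULT_UOM": "MG/L",
     "MCL": 300.0, "MCL_UOM": "UG/L"},
    {"ANALYTE": "C", "RESULT": 3.0, "RESULT_UOM": None,
     "MCL": 5.0, "MCL_UOM": "PCI/L"},
]
flagged = mismatched_units(rows)
```

Rows like B and C would need a unit conversion or manual review before being compared against their threshold.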

Hi! @ruckeralex I wanted to request your feedback on this... (and whoever else!! :))

I created an "along the spectrum" exceedance level graph like the one you screenshotted, and here's what it looks like (this is for one specific analyte, for one specific water system). As you can see, I just put a simple widget above the graph so that someone would be able to select the result that they'd want to compare to the analyte's threshold and the average result I computed for the whole time that that water system was out of compliance (for that analyte).

image

I could do this for all the analytes, for each water system. However, analytes with different units or scales would be hard to compare side by side on this kind of graph (just to provide an example: some analytes are like 0.1 UOM, and others are like 300 UOM).

So, again I looked at one analyte for one water system, and I took each documented exceedance result (there were 7 for this analyte and water system), took the average of those results, and then graphed the percent change relative to the analyte threshold. The red bar represents the average result's percent change.
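
The calculation described above can be sketched roughly like this. It is an illustration with made-up numbers rather than the notebook code, and it assumes an exceedance means RESULT > MCL. The formula is percent change = (result - MCL) / MCL × 100.

```python
from statistics import mean


def percent_over_threshold(result, mcl):
    """Percent change of a result relative to its threshold (MCL)."""
    return (result - mcl) * 100.0 / mcl


# Made-up results for one analyte in one water system.
results = [8.0, 12.0, 15.0, 11.0, 18.0]
mcl = 10.0

# Keep only the documented exceedances (assumed here: RESULT > MCL).
exceedances = [r for r in results if r > mcl]

# One bar per exceedance, plus the bar for the average result.
bars = [percent_over_threshold(r, mcl) for r in exceedances]
average_bar = percent_over_threshold(mean(exceedances), mcl)
```

Because the unit cancels in the division, the resulting percentages can be compared across analytes measured in different UOMs.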

image

Note: The x-axis values are placeholders - I couldn't decide on whether to put the violation date (might be hard to read), or put the month of the violation date, or the quarter, etc.

I think the spectrum type graph is useful in helping folks understand a single analyte's exceedance relative to the average and the threshold, but that the bar graph is needed to get a clear picture across analytes of different units.

Hope I explained things clearly enough to get some feedback. Do you have any thoughts on what I did, and my decision to not visualize things on the spectrum but with a bar graph instead?

(btw I haven't pushed my code for the graph yet, it's just on my local Jupyter notebook. Can push it if that helps understand my comment.)

Nice work @kwonangela7!

I agree that percentages of exceedance above and below the threshold are useful for comparing results from different analytes. That would allow us to characterize which analyte among several is the biggest problem.

@kwonangela7's bar chart is a good way to communicate more results in less space and with less user interaction. Going one step further, the average could be added as a line over the bars:
image
(the months shown are made up; these could be quarterly values, etc.)

And if the vertical range were kept consistent, we could even compare "sparkline-like" datagrams to each other:
image

Hi y'all - so sorry, I didn't get an email notification about this feedback and wasn't feeling well earlier this week so am getting back to everyone late! Thank you for the feedback!!

@mnorelli Thank you for the suggestions and examples :) I didn't think to visualize the running average... that's interesting! I'll give that a try.
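
If the line over the bars is read as a running average, one way to compute it is a cumulative mean over the bar values in time order. A small sketch with made-up percentages; the function name is hypothetical, and a fixed-window moving average would be an equally reasonable reading.

```python
from itertools import accumulate


def running_average(values):
    """Return the cumulative mean of `values` after each element."""
    totals = accumulate(values)
    return [total / count for count, total in enumerate(totals, start=1)]


# Made-up percent-over-threshold values, in chronological order.
percent_over = [20.0, 50.0, 10.0, 80.0]
line = running_average(percent_over)
```

The last point of the line equals the overall average of the bars, so it ends at the same height as a single flat average line would sit.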