Develop better plotting tools

jcsmithhere opened this issue 2 years ago

Jeff Smith commented 2 years ago

Some things to plot:

Better scatter plot on a globe using different projections, with the ability to take cuts on instrument, lat/lon, and time.
Better histogram of bolides vs. time, where you can select the start and end times.
Plot individual light curves, with both G16 and G17 on the same figure.

We can add to this list, but the above is a good first list.

Anthony Ozerov · Answer 1 · Tue Jun 14 2022 08:20:03 GMT+0800 (China Standard Time)

Scatter plot

Implemented this with the plot_detections method. Features:

Pass in any map projection from Cartopy (in practice, a CoordinateReferenceSystem object) into the crs argument.
Color points by a categorical variable by passing a column name to the category argument. A legend is added automatically. An example call to color by confidence rating is bdf.plot_detections(category='confidenceRating')
Color points by a quantitative variable by passing a column of the BolideDataFrame into the c argument. This is the same as matplotlib's syntax, but plot_detections will also add a colorbar. An example call to color by solar hour is bdf.plot_detections(c=bdf['solarhour'])
Plot the GOES-East and GOES-West GLM fields of view using the argument boundary=['goes-e','goes-w']. I am not too happy with this implementation, as it relies on pickled tuples of Shapely Polygons and Cartopy CoordinateReferenceSystems obtained from the goes2go package.
Can be easily combined with the filtering syntax to filter the BolideDataFrame first, then plot it.
Pass in any keyword arguments that matplotlib's scatter method takes. For instance, the points' size can be varied by bolide duration (bdf.plot_detections(s=bdf['duration'])). An advanced user might use marker shape to represent another categorical variable.
Pass in any matplotlib style to the style argument.
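One way categorical coloring could work is to split the rows into one group per category value, giving each group its own color and legend label. This is a minimal sketch, not the plot_detections code: it assumes rows are plain dicts with hypothetical 'latitude' and 'longitude' keys, uses the first five colors of matplotlib's default tab10 cycle, and group_by_category is an illustrative name rather than part of the bolides API.

```python
from itertools import cycle

# First five colors of matplotlib's default "tab10" cycle (assumed palette)
DEFAULT_COLORS = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def group_by_category(rows, category, colors=DEFAULT_COLORS):
    """Split rows into one group per value of the `category` column.

    rows: iterable of dicts with hypothetical 'latitude'/'longitude' keys.
    Returns {value: {'lat': [...], 'lon': [...], 'color': str}}, with groups
    in first-seen order and colors assigned from the palette in that order.
    """
    groups = {}
    palette = cycle(colors)
    for row in rows:
        key = row[category]
        if key not in groups:
            # New category value: give it the next color in the cycle
            groups[key] = {"lat": [], "lon": [], "color": next(palette)}
        groups[key]["lat"].append(row["latitude"])
        groups[key]["lon"].append(row["longitude"])
    return groups
```

Each group could then become one call such as ax.scatter(g['lon'], g['lat'], color=g['color'], label=value, transform=ccrs.PlateCarree()), followed by ax.legend(); the transform=ccrs.PlateCarree() argument is how Cartopy is told the points are plain longitude/latitude.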

Examples (after running bdf=BolideDataFrame() and where ccrs is cartopy.crs):
sc-1.png: bdf.plot_detections(crs=ccrs.Geostationary(central_longitude=-75.2), category='detectedBy')

sc-2.png: bdf.plot_detections(crs=ccrs.AlbersEqualArea(central_longitude=-100), c=bdf['solarhour'])

Histogram

Implemented this with the plot_dates method. Features:

Change the date range by passing dates to the start and end arguments (just one of them, or both!). An example call would be bdf.plot_dates(start='2019-05-01', end='2020-01-01'). These strings can be in any format accepted by datetime.datetime.fromisoformat.
Pass a string like "2M" into the freq argument to change the binning to 1 bin per 2 months. A string like "7D" would also work.
Bar width automatically changes with the range and frequency.
Can be easily combined with the filtering syntax to filter the BolideDataFrame first, then make the histogram.
Pass in any keyword arguments accepted by the underlying matplotlib plotting method.
Pass in any matplotlib style to the style argument.
I am still thinking about how to display categorical data nicely in the histogram…
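The start/end/freq behavior can be sketched with the standard library alone. This is a hypothetical bin_dates helper, not the plot_dates implementation: it assumes freq is '<n>D' (days) or '<n>M' (calendar months), anchors month bins on the first day of the month, and parses start and end with datetime.fromisoformat.

```python
from datetime import date, datetime, timedelta


def _step(d, n, unit):
    """Advance date d by n days ('D') or n calendar months ('M')."""
    if unit == "D":
        return d + timedelta(days=n)
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)  # month bins start on the 1st


def bin_dates(dates, start=None, end=None, freq="7D"):
    """Return [(bin_start, count), ...] for a list of datetime.date objects.

    start/end are ISO strings; either may be omitted, in which case the
    earliest/latest date in the data is used. freq is e.g. '7D' or '2M'.
    """
    n, unit = int(freq[:-1]), freq[-1].upper()
    if unit not in ("D", "M"):
        raise ValueError(f"unsupported freq {freq!r}")
    lo = datetime.fromisoformat(start).date() if start else min(dates)
    hi = datetime.fromisoformat(end).date() if end else max(dates)
    edge = lo if unit == "D" else lo.replace(day=1)
    bins = []
    while edge <= hi:
        nxt = _step(edge, n, unit)
        # Count dates in [edge, nxt) that also fall inside [lo, hi]
        count = sum(1 for d in dates if edge <= d < nxt and lo <= d <= hi)
        bins.append((edge, count))
        edge = nxt
    return bins
```

Each bar's width would then be the gap between consecutive bin starts, which is one way the bar width can follow the chosen range and frequency.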

Example:
hist.png: bdf.plot_dates(freq='2D', start='2020-01-01', end='2022-01-01')

Light curves

Implemented using the lightkurve package. The BolideDataFrame has a lightcurves column which holds LightCurveCollection objects. These have a plot method which produces nice plots with both light curves on the same figure.

Example:
lc.png:

bdf.filter_date(start='2022-05-15', inplace=True)
bdf.add_website_data()
bdf.lightcurves[4].plot()

Jeff Smith · Answer 2 · Wed Jun 15 2022 07:34:15 GMT+0800 (China Standard Time)

Hi @anthonyozerov! This looks great. Some comments:

Great initiative to add in the GLM boundaries with the goes2go package. Some detections clearly fall outside the boundaries on the figure. I suspect this is due to parallax. The goes2go boundaries are either at cloud level (~10 km) or at ground level. The bolides occur at much higher altitudes (up to 100 km) but are reported by GLM at cloud level. It would be good to confirm that the boundaries given by goes2go are accurate and to determine what altitude their boundaries refer to. It would be an interesting exercise to check whether, if we re-navigate the bolides to the correct altitude, they all lie within the boundaries. We should also mention this in the tutorial.
What is the function add_website_data() doing? Is it downloading extra data from the website, as its name suggests? Do you not download all the data at once when constructing bdf=BolideDataFrame(), for speed?
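The re-navigation idea in the first point can be tried with a little geometry: a bolide reported at cloud level lies on the satellite's line of sight, so moving it to a higher altitude means sliding it along that line toward the satellite. This is only a rough sketch under stated assumptions: a spherical Earth of radius 6371 km rather than the Earth model GLM's own navigation uses, a satellite 42164 km from Earth's center above the equator, and illustrative reported/true altitudes of 10 km and 80 km. The function name renavigate is hypothetical.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)
R_GEO_KM = 42164.0   # geostationary orbit radius from Earth's center


def _to_xyz(lat_deg, lon_deg, r):
    """Earth-centered Cartesian coordinates of a point at radius r."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def renavigate(lat_deg, lon_deg, sat_lon_deg, h_reported_km=10.0, h_true_km=80.0):
    """Slide a point along the satellite's line of sight from altitude
    h_reported_km to h_true_km and return the new (lat, lon) in degrees."""
    sat = _to_xyz(0.0, sat_lon_deg, R_GEO_KM)
    p = _to_xyz(lat_deg, lon_deg, R_EARTH_KM + h_reported_km)
    # Unit direction of the line of sight, from the satellite toward the point
    d = [pi - si for pi, si in zip(p, sat)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # Solve |sat + t*d| = R_EARTH_KM + h_true_km for t (quadratic with a = 1)
    b = 2.0 * sum(si * di for si, di in zip(sat, d))
    c = sum(si * si for si in sat) - (R_EARTH_KM + h_true_km) ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        raise ValueError("line of sight misses the sphere at h_true_km")
    t = (-b - math.sqrt(disc)) / 2.0  # nearer intersection = side facing the satellite
    x, y, z = (si + t * di for si, di in zip(sat, d))
    r = math.sqrt(x * x + y * y + z * z)
    return math.degrees(math.asin(z / r)), math.degrees(math.atan2(y, x))
```

For an off-nadir detection, raising the altitude pulls the ground position toward the sub-satellite point, which is consistent with cloud-level positions landing slightly outside the true field of view. For example, renavigate(30.0, -100.0, -75.2) moves a point seen by a satellite at 75.2°W (the GOES-East longitude used earlier in this thread) closer to nadir.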

Anthony Ozerov · Answer 3 · Wed Jun 15 2022 08:00:16 GMT+0800 (China Standard Time)

I also think that some of this is parallax, but I just did some more digging into the goes2go package and found an interesting notebook here by the author. It looks like they contacted some GLM people who have the actual boundaries, and the true field of view is a little larger in the corners than the author's estimate based on the data book. It might be worth contacting those people to obtain the true boundaries and the altitude used for them. Having the true boundaries would make estimates of the bolide distribution over latitude a little better.
add_website_data() pulls the light curve data (and other data too, but for now it only uses the light curve data) from the website for every single bolide event in the BolideDataFrame (the JSON data that the BolideDataFrame is built from does not include light curves, unless I missed something). The problem is that getting the individual light curve data from the website API takes 1 request per bolide, so I figured it should only be done after filtering, to reduce the number of bolides.

Jeff Smith · Answer 4 · Wed Jun 15 2022 12:54:03 GMT+0800 (China Standard Time)

The original code did pull all the light curve data; see the original gist I wrote, where I plot light curves. However, that doesn't mean what you are doing is wrong. It could be that your approach is better.

Anthony Ozerov · Answer 5 · Wed Jun 15 2022 14:20:07 GMT+0800 (China Standard Time)

I think that in BolideList, every time we access a bolide (ultimately via __getitem__), a Bolide is instantiated, and Bolide's __init__ appears to pull data from the website. So if I understand correctly, the original code also pulls the light curve data only later, when a specific bolide is accessed (i.e. 1 request per bolide).

That does give me an idea, though: maybe we could make the lightcurves column of the BolideDataFrame work in a similar way, automatically pulling the data when a specific light curve is accessed. This would remove the need to call add_website_data and would also be pretty cool. I think this could be done with a wrapper class around lightkurve's LightCurveCollection that holds no data when initialized but automatically pulls it the first time one of its methods (e.g. plot) is called. The other website data that needs 1 request per bolide could be handled in a similar way (currently it is just not included). I will look into it…
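The wrapper idea can be sketched with __getattr__, which Python calls only for attributes that normal lookup does not find. LazyLightCurves and loader are hypothetical names: loader stands in for the per-bolide website request that would build a lightkurve LightCurveCollection, and its result is cached so each bolide costs at most one request.

```python
class LazyLightCurves:
    """Placeholder for a bolide's light curves that fetches them on first use."""

    def __init__(self, bolide_id, loader):
        self._bolide_id = bolide_id
        self._loader = loader  # hypothetical: bolide_id -> LightCurveCollection
        self._data = None

    def _load(self):
        if self._data is None:  # first access: one request, then cached
            self._data = self._loader(self._bolide_id)
        return self._data

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for .plot
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on missing private attrs
        return getattr(self._load(), name)
```

One design caveat: special methods such as __len__ or __getitem__ are looked up on the class and bypass __getattr__, so a complete wrapper would need to forward those explicitly.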

Anthony Ozerov · Answer 6 · Sat Jun 25 2022 08:26:55 GMT+0800 (China Standard Time)

Implemented density plots with the plot_density method in commit 3ab98ee on bdf-implementation. Here are a couple of usage examples:

import matplotlib.pyplot as plt
import bolides.crs as bcrs
# GOES-17 detections with confidence above 0.7
filtered = bdf[(bdf.detectedBy=='G17') & (bdf.confidence>0.7)]
fig, ax = filtered.plot_density(bandwidth=5, style='dark_background', crs=bcrs.GOES_W(), boundary=['goes-w'], figsize=(15,8))
ax.gridlines(color='black')
plt.title('GLM-17 bolide detection density')

Note the new custom coordinate reference system class GOES_W, which makes it really easy to make plots from a satellite's perspective. Also note that the boundary argument both draws a boundary and clips the data to it. Analogously, for G16:

filtered = bdf[(bdf.detectedBy=='G16') & (bdf.confidence>0.7)]
fig, ax = filtered.plot_density(bandwidth=5, style='dark_background', crs=bcrs.GOES_E(), boundary=['goes-e'], figsize=(15,8))
ax.gridlines(color='black')
plt.title('GLM-16 bolide detection density')

We can also do this with any projection Cartopy supports:

filtered = bdf[bdf.confidence>0.7]
fig, ax = filtered.plot_density(bandwidth=5, style='dark_background', crs=ccrs.Robinson(), boundary='goes', figsize=(16,8))
ax.gridlines(color='black')
plt.title('Pipeline bolide detection density')

The density is calculated in spherical coordinates, then gridded (at a resolution configurable with lat_resolution and lon_resolution), masked by the boundary, and projected onto the given projection. The gridding is why the edges are somewhat jagged. I don't know how to use the boundary as a mask on pixels in the final map, but maybe there is a way.
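The density-then-grid-then-mask steps can be sketched in pure Python. The thread doesn't say which kernel plot_density uses or what unit bandwidth is in; this sketch assumes a von Mises-Fisher kernel with the bandwidth in degrees, and takes a hypothetical inside(lat, lon) callable in place of the Shapely boundary. Because whole grid cells are either kept or masked, the masked edge follows cell boundaries, which illustrates where the jaggedness comes from.

```python
import math


def _unit(lat_deg, lon_deg):
    """Unit vector for a latitude/longitude given in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def spherical_kde_grid(points, bandwidth_deg=5.0, lat_resolution=1.0,
                       lon_resolution=1.0, inside=lambda lat, lon: True):
    """Von Mises-Fisher kernel density of (lat, lon) points, evaluated at the
    centers of a regular lat/lon grid. Returns {(lat, lon): density or None};
    cells where inside(lat, lon) is False are masked with None."""
    kappa = 1.0 / math.radians(bandwidth_deg) ** 2  # concentration ~ 1/sigma^2
    # vMF normalizer kappa / (4*pi*sinh(kappa)), rewritten together with
    # exp(kappa*(cos - 1)) below so small bandwidths do not overflow
    norm = kappa / (2.0 * math.pi * (1.0 - math.exp(-2.0 * kappa)))
    centers = [_unit(lat, lon) for lat, lon in points]
    grid = {}
    for i in range(round(180 / lat_resolution)):
        lat = -90.0 + (i + 0.5) * lat_resolution
        for j in range(round(360 / lon_resolution)):
            lon = -180.0 + (j + 0.5) * lon_resolution
            if not inside(lat, lon):
                grid[(lat, lon)] = None  # masked by the boundary
                continue
            x = _unit(lat, lon)
            total = sum(math.exp(kappa * (sum(a * b for a, b in zip(x, c)) - 1.0))
                        for c in centers)
            grid[(lat, lon)] = norm * total / len(centers)
    return grid
```

With this normalization the density integrates to roughly 1 over the sphere, so it can be compared across differently filtered subsets.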

Jeff Smith · Answer 7 · Thu Aug 04 2022 14:17:35 GMT+0800 (China Standard Time)

I was not able to change the point size or style with the plot_detections method. It looks like the plt.scatter kwargs are not being passed through correctly, or the parameters need to be in a format that is not obvious.

Anthony Ozerov · Answer 8 · Tue Aug 09 2022 02:05:07 GMT+0800 (China Standard Time)

Fixed the point size problem in da01f68. The s kwarg wasn't being passed through correctly.
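The thread doesn't say exactly what was wrong before da01f68. As a generic illustration of the pattern, a wrapper can merge its own defaults with the caller's keyword arguments so that the caller's values win; hard-coding s=1 in the call next to **kwargs would instead raise a TypeError whenever the caller also passes s. scatter_with_defaults and its default values are hypothetical; in practice scatter would be something like ax.scatter.

```python
def scatter_with_defaults(scatter, x, y, **kwargs):
    """Call scatter(x, y, **options), where caller kwargs override defaults."""
    options = {"s": 1, "marker": "."}  # hypothetical wrapper defaults
    options.update(kwargs)             # caller-supplied values take precedence
    return scatter(x, y, **options)


# Anti-pattern for comparison: passing s=... through **kwargs here raises
# "TypeError: ... got multiple values for keyword argument 's'"
def scatter_hardcoded(scatter, x, y, **kwargs):
    return scatter(x, y, s=1, **kwargs)
```

With this pattern a call such as scatter_with_defaults(ax.scatter, lons, lats, s=bdf['duration']) would size points by duration while keeping the other defaults.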