KevinAquino / DCshotspot

Some noodling about with the DC ShotSpotter data.

DCshotspot

Some quick results from a little noodling about with the DC ShotSpotter data. Commands used to perform said noodling are found in DCCalendar.R.

A rather fruitful and interesting discussion of the data can be found on reddit.

Data preparation

39065 events were recorded by the ShotSpotter system from Jan. 27, 2006 to Jun. 24, 2013. The data was already nicely prepared in an Excel spreadsheet with columns for the DC Ward, timestamp, incident type (single or multiple gunshots), and coordinate information (accurate to 100 m) for each event. Data was loaded into R.

Days likely to contribute spurious data from false positives (fireworks and celebratory gunfire on Dec. 31 - Jan. 1 and Jul. 3 - Jul. 5) were cut.
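As a rough illustration of that cut, here is a minimal sketch on a toy data frame. The column name `timestamp` and the sample rows are assumptions made up for the example; the commands actually used are in DCCalendar.R and may look quite different.

```r
# Toy stand-in for the ShotSpotter table; the column name `timestamp`
# and these rows are assumptions, not the real data.
events <- data.frame(
  timestamp = as.POSIXct(c("2010-12-31 23:50:00", "2011-01-01 00:10:00",
                           "2011-07-04 21:30:00", "2011-07-06 02:15:00",
                           "2012-03-15 23:05:00"), tz = "UTC")
)

# Month-day keys for the excluded windows
# (Dec. 31 - Jan. 1 and Jul. 3 - Jul. 5).
excluded_days <- c("12-31", "01-01", "07-03", "07-04", "07-05")

# Keep only events whose month-day is outside the excluded windows.
month_day <- format(events$timestamp, "%m-%d")
filtered <- events[!(month_day %in% excluded_days), , drop = FALSE]
```

Matching on a month-day string keeps the filter independent of the year, so the same five keys cover every year in the data set.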

The remaining 27930 events should represent, fairly accurately, gunshots detected in the wards monitored by ShotSpotter. Because ShotSpotter coverage is targeted, I settled on exploring variables that are essentially independent of location: incidents as a function of month and time of day.

The data was sorted into twelve data frames, one for each calendar month in which events took place. Histograms binned by time of day (quarter-hour bins) were prepared for each month.
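The split and binning described above could be sketched in base R roughly as follows. The `timestamp` column and the sample events are assumptions for illustration only, and this is not necessarily how DCCalendar.R does it.

```r
# Toy events; the column name `timestamp` is an assumption.
timestamps <- as.POSIXct(c("2011-06-10 22:07:00", "2011-06-11 22:14:59",
                           "2011-06-12 00:31:00", "2012-07-20 23:59:00"),
                         tz = "UTC")
events <- data.frame(timestamp = timestamps)

# Month number (1-12) and quarter-hour bin index (0-95) for each event.
lt <- as.POSIXlt(events$timestamp)
events$month <- lt$mon + 1
events$bin <- lt$hour * 4 + lt$min %/% 15

# Twelve data frames, one per calendar month (empty if a month has no events).
by_month <- split(events, factor(events$month, levels = 1:12))

# Counts per quarter-hour bin for each month; tabulate() zero-fills all 96 bins.
counts <- lapply(by_month, function(df) tabulate(df$bin + 1, nbins = 96))
```

Each element of `counts` is a length-96 vector that could be handed straight to `barplot()` to draw one month's histogram.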

Data presentation

The data was arranged in a calendar-style form using Andy Teucher's modified version of multiplot.R.

Conclusions

The month-to-month comparisons are slightly biased by the start and end points of the data set (~2/06 to ~7/13), but I do not believe the effect changes the qualitative conclusion much. It is quite clear that June and July are particularly active months. This matches well with anecdotal accounts in popular media.
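One way to gauge the size of that bias is to count how many days of each calendar month fall inside the observation window and convert raw counts to events per observed day. The sketch below assumes continuous coverage from Jan. 27, 2006 to Jun. 24, 2013 and ignores the excluded holiday dates; `rate_per_day` is a hypothetical helper, not something from DCCalendar.R.

```r
# Observation window from the Data preparation section (inclusive);
# continuous coverage over this window is an assumption.
start_date <- as.Date("2006-01-27")
end_date <- as.Date("2013-06-24")

# Number of observed days falling in each calendar month.
all_days <- seq(start_date, end_date, by = "day")
days_per_month <- tabulate(as.POSIXlt(all_days)$mon + 1, nbins = 12)
names(days_per_month) <- month.abb

# Hypothetical helper: raw monthly event counts (Jan..Dec) to events per day.
rate_per_day <- function(event_counts) event_counts / days_per_month
```

Comparing `days_per_month` across months shows how unevenly the window samples the calendar, which is the bias in question.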

Also quickly apparent is the repeatable shape of the histograms. It seems very likely that this is attributable to the fact that it is a particularly poor idea to open fire in broad daylight. I have shaded the bars corresponding to the average daylight period for each month (as gathered from any number of daylight calculator websites), which does seem to account for the monthly variation (i.e. shifting and widening/shrinking) in the period of time in which gunshots are most frequent.
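The daylight shading can be sketched by flagging which quarter-hour bins fall between sunrise and sunset. The sunrise and sunset values below are placeholders chosen for the example, not the averages used for the actual plots, and the colour names are arbitrary.

```r
# Placeholder average sunrise/sunset for one month, in decimal hours.
# These are illustrative values, not the ones used for the real plots.
sunrise <- 5.75   # 05:45
sunset <- 20.5    # 20:30

# Start time of each of the 96 quarter-hour bins, in decimal hours.
bin_start <- (0:95) / 4

# A bin counts as daylight when it lies entirely between sunrise and sunset.
is_daylight <- bin_start >= sunrise & (bin_start + 0.25) <= sunset

# Colours that could be passed to barplot(counts, col = bar_col).
bar_col <- ifelse(is_daylight, "grey80", "grey30")
```

Repeating this with each month's own sunrise and sunset gives a per-month colour vector, so the shaded band shifts and widens with the season.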

For grins, I went through the same analysis for Oakland, CA, and the same general patterns emerge.

Future work

I'm curious to know if the "broad daylight" hypothesis could be extended to well-lit areas. The location of street lights is publicly available but I will need to think a bit about the best way to set up this problem. This would be another problem that should be largely immune to the limited ShotSpotter coverage.

This is just spitballing at this point, but if there is a clear correlation between shot locations and absence of street lights, it would be fairly straightforward to implement a machine learning algorithm to find optimal places to introduce additional lights.
