HiSPARC / correlation

(deprecated) Data correlation analysis

The search_operational_stations function could be made faster

153957 opened this issue

This function currently tries to download one day (yesterday) of data for each station it thinks exists, to check whether the station uploaded data yesterday and thus whether it is active. According to the function description, this might take 20 minutes. One plus is that the user ends up with a day of data for every station, but that should not be the purpose of this function.

It might be faster to simply request each station's page on data.hisparc.nl and see whether it exists. If it does, the station uploaded some data that day; otherwise the server returns a 404 page:
Data: http://data.hisparc.nl/django/show/stations/95/2012/5/13/
No data: http://data.hisparc.nl/django/show/stations/95/2012/5/1/
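A minimal sketch of this page check, assuming Python 3 and only the standard library. The URL pattern is copied from the two links above; the names `station_page_url`, `has_data_page` and the `fetch` hook are hypothetical and not part of the existing code.

```python
from datetime import date
from urllib.error import HTTPError
from urllib.request import urlopen

# URL pattern taken from the example links above (no zero padding).
PAGE_URL = ('http://data.hisparc.nl/django/show/stations/'
            '{station}/{year}/{month}/{day}/')


def station_page_url(station, day):
    """Build the data page URL for a station number and a datetime.date."""
    return PAGE_URL.format(station=station, year=day.year,
                           month=day.month, day=day.day)


def has_data_page(station, day, fetch=urlopen, timeout=10):
    """Return True if the page exists, False if the server answers 404.

    `fetch` is a hypothetical hook (defaults to urllib's urlopen) so the
    check can be exercised without network access.
    """
    try:
        response = fetch(station_page_url(station, day), timeout=timeout)
    except HTTPError as error:
        if error.code == 404:
            return False
        raise  # other HTTP errors say nothing about the station
    response.close()
    return True
```

A call such as `has_data_page(95, date(2012, 5, 13))` would then replace the full download for that station and day. Only a 404 is treated as "no data"; any other error is re-raised so that a server problem is not mistaken for an inactive station.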

The problem with this method is that it does not differentiate between event data and weather data.
So for some stations a more thorough check might be required:
Only weather: http://data.hisparc.nl/django/show/stations/501/2012/5/15/

Another option might be to try to directly download the Source or histogram from data.hisparc.nl.
Data: http://data.hisparc.nl/media/histogram-eventtime-501-2012-05-01.png
No data: http://data.hisparc.nl/media/histogram-eventtime-501-2012-05-15.png
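The histogram variant could look roughly like this. It is a sketch assuming Python 3 and the file name pattern visible in the two links above (zero-padded date); `event_histogram_url`, `has_event_data` and the `fetch` hook are hypothetical names. Because the file is the event-time histogram, a missing file would also flag stations that only uploaded weather data.

```python
from datetime import date
from urllib.error import HTTPError
from urllib.request import urlopen

# File name pattern copied from the example links above.
HISTOGRAM_URL = ('http://data.hisparc.nl/media/'
                 'histogram-eventtime-{station}-{day:%Y-%m-%d}.png')


def event_histogram_url(station, day):
    """Build the event-time histogram URL for a station and a datetime.date."""
    return HISTOGRAM_URL.format(station=station, day=day)


def has_event_data(station, day, fetch=urlopen, timeout=10):
    """Return True if the histogram file exists, False on a 404 response.

    `fetch` is a hypothetical hook (defaults to urllib's urlopen) that
    makes it possible to test the logic offline.
    """
    try:
        response = fetch(event_histogram_url(station, day), timeout=timeout)
    except HTTPError as error:
        if error.code == 404:
            return False
        raise  # do not treat server errors as "no events"
    response.close()
    return True
```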

Fixed in c732942
It now uses the Source download instead of the png files, because we are moving from png graphs to javascript graphs, so those pngs might not be available.

It now takes about 3 minutes to do the complete search.
Further improvements can probably be made: the slowest part seems to be waiting for unavailable pages (= no data) to return a does-not-exist code.
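One possible direction, sketched here rather than tested against the server: since most of the time is spent waiting on responses, the per-station checks could run concurrently in a thread pool. This assumes Python 3 and that the server tolerates a handful of parallel requests; `find_active_stations` and the `is_active` callable are hypothetical names, with `is_active` standing in for whatever single-station check is used.

```python
from concurrent.futures import ThreadPoolExecutor


def find_active_stations(stations, is_active, max_workers=8):
    """Return the stations for which is_active(station) is true.

    The checks run in a thread pool, so slow "does not exist" responses
    for different stations overlap instead of being waited on in turn.
    """
    stations = list(stations)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the stations.
        flags = list(pool.map(is_active, stations))
    return [station for station, active in zip(stations, flags) if active]
```

With a network-bound `is_active`, the total time should approach that of the slowest batch instead of the sum of all waits. An exception raised by any single check propagates out of `pool.map`, so a failing request is not silently counted as an inactive station.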