Hack4Eugene / SpeedUpAmerica

Crowd-sourced internet speed tests using M-Lab data and user tests on a website, with charts, maps, and raw data downloads.

Report on options for faster calculation of medians (1 SP)

ryanrolds opened this issue

Is your feature request related to a problem? Please describe.
We currently calculate, store, and show median speeds for support boundaries and ISPs. Medians are calculated as part of one or two Rake tasks (update_stats_cache and possibly update_providers_statistics).

The problem is that this requires downloading all upload and download speeds from the DB (over 6M records) and then calculating the median. The calculation is simple but memory- and IO-hungry. These calculations are the largest factor in how long our nightly data imports take.
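For reference, the kind of in-memory calculation described above can be sketched as follows. This is a minimal illustration, not the project's actual Rake task code; the method name `median` and its input array are assumptions made for the example.

```ruby
# Hypothetical helper: median of speeds already loaded into memory.
# Sorting requires the full array in RAM, which is the cost noted above
# when the input is millions of rows.
def median(values)
  return nil if values.empty?

  sorted = values.sort
  mid = sorted.length / 2
  if sorted.length.odd?
    sorted[mid]
  else
    # Even count: average the two middle values.
    (sorted[mid - 1] + sorted[mid]) / 2.0
  end
end
```

The logic itself is small; the expense comes from transferring and sorting every row for every boundary and ISP.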

Describe the solution you'd like
This issue is not meant to solve the problem. The goal is to write a one-page document describing a few ways we could improve or solve it. Please look at ways to do the calculation inside MySQL 5.7. Also research whether switching to a newer version of MySQL or to another DB/datastore (PostgreSQL, TimescaleDB, Prometheus, etc.) would make it possible to solve the problem.

The deliverable for this issue is a one-pager outlining at least 3 options with their pros and cons.
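As an illustration of what an "inside MySQL 5.7" option might look like, here is a rough sketch of one common technique: let the database sort, and fetch only the middle row or two with `ORDER BY ... LIMIT ... OFFSET`. MySQL 5.7 has no window functions, so the offset is computed from a row count obtained separately. The helper names, and any table or column names passed in, are hypothetical and do not come from the SpeedUpAmerica schema.

```ruby
# Hypothetical helper: [offset, limit] selecting the middle row(s)
# out of `count` rows sorted by value.
def median_window(count)
  return nil if count <= 0

  count.odd? ? [count / 2, 1] : [count / 2 - 1, 2]
end

# Hypothetical helper: builds a query whose AVG over the one or two
# middle rows is the median. `table` and `column` are assumed to be
# trusted identifiers, not user input.
def median_sql(table, column, count)
  window = median_window(count)
  return nil if window.nil?

  offset, limit = window
  "SELECT AVG(#{column}) AS median FROM " \
    "(SELECT #{column} FROM #{table} ORDER BY #{column} " \
    "LIMIT #{limit} OFFSET #{offset}) AS middle"
end
```

A real version would also need a `WHERE` clause per boundary or ISP and an index on the sorted column. Whether this is actually faster than the current approach is exactly what the one-pager should evaluate.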