Update measurement data every night or so
marcoow opened this issue
We should have a simple mix task that updates the data for all the locations we have in the database every night or so, so that the data is always fresh (enough) and can be returned right away without being loaded synchronously.
If we have a mix task for this, we can run it with Heroku's scheduler feature.
I'd just use a GenServer tbh. Like we did here: https://github.com/simplabs/issue-aggregator/blob/master/lib/issue_aggregator/auth/sync.ex
Just putting this here for future reference: https://devcenter.heroku.com/articles/scheduler
A GenServer is way more complex though, as the mix task could simply look like this, reusing what we already have:

```elixir
Location
|> Repo.all()
|> Enum.each(fn location ->
  Airquality.Sources.OpenAQ.get_latest(location.id)
end)
```
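For concreteness, the whole task could look something like this (the task name and the `alias` targets are just a sketch on my part, not decided):

```elixir
defmodule Mix.Tasks.Airquality.UpdateMeasurements do
  use Mix.Task

  # Assumed module locations for Location and Repo.
  alias Airquality.{Location, Repo}

  @shortdoc "Refreshes the measurement data for all stored locations"
  def run(_args) do
    # Boot the application so the Repo and HTTP client are available.
    Mix.Task.run("app.start")

    Location
    |> Repo.all()
    |> Enum.each(fn location ->
      Airquality.Sources.OpenAQ.get_latest(location.id)
    end)
  end
end
```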
True, but we'd be depending on Heroku to make the code run. What if we move to a different server?
IMHO, for languages that don't have GenServer-like features built in, that would make sense; but I think we should make use of Elixir's built-in functionality (this is exactly the kind of thing the Erlang VM was designed for).
Also, a GenServer isn't actually that complex. We'd only need something like this:
```elixir
# Module name and interval are illustrative.
defmodule Airquality.Sync do
  use GenServer

  # Run once a day.
  @interval :timer.hours(24)

  def start_link do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    Process.send_after(self(), :work, @interval)
    {:ok, %{last_run_at: nil}}
  end

  def handle_info(:work, _state) do
    Location
    |> Repo.all()
    |> Enum.each(fn location ->
      Airquality.Sources.OpenAQ.get_latest(location.id)
    end)

    # Schedule the next run.
    Process.send_after(self(), :work, @interval)
    {:noreply, %{last_run_at: :calendar.local_time()}}
  end
end
```
Also, with a GenServer I'm pretty sure we could effectively spawn a process for each location, making use of parallelism to speed things up (as long as OpenAQ doesn't rate-limit us too much).
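For what it's worth, that could be as simple as `Task.async_stream/3` with a bounded `max_concurrency` so we don't hammer the API. The `fetch` function below is just a stand-in for the real `Airquality.Sources.OpenAQ.get_latest/1` call:

```elixir
# Stand-in for the real fetch; in the app this would call
# Airquality.Sources.OpenAQ.get_latest(location.id).
fetch = fn id -> {:fetched, id} end

# Fetch up to 4 locations concurrently; Task.async_stream preserves order
# and yields {:ok, result} tuples.
results =
  1..10
  |> Task.async_stream(fetch, max_concurrency: 4, timeout: 10_000)
  |> Enum.map(fn {:ok, result} -> result end)
```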
The GenServer would also be supervised, so restarting it in the event of failure would be a breeze.
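Hooking it into the supervision tree would be a couple of lines in the application module (the module names here are assumptions on my part; the explicit child spec is because `start_link/0` takes no arguments):

```elixir
# Sketch: supervising the sync process next to the Repo.
children = [
  Airquality.Repo,
  %{id: Airquality.Sync, start: {Airquality.Sync, :start_link, []}}
]

Supervisor.start_link(children, strategy: :one_for_one, name: Airquality.Supervisor)
```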
This is, of course, just my 2 cents. If you'd prefer to use the heroku scheduler and a mix task, then let's do that 🙂
Heroku scheduler is basically just a different name for cron, which is available everywhere.
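So on any plain server, the nightly run could be a single crontab entry (the path and task name below are made up for illustration):

```shell
# Hypothetical crontab entry: refresh measurements every night at 02:00
0 2 * * * cd /srv/airquality && MIX_ENV=prod mix airquality.update_measurements
```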
> Also with Genserver I'm pretty sure we could effectively spawn a process for each location, making use of parallelism to speed up the process.
This is something you'd usually try to avoid when interacting with external APIs, as you might run into request limits.
To me, using a GenServer here looks like reaching for it just because it's available, not because it provides any real benefit in this case…