travis-ci / worker

Worker runs your Travis CI jobs

Report and/or gracefully handle rate limiter connection issues

soulshake opened this issue

Worker depends on an external Redis for rate-limiting purposes. Since this Redis runs on Heroku, its endpoint can sometimes change.

If it cannot connect to this Redis, worker fails to start, so no jobs can run.

We should:

  • make worker report this issue (and add an alert)
  • make worker handle this failure more gracefully

Open question: How should worker handle a situation where it cannot connect to its Redis? Should it:

  • fail to start (as currently)
  • start, but disregard rate limits completely
  • start, but in some different fallback mode where it makes fewer requests
  • something else?
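To make the third option concrete, here is a minimal sketch of a limiter that reports the connection failure and then falls back to a conservative in-process limit instead of refusing to start. Everything in it is an assumption for illustration: the `Limiter` interface, `brokenLimiter` (which stands in for an unreachable Redis and returns a simulated error), `newWorkerLimiter`, and the limit of 2 requests per minute are hypothetical names and values, not worker's actual API or configuration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Limiter is a hypothetical interface; worker's real rate limiter may look different.
type Limiter interface {
	// Allow reports whether one more request under the given name may be made now.
	Allow(name string) (bool, error)
}

// localLimiter is a fixed-window, in-process limiter: at most max calls per window.
type localLimiter struct {
	mu     sync.Mutex
	max    int
	window time.Duration
	start  time.Time
	count  int
	now    func() time.Time
}

func newLocalLimiter(max int, window time.Duration) *localLimiter {
	return &localLimiter{max: max, window: window, now: time.Now}
}

func (l *localLimiter) Allow(string) (bool, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	t := l.now()
	// Start a new window once the current one has elapsed.
	if t.Sub(l.start) >= l.window {
		l.start, l.count = t, 0
	}
	if l.count >= l.max {
		return false, nil
	}
	l.count++
	return true, nil
}

// fallbackLimiter asks the primary limiter first. If the primary returns an
// error, it reports the error and uses the fallback limiter's decision.
type fallbackLimiter struct {
	primary  Limiter
	fallback Limiter
	report   func(error)
}

func (f *fallbackLimiter) Allow(name string) (bool, error) {
	ok, err := f.primary.Allow(name)
	if err == nil {
		return ok, nil
	}
	f.report(err)
	return f.fallback.Allow(name)
}

// brokenLimiter simulates a Redis-backed limiter whose endpoint is unreachable.
type brokenLimiter struct{}

func (brokenLimiter) Allow(string) (bool, error) {
	return false, errors.New("simulated: cannot connect to rate limiter backend")
}

// newWorkerLimiter wraps primary with an assumed fallback of 2 requests per minute.
func newWorkerLimiter(primary Limiter, report func(error)) Limiter {
	return &fallbackLimiter{
		primary:  primary,
		fallback: newLocalLimiter(2, time.Minute),
		report:   report,
	}
}

func main() {
	l := newWorkerLimiter(brokenLimiter{}, func(err error) {
		fmt.Println("rate limiter degraded:", err)
	})
	for i := 0; i < 3; i++ {
		ok, _ := l.Allow("gce")
		fmt.Println("allowed:", ok)
	}
}
```

With the simulated broken backend, the first two calls are allowed by the local limiter and the third is denied, and each call reports the failure. A real implementation would probably throttle the reports and feed them into an alert rather than print on every request.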

Alternatively, it would not hurt to reevaluate worker's behavior towards the GCE API. Is there any way we can get rid of the dependency on an external rate-limit-checker completely?

commented

Alternatively, it would not hurt to reevaluate worker's behavior towards the GCE API. Is there any way we can get rid of the dependency on an external rate-limit-checker completely?

I am strongly in favour of getting rid of it.

According to the Quotas page, we are well within current quota limits. According to our Redis rate-limiting metrics, we still rate limit ourselves heavily every now and then, but most of the time we apply no rate limits at all.

I would be in favour of turning it off and applying for quota increases if we hit quota limits on their end.