HTTPError 401 - Unauthorized - After soledad merge
tuliocasagrande opened this issue
Spawned from #888.
We are running soledad's master branch on staging, dev and unstable.
All environments were running reasonably well until, after 3:00 AM, we started getting a leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized') every minute. This started on Mar 4th on dev and staging, and on Mar 6th on unstable.
We also noticed there's an event that occurs every day at 3:00 AM:
Mar 4 03:00:01 pixelated CRON[29391]: (leap-webapp) CMD (cd /srv/leap/webapp && bundle exec rake cleanup:tokens)
Restarting pixelated-server recovers from this state.
We found /var/spool/cron/crontabs/leap-webapp, which sets the job:
0 3 * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens
On staging, we changed it to run every two minutes to be able to reproduce the error:
*/2 * * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens
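To compare how often cleanup:tokens runs under the two schedules, here is a small helper that evaluates only the minute and hour fields of a crontab line. It is a simplified sketch: the helper names are hypothetical, and it understands only *, */n and single numbers, whereas real cron also accepts ranges, lists and names.

```python
def field_matches(field, value):
    """Return True if one cron field (minute or hour) matches value.

    Simplified: supports only "*", "*/n" and a single number. The modulo
    test is valid for minute and hour, whose ranges start at 0.
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        return value % step == 0
    return int(field) == value


def fires_at(schedule, hour, minute):
    """Check whether a 5-field cron schedule fires at hour:minute.

    Only the minute and hour fields are evaluated; the remaining fields
    are assumed to be "*", as in both schedules in this issue.
    """
    minute_field, hour_field = schedule.split()[:2]
    return field_matches(minute_field, minute) and field_matches(hour_field, hour)


def firings_per_day(schedule):
    """Count how many times per day the schedule fires."""
    return sum(fires_at(schedule, h, m) for h in range(24) for m in range(60))


if __name__ == "__main__":
    print(firings_per_day("0 3 * * *"))    # 1
    print(firings_per_day("*/2 * * * *"))  # 720
```

So the staging change turns one token cleanup per day into 720, which makes a rare failure far more likely to show up.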
Surprisingly, it does NOT corrupt the state on every run (by "corrupt state" I mean getting Unauthorized errors). In fact, only one of the cleanup:tokens runs triggered the Unauthorized errors, which we solved by restarting the pixelated-server service again.
We've set dev and unstable to also run the job every 2 minutes to gather more data.
It's still not clear why and when the Unauthorized state begins.
We've found a very similar issue (#905). Notice that, although the exception was raised and polluted the channel, the process still went on to close any remaining user resources and log the user out:
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.bitmask.mail.incoming.service] INFO starting sync...
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.common.http._HTTP11ClientFactory] INFO Starting factory <leap.common.http._HTTP11ClientFactory instance at 0x7f922c159050>
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.soledad.client.api] ERROR got exception when syncing!
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: *--- Failure #13719 ---
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: Failure: leap.soledad.common.errors.InvalidAuthTokenError:
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: *--- End of Failure #13719 ---
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.bitmask.mail.incoming.service] WARN sync failed because token is invalid: <twisted.python.failure.Failure leap.soledad.common.errors.InvalidAuthTokenError: >
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [pixelated.application] INFO Invalid soledad token, logging out thais@unstable.pixelated-project.org
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.soledad.client.api] DEBUG closing soledad
With the newest soledad changes, the log now shows an HTTPError(401, 'Unauthorized') instead of an InvalidAuthTokenError. Moreover, the user is no longer logged out, and the sync keeps retrying every minute:
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.bitmask.mail.incoming.service] INFO starting sync...
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [twisted.web.client._HTTP11ClientFactory] INFO Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7f6da8efca70>
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.soledad.client.api] ERROR got exception when syncing!
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: *--- Failure #41888 ---
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: *--- End of Failure #41888 ---
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.bitmask.mail.incoming.service] ERROR [Failure instance: Traceback (failure with no frames): <class 'leap.soledad.common.l2db.errors.HTTPError'>: HTTPError(401, 'Unauthorized')
Mar 7 20:22:10 unstable1 pixelated-user-agent[26568]: ]
...
Mar 7 20:23:10 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
...
Mar 7 20:24:13 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
...
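The difference between the two logs can be modelled with a toy loop: syncing stops only when the error is recognized as an invalid-token error, and any other error is recorded and retried on the next tick. Everything here (the stand-in classes, run_sync_ticks, the tick count) is hypothetical and only mirrors the behaviour visible in the logs, not the actual bitmask scheduling code.

```python
class InvalidAuthTokenError(Exception):
    """Stand-in for leap.soledad.common.errors.InvalidAuthTokenError."""


class HTTPError(Exception):
    """Stand-in for leap.soledad.common.l2db.errors.HTTPError."""

    def __init__(self, status, message):
        super().__init__(status, message)
        self.status = status


def run_sync_ticks(sync, max_ticks):
    """Call sync() once per tick (one tick = one minute in the logs).

    Stops early only when the error is recognized as an invalid token.
    Returns a list of log-like events for inspection.
    """
    events = []
    for _ in range(max_ticks):
        try:
            sync()
            events.append("synced")
        except InvalidAuthTokenError:
            # Recognized: log the user out and stop scheduling syncs.
            events.append("logged out")
            break
        except Exception as exc:
            # Unrecognized: record the error and try again on the next tick.
            events.append("error: %r" % (exc,))
    return events


def old_sync():
    # As in the #905 log: the token problem surfaces as a typed error.
    raise InvalidAuthTokenError()


def new_sync():
    # As in the new log: a generic 401 HTTPError surfaces instead.
    raise HTTPError(401, "Unauthorized")


print(run_sync_ticks(old_sync, 3))  # ['logged out']
print(run_sync_ticks(new_sync, 3))  # three error entries, no logout
```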
We've also found that bitmask mail checks specifically for InvalidAuthTokenError before it starts closing user resources:
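def _handle_invalid_auth_token_error(failure):
    # trap() re-raises any failure that is not an InvalidAuthTokenError,
    # so the lines below only run for an invalid token
    failure.trap(InvalidAuthTokenError)
    logger.warn('sync failed because token is invalid: %r' % failure)
    self.stopService()
    emit_async(catalog.SOLEDAD_INVALID_AUTH_TOKEN, self._userid)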
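In Twisted, Failure.trap() returns the matching exception type and otherwise re-raises the failure, so an errback that begins with trap() passes any other error down the chain without running its body. The sketch below reproduces that with a simplified, hypothetical Failure class (not Twisted's) so it runs with the standard library only; the handler records its side effects in a list instead of calling stopService() and emit_async().

```python
class Failure:
    """Simplified, hypothetical stand-in for twisted.python.failure.Failure."""

    def __init__(self, exc):
        self.value = exc
        self.type = type(exc)

    def check(self, *error_types):
        # Return the first listed type the wrapped error is an instance of, else None.
        for error_type in error_types:
            if isinstance(self.value, error_type):
                return error_type
        return None

    def trap(self, *error_types):
        # Return the matching type; otherwise re-raise the wrapped error
        # so it propagates past this errback.
        matched = self.check(*error_types)
        if matched is None:
            raise self.value
        return matched


class InvalidAuthTokenError(Exception):
    pass


class HTTPError(Exception):
    pass


def handle_invalid_auth_token_error(failure, actions):
    # Same shape as the bitmask handler above, with side effects recorded.
    failure.trap(InvalidAuthTokenError)
    actions.append("stopService")
    actions.append("emit SOLEDAD_INVALID_AUTH_TOKEN")


def run_errback(exc):
    actions = []
    try:
        handle_invalid_auth_token_error(Failure(exc), actions)
    except Exception as unhandled:
        # The failure was not trapped, so nothing was closed.
        actions.append("unhandled: " + type(unhandled).__name__)
    return actions


print(run_errback(InvalidAuthTokenError()))  # ['stopService', 'emit SOLEDAD_INVALID_AUTH_TOKEN']
print(run_errback(HTTPError(401, "Unauthorized")))  # ['unhandled: HTTPError']
```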
On the other hand, soledad/client/http_target/api.py#L234 seems to be where the InvalidAuthTokenError originates:
if failure.getErrorMessage() == "401 Unauthorized":
    raise InvalidAuthTokenError
We suspect this "401 Unauthorized" message might have changed recently (the new log renders the error as HTTPError(401, 'Unauthorized')), so the comparison no longer matches, and that's why leftover resources are no longer being closed. @kalikaneko @shyba @drebs
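The suspicion can be illustrated with a toy comparison. An exact match on the rendered message, roughly what the quoted check does through getErrorMessage(), stops matching as soon as the message format changes, while a check on a numeric status code keeps working. Both error classes and their status attribute are hypothetical stand-ins, not the real soledad or Twisted types, and the status-code check is just one possible alternative, not necessarily what the eventual fix does.

```python
class OldStyleError(Exception):
    """Hypothetical error whose string form is '401 Unauthorized'."""

    def __init__(self):
        super().__init__("401 Unauthorized")
        self.status = 401


class NewStyleError(Exception):
    """Hypothetical error carrying (401, 'Unauthorized') as separate args."""

    def __init__(self):
        super().__init__(401, "Unauthorized")
        self.status = 401


def matches_by_message(exc):
    # Exact string comparison on the rendered message, as in the quoted check.
    return str(exc) == "401 Unauthorized"


def matches_by_status(exc):
    # Compare a numeric status code instead of the rendered message.
    return getattr(exc, "status", None) == 401


for error in (OldStyleError(), NewStyleError()):
    print(type(error).__name__, matches_by_message(error), matches_by_status(error))
# OldStyleError True True
# NewStyleError False True
```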
Of course, this still doesn't solve the original problem from #905.
It was fixed in this MR: https://0xacab.org/leap/soledad/merge_requests/70