pixelated / pixelated-user-agent

User facing components of Pixelated: a JavaScript single page app and a RESTful service.



HTTPError 401 - Unauthorized - After soledad merge

tuliocasagrande opened this issue

spawned from #888
We are running soledad's master branch on staging, dev and unstable.

All environments were running reasonably well until, starting at 3:00 AM, we began getting a leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized') every minute. This started on Mar 4th on dev and staging, and on Mar 6th on unstable.

We also noticed there's an event that occurs every day at 3:00 AM:

Mar  4 03:00:01 pixelated CRON[29391]: (leap-webapp) CMD (cd /srv/leap/webapp && bundle exec rake cleanup:tokens)

Restarting pixelated-server restores the normal state.

We found the job defined in /var/spool/cron/crontabs/leap-webapp:

0 3 * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens

On staging, we changed it to run every two minutes to be able to reproduce the error:

*/2 * * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens
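To make the schedule change concrete, here is a minimal sketch that counts how often each crontab line above would fire in one day. It is a simplified, hypothetical matcher written for this comparison, not cron's own parser: it understands only `*`, `*/n` and plain integers, and ignores cron's special OR rule between day-of-month and day-of-week (both fields are `*` here, so it does not matter).

```python
from datetime import datetime, timedelta

# Lowest allowed value of each field: minute, hour, day-of-month, month, day-of-week.
LOWS = [0, 0, 1, 1, 0]

def field_matches(field, value, low):
    """Match one cron field. Supports only '*', '*/n' and a plain integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # '*/n' steps from the field's lowest value, e.g. minutes 0, 2, 4, ...
        return (value - low) % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr, when):
    """True if the first five fields of `expr` would fire at minute `when`."""
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    values = [when.minute, when.hour, when.day, when.month, cron_dow]
    fields = expr.split()[:5]
    return all(field_matches(f, v, low) for f, v, low in zip(fields, values, LOWS))

def runs_per_day(expr, day):
    """Count the minutes of `day` at which `expr` would fire."""
    start = datetime(day.year, day.month, day.day)
    return sum(cron_matches(expr, start + timedelta(minutes=i))
               for i in range(24 * 60))

daily = "0 3 * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens"
every_two = "*/2 * * * * cd /srv/leap/webapp && bundle exec rake cleanup:tokens"
print(runs_per_day(daily, datetime(2017, 3, 4)))      # 1
print(runs_per_day(every_two, datetime(2017, 3, 4)))  # 720
```

So the staging change turns one cleanup:tokens run per day into 720, which is why a non-deterministic trigger becomes observable within hours.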

Surprisingly, it does NOT corrupt the state on every run (by "corrupt state" I mean getting Unauthorized errors). In fact, only one of the cleanup:tokens runs triggered the Unauthorized errors, which we fixed by restarting the pixelated-server service again.

We've also set dev and unstable to run the job every 2 minutes to gather more data.
It's still not clear why and when the Unauthorized state starts happening.

We've found a very similar issue (#905). Notice that there, despite the exception being raised and polluting the channel, the process still closed and logged out any remaining user resources:

Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.bitmask.mail.incoming.service] INFO starting sync...
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.common.http._HTTP11ClientFactory] INFO Starting factory <leap.common.http._HTTP11ClientFactory instance at 0x7f922c159050>
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.soledad.client.api] ERROR got exception when syncing!
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: *--- Failure #13719 ---
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: Failure: leap.soledad.common.errors.InvalidAuthTokenError:
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: *--- End of Failure #13719 ---
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.bitmask.mail.incoming.service] WARN sync failed because token is invalid: <twisted.python.failure.Failure leap.soledad.common.errors.InvalidAuthTokenError: >
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [pixelated.application] INFO Invalid soledad token, logging out thais@unstable.pixelated-project.org
Jan 10 03:00:59 unstable1 pixelated-user-agent[24342]: 2017-01-10 03:00:59 [leap.soledad.client.api] DEBUG closing soledad

With the newest soledad changes, the log now shows an HTTPError(401, 'Unauthorized') instead of an InvalidAuthTokenError. Moreover, the "logging out" step is no longer performed, and the sync keeps being retried every minute:

Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.bitmask.mail.incoming.service] INFO starting sync...
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [twisted.web.client._HTTP11ClientFactory] INFO Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7f6da8efca70>
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.soledad.client.api] ERROR got exception when syncing!
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: *--- Failure #41888 ---
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: *--- End of Failure #41888 ---
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 [leap.bitmask.mail.incoming.service] ERROR [Failure instance: Traceback (failure with no frames): <class 'leap.soledad.common.l2db.errors.HTTPError'>: HTTPError(401, 'Unauthorized')
Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: ]
...
Mar  7 20:23:10 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
...
Mar  7 20:24:13 unstable1 pixelated-user-agent[26568]: Failure: leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')
...
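A quick way to confirm the "every minute" pattern on a given host is to extract the timestamps of the 401 failure lines and look at the gaps between them. The sketch below is a hypothetical helper that assumes the syslog line layout shown above (syslog omits the year, so it is passed in); it is fed the failure lines quoted from the Mar 7 log plus one non-matching line.

```python
import re
from datetime import datetime

# Syslog timestamp + the "Failure: ...HTTPError(401, 'Unauthorized')" line.
FAILURE_401 = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ pixelated-user-agent\[\d+\]: "
    r"Failure: leap\.soledad\.common\.l2db\.errors\.HTTPError: "
    r"HTTPError\(401, 'Unauthorized'\)"
)

def unauthorized_times(lines, year):
    """Return datetimes of the 401 failure lines; `year` is supplied by the caller."""
    times = []
    for line in lines:
        match = FAILURE_401.match(line)
        if match:
            ts = " ".join(match.group("ts").split())  # "Mar  7" -> "Mar 7"
            times.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return times

log = [
    "Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: 2017-03-07 20:22:10 "
    "[leap.soledad.client.api] ERROR got exception when syncing!",
    "Mar  7 20:22:10 unstable1 pixelated-user-agent[26568]: Failure: "
    "leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')",
    "Mar  7 20:23:10 unstable1 pixelated-user-agent[26568]: Failure: "
    "leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')",
    "Mar  7 20:24:13 unstable1 pixelated-user-agent[26568]: Failure: "
    "leap.soledad.common.l2db.errors.HTTPError: HTTPError(401, 'Unauthorized')",
]

times = unauthorized_times(log, 2017)
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
print(gaps)  # [60.0, 63.0]
```

The roughly one-minute gaps match the sync loop retrying instead of logging the user out.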

We've also found that bitmask mail checks specifically for InvalidAuthTokenError before it starts closing user resources:

incoming/service.py#L236

def _handle_invalid_auth_token_error(failure):
    failure.trap(InvalidAuthTokenError)
    logger.warn('sync failed because token is invalid: %r' % failure)
    self.stopService()
    emit_async(catalog.SOLEDAD_INVALID_AUTH_TOKEN, self._userid)
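Because of `failure.trap(InvalidAuthTokenError)`, this handler only runs its cleanup when the failure wraps exactly that exception type; anything else is re-raised and passed on to the next errback. The sketch below illustrates this without Twisted: `Failure` is a minimal stand-in for `twisted.python.failure.Failure` that models only `trap()`, the two exception classes are stand-ins for the soledad ones, and the `logout` callback replaces `stopService()`/`emit_async(...)`. None of these names are the real bitmask code.

```python
class InvalidAuthTokenError(Exception):
    """Stand-in for leap.soledad.common.errors.InvalidAuthTokenError."""

class HTTPError(Exception):
    """Stand-in for leap.soledad.common.l2db.errors.HTTPError."""
    def __init__(self, status, message=None):
        super().__init__(status, message)
        self.status = status
        self.message = message

class Failure:
    """Minimal stand-in for twisted.python.failure.Failure (trap() only)."""
    def __init__(self, exc):
        self.value = exc

    def trap(self, *error_types):
        for error_type in error_types:
            if isinstance(self.value, error_type):
                return error_type
        # Not trapped: re-raise so the next errback in the chain receives it.
        raise self.value

def handle_invalid_auth_token_error(failure, logout):
    failure.trap(InvalidAuthTokenError)
    logout()  # stands in for stopService() + emit_async(...)

def run_errback(exc):
    """Return (cleanup calls made, name of the exception that escaped or None)."""
    calls = []
    try:
        handle_invalid_auth_token_error(Failure(exc), lambda: calls.append("logout"))
    except Exception as escaped:
        return calls, type(escaped).__name__
    return calls, None

print(run_errback(InvalidAuthTokenError()))         # (['logout'], None)
print(run_errback(HTTPError(401, "Unauthorized")))  # ([], 'HTTPError')
```

With the old error type the logout path runs; with the new HTTPError(401, ...) the trap lets it escape and no cleanup happens, which matches the two logs above.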

On the other hand, soledad/client/http_target/api.py#L234 seems to be where the InvalidAuthTokenError is raised:

if failure.getErrorMessage() == "401 Unauthorized":
    raise InvalidAuthTokenError

We suspect this "401 Unauthorized" message changed recently (the failure now reads HTTPError(401, 'Unauthorized')), which would explain why leftover resources are no longer being closed. @kalikaneko @shyba @drebs
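If the suspicion is right, one possible direction is to decide on the HTTP status rather than on the exact error string. The sketch below is an illustration under assumptions, not a patch for soledad: the classes are stand-ins, `HTTPError.__str__` is modelled on the text printed in the log above rather than copied from l2db, and `translate_auth_error` is a hypothetical helper. It shows the quoted string check rejecting the new message while a status check still recognises the 401.

```python
class InvalidAuthTokenError(Exception):
    """Stand-in for leap.soledad.common.errors.InvalidAuthTokenError."""

class HTTPError(Exception):
    """Stand-in for leap.soledad.common.l2db.errors.HTTPError."""
    def __init__(self, status, message=None):
        super().__init__(status, message)
        self.status = status
        self.message = message

    def __str__(self):
        # Modelled on the log text: HTTPError(401, 'Unauthorized')
        if self.message is None:
            return "HTTPError(%d)" % self.status
        return "HTTPError(%d, %r)" % (self.status, self.message)

def is_unauthorized_by_message(error_message):
    # Mirrors the string comparison quoted from http_target/api.py.
    return error_message == "401 Unauthorized"

def is_unauthorized_by_status(exc):
    # Hypothetical alternative: rely on the numeric status code.
    return getattr(exc, "status", None) == 401

def translate_auth_error(exc):
    """Re-raise any 401 as InvalidAuthTokenError; re-raise everything else unchanged."""
    if is_unauthorized_by_status(exc):
        raise InvalidAuthTokenError() from exc
    raise exc

err = HTTPError(401, "Unauthorized")
print(str(err))                              # HTTPError(401, 'Unauthorized')
print(is_unauthorized_by_message(str(err)))  # False
print(is_unauthorized_by_status(err))        # True
```

Translating at the point where the 401 is first seen would keep the existing `failure.trap(InvalidAuthTokenError)` handler in bitmask mail working unchanged.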


Of course, this still doesn't solve the original problem reported in #905.