Problem with validating recovery token

jonathansimmons opened this issue 9 years ago

Problem

It's very easy to invalidate recovery tokens.

Summary

The `otp_recovery_counter` column is incremented on every recovery token attempt, regardless of whether the token was accepted. The only token that will be accepted is the one that matches the current `otp_recovery_counter`.

Detail

Right now in master, the `validate_otp_recovery_token` method looks like this:
/lib/devise_otp_authenticatable/models/otp_authenticatable.rb

def validate_otp_recovery_token(token)
  # tap always runs its block, so the counter advances even when verify fails
  recovery_otp.verify(token, otp_recovery_counter).tap do
    self.otp_recovery_counter += 1
    save!
  end
end

The problem is that the block runs even when the token is not valid, which means `otp_recovery_counter` is incremented even though no valid token was provided. That matters because it turns the valid recovery token into a moving target. Let me explain with scenarios.
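The root cause is how Ruby's `Object#tap` behaves: it always yields its receiver to the block and then returns that receiver, so the block runs whether `verify` returned a truthy or a falsy value. A minimal plain-Ruby illustration, where the local `counter` is only a stand-in for `otp_recovery_counter`:

```ruby
# `counter` stands in for otp_recovery_counter in this illustration.
counter = 0

# Simulate a failed verification: tap still yields, so the counter moves.
failed = false.tap { counter += 1 }

# Simulate a successful verification: the block runs here too.
passed = true.tap { counter += 1 }

puts failed.inspect  # => false (tap returns its receiver unchanged)
puts passed.inspect  # => true
puts counter         # => 2 (incremented on both attempts)
```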

Scenarios

A perfect world
1.) I successfully activate otp
2.) I save my 10 tokens and sign out.
3.) After losing my phone, I attempt to use token 0.
4.) I get signed in fine, so I remove that token from my token list.
5.) The otp_recovery_counter is now 1.

Notes: If I continue this process, always using the first token in my list, my otp_recovery_counter will always match the token I attempt to use and everything will be fine.

Real world
1.) I successfully activate otp
2.) I save my 10 tokens and sign out.
3.) After losing my phone, I attempt to use a token and randomly grab token 4.
4.) When I submit token 4, it fails to validate because my otp_recovery_counter is 0, meaning only token 0 will work.
5.) Even though the validation attempt failed, the otp_recovery_counter increments from 0 to 1. (It failed because I used token 4, but it would also have failed if I had mistyped token 0.)
6.) So I think, weird, that one didn't work, and I go try another. The catch is that of my nine remaining options, the only token that will work is token 1, because my otp_recovery_counter is now 1. The problem is that I have a 1-in-9 chance of picking the right token. If I choose anything but token 1, I'm right back at step 5.

I've resolved the counter issue by adjusting the code like so:

def validate_otp_recovery_token(token)
  # Only advance the counter when the token actually verifies
  if recovery_otp.verify(token, otp_recovery_counter)
    self.otp_recovery_counter += 1
    save!
  else
    false
  end
end

My fix above prevents the counter from being incorrectly incremented on an invalid token. Still, I think we need to revise the way the recovery tokens are displayed altogether, given that only the recovery code corresponding to the current counter will actually work. There's just no point in ever displaying more than one code.
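The two versions can be compared by replaying the "Real world" scenario (grab token 4 first, then try token 0) in plain Ruby. This is a sketch, not the gem's code: `FakeRecoveryOtp` and `Account` are hypothetical stand-ins (the fake mimics HOTP's rule that only the code for the current counter verifies, and `save!` is a no-op), while `validate_original` and `validate_fixed` copy the bodies of the two `validate_otp_recovery_token` versions from this issue.

```ruby
# Hypothetical stand-in for recovery_otp: the printed sheet is an array,
# and code i only verifies when the counter equals i (HOTP semantics).
class FakeRecoveryOtp
  def initialize(codes)
    @codes = codes
  end

  def verify(token, counter)
    @codes[counter] == token
  end
end

# Hypothetical minimal model holding the counter; save! does nothing here.
class Account
  attr_accessor :otp_recovery_counter
  attr_reader :recovery_otp

  def initialize(recovery_otp)
    @recovery_otp = recovery_otp
    @otp_recovery_counter = 0
  end

  def save!
    true
  end

  # Version currently in master: increments even on failure.
  def validate_original(token)
    recovery_otp.verify(token, otp_recovery_counter).tap do
      self.otp_recovery_counter += 1
      save!
    end
  end

  # Proposed fix: increments only when the token verifies.
  def validate_fixed(token)
    if recovery_otp.verify(token, otp_recovery_counter)
      self.otp_recovery_counter += 1
      save!
    else
      false
    end
  end
end

codes = (0..9).map { |i| format("code-%d", i) }

original = Account.new(FakeRecoveryOtp.new(codes))
original.validate_original(codes[4])                    # fails, counter 0 -> 1
original_second = original.validate_original(codes[0])  # fails, counter 1 -> 2

fixed = Account.new(FakeRecoveryOtp.new(codes))
fixed.validate_fixed(codes[4])                          # fails, counter stays 0
fixed_second = fixed.validate_fixed(codes[0])           # succeeds, counter 0 -> 1

puts original_second                # => false
puts original.otp_recovery_counter  # => 2
puts fixed_second                   # => true
puts fixed.otp_recovery_counter     # => 1
```

With the original code, the failed attempt on token 4 moves the counter, so the follow-up attempt on token 0 fails too; with the fix, token 0 still works.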

Suggested Fix:

Display only the next relevant token after enabling otp, instead of a list.
Consider what the workflow should be after using a recovery token.
- Should we disable otp, forcing them to set it up again and therefore get a new token? They have lost their phone, after all.
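To make the "display only the next token" suggestion concrete, here is a minimal, self-contained RFC 4226 HOTP sketch using Ruby's standard `openssl` library. It assumes SHA-1 and 6 digits (the RFC defaults); the gem presumably delegates this to a library such as ROTP, so this is not its actual code. `next_recovery_token` is a hypothetical helper that returns only the one code that will verify at the current counter, with its sequence number.

```ruby
require "openssl"

# Minimal RFC 4226 HOTP (HMAC-SHA1, dynamic truncation).
def hotp(secret, counter, digits: 6)
  # The message is the counter as an 8-byte big-endian integer.
  mac = OpenSSL::HMAC.digest("SHA1", secret, [counter].pack("Q>"))
  # Dynamic truncation: the low 4 bits of the last byte pick an offset.
  offset = mac.getbyte(mac.bytesize - 1) & 0x0f
  binary = ((mac.getbyte(offset) & 0x7f) << 24) |
           (mac.getbyte(offset + 1) << 16) |
           (mac.getbyte(offset + 2) << 8) |
           mac.getbyte(offset + 3)
  format("%0#{digits}d", binary % (10**digits))
end

# Hypothetical helper: show only the code that matches the current counter.
def next_recovery_token(secret, otp_recovery_counter)
  { number: otp_recovery_counter, code: hotp(secret, otp_recovery_counter) }
end

secret = "12345678901234567890" # RFC 4226 Appendix D test secret
[0, 1].each do |counter|
  token = next_recovery_token(secret, counter)
  puts "#{token[:number]}: #{token[:code]}"
end
# => 0: 755224
# => 1: 287082
```

The expected values are the RFC 4226 Appendix D test vectors, which makes the sketch easy to check against any other HOTP implementation.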

@wmlele Let me know your thoughts.

wmlele · Answer 1 · Wed Jul 29 2015 04:09:40 GMT+0800 (China Standard Time)

The reason the counter is incremented is that it naturally limits the number of attempts someone who has gained your username/password can try to unlock your recovery codes.

Yes, they can still deny you service by trying all the codes, but that is still much better than letting them brute-force all the codes (which are NOT a huge space).

We could take a different approach only if there were some other mechanism preventing any user from brute-forcing the codes, locking them out after just a small number of attempts (which isn't much different from just letting them try all the codes from a small list).

Note that the sequence number of the required token is printed out (well, it depends on your templates, but it is available) both in the table and in the form requesting the recovery, so there should be no ambiguity about which token is requested at any given time. You are asked specifically for recovery token number 4, from a table where the right code is labeled 4.

Lastly, I did not see the recovery codes as being just for recovery in case I have lost my phone. I could just print out the list and keep it in my wallet, so I can log in the day I leave my phone at home or its battery dies.

Your workflow is doable, of course; it's just a different approach that treats recovery as an extreme measure, and you still need to limit the chance of a brute-force attack.

Hope I have clarified what the rationale was behind the recovery codes. Thoughts very welcome.

Jonathan Simmons · Answer 2 · Wed Jul 29 2015 06:09:09 GMT+0800 (China Standard Time)

Your clarification helps, but I'm not sure what I think. At a minimum, I think the gem needs to clarify in the readme how the recovery tokens work, specifically that they must be used sequentially. Even I, having spent a lot of time in this gem, totally missed that the tokens needed to be used sequentially.

Even so, I'm left wanting a better solution. I've not come across a TOTP service in the wild that required me to pay attention to the sequence of the tokens. Again, I'm modeling my implementations after some of the bigger services out there that use TOTP (GitHub, Dropbox, etc.).

I agree recovery tokens are not just for the lost-device scenario, but man, there has to be a better way than requiring the user to pay attention to the sequence or risk being placed in a denial-of-service situation.

I'm gonna think on this one for a bit, and get back to you.

wmlele · Answer 3 · Wed Jul 29 2015 06:39:26 GMT+0800 (China Standard Time)

I think it was modeled on the Google implementation: back then, neither GitHub nor Dropbox had two-factor authentication. I guess Google still has one-time tokens you can print and use, but I wouldn't bet they still work this way (I switched to U2F long ago). My online bank also used a similar scheme several years ago, before it switched to physical OTP keyfobs.

I think it would be fine to accept any of the generated HOTP tokens (it doesn't make the system terribly less secure, although it does make it noticeably less secure), but then you still need to lock out the account after a small [*] number of tries, which doesn't solve the DoS scenario.

The only way around it would be to design another layer of recovery that doesn't rely on a small, finite pool, like sending the next time-based code by SMS.

That would, of course, need a new model attribute to count failed attempts.

[*] Note that it would be wise to lock the account anyway after a reasonable number of failed attempts, say with rack-ratelimit or similar.
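The alternative described here (accept any unused code from the batch, but count failed attempts and lock out after a few) might look roughly like the following plain-Ruby sketch. `RecoveryCodes`, `otp_failed_attempts` and `MAX_FAILED_ATTEMPTS` are hypothetical names, the threshold of 5 is an arbitrary assumption, and resetting the count on success is a design choice the thread does not specify.

```ruby
# Sketch of the "accept any code + lockout" alternative, not the gem's behaviour.
class RecoveryCodes
  MAX_FAILED_ATTEMPTS = 5 # hypothetical threshold

  attr_reader :otp_failed_attempts

  def initialize(codes)
    @unused = codes.dup        # codes still available on the printed sheet
    @otp_failed_attempts = 0   # hypothetical new model attribute
  end

  def locked?
    @otp_failed_attempts >= MAX_FAILED_ATTEMPTS
  end

  def validate(token)
    return false if locked?    # once locked, even valid codes are refused

    if @unused.delete(token)   # Array#delete returns the item, or nil if absent
      @otp_failed_attempts = 0 # reset on success (a design choice)
      true
    else
      @otp_failed_attempts += 1
      false
    end
  end
end

codes = (0..9).map { |i| format("code-%d", i) }
rc = RecoveryCodes.new(codes)

first = rc.validate("code-4")        # true: order no longer matters
reuse = rc.validate("code-4")        # false: each code works once
5.times { rc.validate("wrong") }     # failures accumulate until the lockout
after_lock = rc.validate("code-0")   # false: a valid code is now refused
```

As the thread points out, this removes the sequencing requirement but not the DoS risk: once the attempt limit is reached, even the legitimate user's valid code is refused.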

wmlele · Answer 4 · Wed Jul 29 2015 06:59:49 GMT+0800 (China Standard Time)

After all, I don't think it was too cumbersome to pay attention to the sequence, except for the fact that other services have since implemented it differently and people are used to that.

It also feels kind of 'natural' when you generate another set of HOTP tokens: now you have two sheets of paper, one with codes numbered 1-10 and another numbered 11-20, so it's obvious which one holds the right code when the site prompts you for number 14.
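The sheet numbering described here is simple arithmetic on the counter. A hypothetical sketch, assuming 10 codes per sheet and printed labels that start at 1 while the counter starts at 0 (the issue itself numbers tokens from 0, so the offset depends on your templates); `printed_number`, `sheet_for` and `prompt` are illustrative names only:

```ruby
CODES_PER_SHEET = 10 # assumption: 10 codes per printed sheet

# Counters start at 0; printed labels start at 1 (assumption).
def printed_number(counter)
  counter + 1
end

# Sheet 1 holds counters 0-9, sheet 2 holds counters 10-19, and so on.
def sheet_for(counter)
  counter / CODES_PER_SHEET + 1
end

def prompt(counter)
  format("Enter recovery code #%d (sheet %d)", printed_number(counter), sheet_for(counter))
end

puts prompt(0)   # => Enter recovery code #1 (sheet 1)
puts prompt(13)  # => Enter recovery code #14 (sheet 2)
```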

Jonathan Simmons · Answer 5 · Fri Dec 04 2015 11:24:50 GMT+0800 (China Standard Time)

Closing this, as I don't think my needs fit @wmlele's vision. I'll continue to just adjust it manually for my use.

If I ever find time, I might open a PR that implements my flow and lets the user choose between the two as a config option.