Checking your math...

bwbug opened this issue 2 years ago

In the "The Similar-Words Problem" section of your README, you wrote:

If we assumed a hypothetical 18,000 word list that was just 9,000 words and their plurals, I think the odds of getting at least one "awkward double" in a 4-word passphrase is (1/18000) * (2/18000) * (3/18000), which is a really small number. But check my math!

Although your conclusion is correct (the odds are indeed "a really small number"), the actual probability is over 600 million times higher than your estimate.

The correct probability, for a list of 9,000 words plus their plurals, is 1/9000 + 2/9000 + 3/9000 - 11/9000² + 6/9000³.
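As a quick numeric check, here is a sketch using Python's standard `fractions` module for exact arithmetic. The variable names are only illustrative: `readme_estimate` is the product quoted from the README, and `exact` is the formula above with N = 9,000 stems.

```python
from fractions import Fraction

N = 9000  # distinct stems; the list holds each stem in singular and plural form

# Estimate quoted from the README: (1/18000) * (2/18000) * (3/18000)
readme_estimate = Fraction(1, 18000) * Fraction(2, 18000) * Fraction(3, 18000)

# Corrected probability of at least one "awkward double" in a 4-word passphrase
exact = (Fraction(1, N) + Fraction(2, N) + Fraction(3, N)
         - Fraction(11, N**2) + Fraction(6, N**3))

print(float(exact))                    # roughly 1 in 1,500
print(float(exact / readme_estimate))  # how many times larger than the estimate
```

With exact fractions the ratio comes out to just under 648 million, consistent with the "over 600 million" figure.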

To prove this, consider a word list containing N words and their plurals (2 N words total). Let P₁ be the probability of getting at least one "awkward double", and let P₀ be the probability of getting no awkward doubles. Then

P₁ = 1 - P₀

The probability of getting no awkward doubles (P₀) is the number of passphrases containing only unique stems, divided by the total number of possible passphrases. "Only unique stems" means that once a word has been selected, neither that word nor its counterpart (the plural or singular form, whichever was not picked) can be selected again. For a passphrase consisting of k words, the total number of passphrases is

N_total = (2 N)^k

To compute the number of passphrases containing only unique stems, the size of the word pool is reduced by 2 each time a word is selected (because the word itself is eliminated from further consideration, as is the plural/singular form of that word):

N_unique = (2 N)(2 N - 2)(2 N - 4) ... (2 N - 2 k + 2)
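For small cases this count can be checked by brute force. The Python sketch below assumes a hypothetical word list in which each word is represented as a `(stem, is_plural)` pair; `count_unique_stem_passphrases` and `n_unique_formula` are illustrative names, not part of any existing tool.

```python
from itertools import product

def count_unique_stem_passphrases(n_stems, k):
    """Brute force: count k-word passphrases (drawn with replacement from
    2 * n_stems words) in which no stem appears more than once."""
    # Hypothetical word list: every stem in singular and plural form
    words = [(stem, plural) for stem in range(n_stems) for plural in (False, True)]
    count = 0
    for phrase in product(words, repeat=k):
        stems = [stem for stem, _ in phrase]
        if len(set(stems)) == k:  # every stem distinct
            count += 1
    return count

def n_unique_formula(n_stems, k):
    """(2N)(2N - 2)(2N - 4) ... (2N - 2k + 2)"""
    result = 1
    for i in range(k):
        result *= 2 * n_stems - 2 * i
    return result

for n_stems, k in [(3, 2), (4, 3), (5, 4)]:
    print(n_stems, k, count_unique_stem_passphrases(n_stems, k), n_unique_formula(n_stems, k))
```

Each printed row should show the brute-force count and the formula agreeing; the cases are kept tiny because the enumeration grows as (2 N)^k.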

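Dividing each of the k factors in N_unique by one factor of 2 N from N_total gives (2 N - 2 i)/(2 N) = 1 - i/N for i = 0, 1, ..., k - 1. The i = 0 factor equals 1, so the probability of getting only unique stems is

P₀ = N_unique/N_total = (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)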
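The two forms of P₀ can be compared exactly with `fractions.Fraction` and `math.prod` (Python 3.8+). This is a sketch: the function names are hypothetical and the (N, k) pairs are arbitrary test values.

```python
from fractions import Fraction
from math import prod

def p0_from_counts(n_stems, k):
    """P0 = N_unique / N_total, computed from the two passphrase counts."""
    n_unique = prod(2 * n_stems - 2 * i for i in range(k))
    n_total = (2 * n_stems) ** k
    return Fraction(n_unique, n_total)

def p0_product(n_stems, k):
    """P0 = (1 - 1/N)(1 - 2/N) ... (1 - (k-1)/N)."""
    return prod((1 - Fraction(i, n_stems) for i in range(1, k)), start=Fraction(1))

for n_stems, k in [(9000, 4), (10, 5), (2, 3)]:
    print(n_stems, k, p0_from_counts(n_stems, k) == p0_product(n_stems, k))
```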

Substituting this into P₁ = 1 - P₀, the general solution for the probability of getting at least one "awkward double" is

P₁ = 1 - (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)
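The general formula can also be checked empirically. This Monte Carlo sketch uses Python's `random` module and a deliberately small hypothetical list (N = 50 stems, so 100 words) so that doubles occur often enough to count; word number w is assumed to have stem w // 2. Like the derivation, it treats picking the exact same word twice as a repeated stem.

```python
import random

def p1_exact(n_stems, k):
    """P1 = 1 - (1 - 1/N)(1 - 2/N) ... (1 - (k-1)/N), as a float."""
    p0 = 1.0
    for i in range(1, k):
        p0 *= 1 - i / n_stems
    return 1 - p0

def p1_simulated(n_stems, k, trials=100_000, seed=0):
    """Estimate P1 by drawing k words uniformly (with replacement) from a
    hypothetical list of 2N words, where word w has stem w // 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        stems = [rng.randrange(2 * n_stems) // 2 for _ in range(k)]
        if len(set(stems)) < k:  # some stem repeated
            hits += 1
    return hits / trials

print(p1_exact(50, 5), p1_simulated(50, 5))
```

The simulated value should land close to the exact one (about 0.186 for these parameters); with 100,000 trials the sampling error is on the order of 0.001.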

For k = 4, expanding the product gives (1 - 1/N)(1 - 2/N)(1 - 3/N) = 1 - 6/N + 11/N² - 6/N³, so the math works out to:

P₁ = 6/N - 11/N² + 6/N³
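The k = 4 coefficients can be confirmed by multiplying out (1 - x)(1 - 2x)(1 - 3x) as a polynomial in x = 1/N with integer coefficients. `expand_p0_coefficients` is a hypothetical helper written only for this check.

```python
def expand_p0_coefficients(k):
    """Coefficients of P0 = (1 - x)(1 - 2x)...(1 - (k-1)x) as a polynomial
    in x = 1/N, lowest degree first."""
    coeffs = [1]
    for i in range(1, k):
        # Multiply the current polynomial by (1 - i*x)
        new = coeffs + [0]
        for d, c in enumerate(coeffs):
            new[d + 1] -= i * c
        coeffs = new
    return coeffs

p0 = expand_p0_coefficients(4)           # [1, -6, 11, -6]
p1 = [1 - p0[0]] + [-c for c in p0[1:]]  # P1 = 1 - P0
print(p1)  # [0, 6, -11, 6], i.e. P1 = 6/N - 11/N² + 6/N³
```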

If, in the general solution, one neglects the higher-order terms (those in N⁻², N⁻³, etc.), which is reasonable when k² is much smaller than N, the following approximate solution is obtained:

P₁ ≈ 1/N + 2/N + ... + (k - 1)/N = (k(k - 1))/(2 N)
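To see how good the approximation is for the README's example list, the sketch below (Python, exact arithmetic via `fractions`; function names hypothetical) compares it with the exact P₁ at N = 9,000 for a few passphrase lengths.

```python
from fractions import Fraction

def p1_exact(n_stems, k):
    """P1 = 1 - (1 - 1/N)(1 - 2/N) ... (1 - (k-1)/N), exactly."""
    p0 = Fraction(1)
    for i in range(1, k):
        p0 *= 1 - Fraction(i, n_stems)
    return 1 - p0

def p1_approx(n_stems, k):
    """First-order approximation k(k - 1) / (2N)."""
    return Fraction(k * (k - 1), 2 * n_stems)

N = 9000
for k in (4, 6, 8, 10):
    exact, approx = p1_exact(N, k), p1_approx(N, k)
    # k, exact P1, approximate P1, ratio approx/exact
    print(k, float(exact), float(approx), float(approx / exact))
```

Because (1 - a₁)(1 - a₂)...(1 - aₘ) ≥ 1 - (a₁ + a₂ + ... + aₘ) whenever every aᵢ lies between 0 and 1, the approximation can only overestimate P₁, never underestimate it.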

Sam Schlinkert · Answer 1 · Sun Aug 07 2022 04:55:22 GMT+0800 (China Standard Time)

I can't quite follow this all the way through right now, but I trust your work! README has been updated. Thank you so much!

bwbug · Answer 2 · Sun Aug 07 2022 05:10:10 GMT+0800 (China Standard Time)

Thanks! Let me know if you want me to clarify any particular steps of the derivation; I skipped over some algebra in a few of the equations above.