Just a couple of questions..

QRCS-CORP opened this issue 8 years ago

Hi,
I have written an implementation of Scrypt and added it to a C++ library I am writing:
https://github.com/Steppenwolfe65/CEX
Some of the highlights: it uses 128/256-bit SIMD for the XORs, the smix function is (optionally) multi-threaded, it has sequential fallbacks, and SIMD and multi-threading capabilities are checked and enabled automatically at run time.
One problem I had was getting the salsa8 SIMD function to align with the sequential fallback version. I am guessing that the integers are transposed somehow before they are loaded into the SIMD registers; I tried a few different things but could not get the output to match the sequential salsa function. Any help on this would be appreciated. The code is in the cpp file:
https://github.com/Steppenwolfe65/CEX/blob/master/CEX/SCRYPT.cpp

Now for the question: I am using the parameters 16384, 8, 1 as defaults, but I would like to add an option whereby the user can set the parameters via an enumeration (e.g. low/med/high security). Any advice on what those parameters should be, given the state of today's computing power (low and medium settings) versus the expected computing power in 10 years (high setting)?

Thanks,
John

Colin Percival · Answer 1 · Thu Mar 30 2017 09:04:43 GMT+0800 (China Standard Time)

Yes, my SSE code handles the 4x4 array in a permuted order -- if you look at crypto_scrypt_smix_sse2 you'll see the permutation happening in steps 1 and 10 as data enters and exits the function. This is a trick from djb which makes it easier to use SSE instructions.
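
As an illustration of the permuted order, here is a minimal sketch. It assumes the index mapping is "position i of the permuted block holds word (i * 5) mod 16 of the original block"; that mapping is an assumption here and should be checked against the loops in crypto_scrypt_smix_sse2 before relying on it. The function names are hypothetical.

```python
# Assumed mapping: permuted position i takes original word (i * 5) % 16.
PERM = [(i * 5) % 16 for i in range(16)]

def permute(words):
    """Reorder 16 words so each group of four is one diagonal of the 4x4 matrix."""
    return [words[PERM[i]] for i in range(16)]

def unpermute(words):
    """Inverse of permute(): put each word back at its original index."""
    out = [0] * 16
    for i in range(16):
        out[PERM[i]] = words[i]
    return out

# Word k has value k, so the permuted rows show the source indices directly.
block = list(range(16))
shuffled = permute(block)
rows = [shuffled[i:i + 4] for i in range(0, 16, 4)]
for row in rows:
    print(row)
```

Under this mapping the first group is the main diagonal (0, 5, 10, 15) and the other three are the wrapped diagonals, so each group fits one 128-bit register. A sequential salsa that works on the natural word order will only match a SIMD version if the same permutation is applied on entry and undone on exit.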

Given that the strength of the derived key will depend on not just the work done by scrypt but also the strength of the input passphrase, it's hard to offer precise guidelines. Better to just set parameters based on the resources you have available. This is why e.g., the standard for passwords used for interactive logins is ~100 ms of password derivation -- if you're logging in to a system, 100 ms is on the same timescale as your brain + fingers + keyboard + network latency.
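
One way to set parameters from available resources is to time the derivation and keep doubling N while it stays within the budget. The sketch below uses Python's `hashlib.scrypt`; the function names, the benchmark inputs, the 16384 floor, and the 2**17 cap are illustrative assumptions, not part of scrypt itself.

```python
import hashlib
import time

def time_scrypt(n, r=8, p=1):
    """Wall-clock seconds for one scrypt derivation on this machine."""
    start = time.perf_counter()
    hashlib.scrypt(b"benchmark passphrase", salt=b"benchmark salt!!",
                   n=n, r=r, p=p, maxmem=256 * 1024 * 1024, dklen=32)
    return time.perf_counter() - start

def pick_n(target_seconds, timer=time_scrypt, n_min=16384, n_max=2**17):
    """Largest power-of-two N in [n_min, n_max] whose measured time fits the target.

    Returns n_min even if n_min itself is slower than the target.
    """
    n = n_min
    while n * 2 <= n_max and timer(n * 2) <= target_seconds:
        n *= 2
    return n
```

Calling `pick_n(0.1)` aims at the ~100 ms interactive-login budget; a single timing run is noisy, so a real calibration would likely repeat the measurement.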

John G. Underhill · Answer 2 · Fri Mar 31 2017 01:00:23 GMT+0800 (China Standard Time)

I saw the for loops and figured that's what was going on with the SSE salsa; I'll adjust it on the weekend. As for parameter sets, I think I'll just double and quadruple the base CPU cost of 16384.
Because this can be multi-threaded, should I set the parallelization factor to the number of threads, or keep it at one and just tune the CPU cost?

Colin Percival · Answer 3 · Fri Mar 31 2017 02:01:57 GMT+0800 (China Standard Time)

Assuming you have enough memory and you want to make the key derivation as strong as possible within a wall-clock time bound: Yes, set p to your number of CPUs.
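
The "enough memory" condition can be estimated up front. The dominant cost is a table of 128 * r * N bytes per lane, and lanes that run concurrently each need their own table. The helper below is a rough sketch with a hypothetical name; it ignores the smaller working buffers.

```python
import os

def scrypt_memory_bytes(n, r=8, threads=1):
    """Approximate working memory: one 128 * r * n byte table per concurrent lane."""
    return 128 * r * n * threads

cpus = os.cpu_count() or 1  # cpu_count() may return None
per_lane = scrypt_memory_bytes(16384)
total = scrypt_memory_bytes(16384, threads=cpus)
print(f"{per_lane // 2**20} MiB per lane, {total // 2**20} MiB with p = {cpus} run in parallel")
```

With N = 16384 and r = 8 each lane needs about 16 MiB, so the total grows linearly with the number of lanes executed at once.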

John G. Underhill · Answer 4 · Fri Mar 31 2017 02:27:16 GMT+0800 (China Standard Time)

This is what I thought, will do.
One more for you: one of the reasons I would prefer presets is to force conformance to minimum security bounds. To this end, I am setting the minimum passphrase size to 8 bytes and the minimum salt size to 4 bytes. What do you feel the lower bound for CPU cost should be, and under what circumstances would you recommend changing the memory cost from the default of 8?

Colin Percival · Answer 5 · Fri Mar 31 2017 06:07:32 GMT+0800 (China Standard Time)

I certainly wouldn't set N to any less than 16384. Don't change the value of r at all -- it's not really a memory cost, rather it was a safety valve in case memory latency/bandwidth performance changed significantly (which hasn't happened).
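
Putting the advice so far together, a preset enumeration could look like the sketch below: r is fixed at 8, N never drops below 16384, and the higher presets double and quadruple the base cost as proposed earlier in the thread. The preset names and the helper are hypothetical and are not the CEX API.

```python
from enum import Enum

R_FIXED = 8    # r is treated as a constant, per the advice above
N_MIN = 16384  # lowest N suggested in this thread

class ScryptPreset(Enum):
    """Hypothetical presets: base cost, doubled, and quadrupled."""
    LOW = N_MIN
    MEDIUM = N_MIN * 2
    HIGH = N_MIN * 4

def check_cpu_cost(n):
    """Reject N below the floor or not a power of two (scrypt requires a power of two)."""
    if n < N_MIN:
        raise ValueError(f"N must be at least {N_MIN}")
    if n & (n - 1):
        raise ValueError("N must be a power of two")
    return n

for preset in ScryptPreset:
    print(preset.name, check_cpu_cost(preset.value), R_FIXED)
```

Which preset counts as "high" is a judgment call; the earlier point about measuring against a time budget still applies.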

John G. Underhill · Answer 6 · Fri Mar 31 2017 22:17:27 GMT+0800 (China Standard Time)

Hi Colin,
I was considering making r a constant, so now I will. I'll clean it up on the weekend and drop one more message when it's done.. thanks for all your help.

John G. Underhill · Answer 7 · Sat Apr 01 2017 05:24:37 GMT+0800 (China Standard Time)

Hi,
So I had some time today to complete this, and here's the result:
The SSE version of salsa is working now and is engaged automatically if SSE is detected. It passes the KATs in parallel or sequential mode, so all of that is working well.
The r value was made a constant, so the user now has just three parameters in the constructor: digest name or instance, CPU cost, and parallel factor. If the factor is set to 0, it automatically uses the number of system cores. If it is greater than the core count, it stays the same but all available cores are used (done only to pass the one KAT with p=16). If it is less than the number of available cores, the maximum number of threads used by the parallel loop is set to that count.
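
The three cases for the parallel factor can be written as one small function. This is a sketch of the rule as described in this post, not the library's code; the function name and the (p, threads) return shape are assumptions.

```python
def resolve_parallelism(requested, cores):
    """Return (p, threads) for a requested parallel factor and a core count.

    requested == 0     -> p and threads both equal the core count
    requested > cores  -> p is kept, threads are capped at the core count
    requested <= cores -> p is kept, threads equal p
    """
    if requested < 0 or cores < 1:
        raise ValueError("requested must be >= 0 and cores must be >= 1")
    if requested == 0:
        return cores, cores
    return requested, min(requested, cores)

print(resolve_parallelism(0, 8))   # auto-detect
print(resolve_parallelism(16, 8))  # more lanes than cores
print(resolve_parallelism(2, 8))   # fewer lanes than cores
```

Keeping p separate from the thread count matters because p changes the derived key, while the number of threads only changes how fast the same key is computed.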
I settled on a minimum passphrase size of 6 and a minimum salt size of 4 (I would rather require at least 16 for the salt, but that would break all the KATs); anything less than these will throw.
No specific passphrase size is explicitly recommended (it should be at least 8 bytes); however, I recommend that the salt size be the digest block size - (passphrase size + 4 bytes of PBKDF counter + the digest's finalizer code). This ensures that one full block is processed without invoking the compression function during HMAC initialization. The recommended salt size (in the LegalKeySizes accessor function) is the digest's output size, to approximate that without knowing the passphrase size in advance.
I would set the minimum CPU cost to throw at less than 8192, but then I lose another KAT vector. If there are any official test vectors other than the ones in the paper, please let me know and I'll change that.
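
To see which known-answer tests survive a given set of minimums, the inputs of the four test vectors can be filtered directly. The parameter sets below are the ones listed in the scrypt paper and RFC 7914 (expected outputs omitted); the helper name is hypothetical.

```python
# (passphrase, salt, N, r, p) for the four published scrypt test vectors.
KAT_INPUTS = [
    (b"", b"", 16, 1, 1),
    (b"password", b"NaCl", 1024, 8, 16),
    (b"pleaseletmein", b"SodiumChloride", 16384, 8, 1),
    (b"pleaseletmein", b"SodiumChloride", 1048576, 8, 1),
]

def usable_kats(min_pass, min_salt, min_n, fixed_r=8):
    """1-based indices of the vectors that satisfy the given minimums and fixed r."""
    return [
        index
        for index, (passphrase, salt, n, r, p) in enumerate(KAT_INPUTS, start=1)
        if len(passphrase) >= min_pass and len(salt) >= min_salt
        and n >= min_n and r == fixed_r
    ]

print(usable_kats(6, 4, 0))      # minimums chosen above
print(usable_kats(6, 4, 8192))   # adding a floor of 8192 on N
print(usable_kats(6, 16, 0))     # requiring a 16-byte salt
```

With a passphrase minimum of 6 and a salt minimum of 4, three vectors remain; raising the N floor to 8192 drops the p=16 vector, and a 16-byte salt minimum leaves none.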
Anyways, let me know if this all sounds OK, and thanks again for all your help..

Colin Percival · Answer 8 · Sat Apr 01 2017 06:40:39 GMT+0800 (China Standard Time)

That all sounds reasonable.