Regarding first-order leakage in the database

Question

skotskt opened this issue 4 years ago · comments

Hi,

We found that the variable-key database appears to have first-order leakage: it was broken by a first-order CPA, although it is stated that the traces in the database should have no first-order leakage.
For the fixed-key database, our experiment did not clearly confirm such first-order leakage, but the CPA result suggests that, given a sufficient number of traces, we would find it and retrieve the secret key with a first-order CPA.
Please find attached the PDF file with our experimental results.
Could you please confirm and comment on the attached results?
ASCAD_issue_slide.pdf
If necessary, please also refer to the following link to check the source code of the experiment.
https://github.com/skotskt/ASCAD_First_Order_CPA
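For readers following along, a first-order CPA of the kind discussed here can be sketched as below. This is a minimal illustration on synthetic, unmasked traces, not the script from the linked repository: the Hamming-weight leakage model, the noise level, the sample index `leak_idx` and the key byte `true_key` are all assumptions made for the demo.

```python
import numpy as np

def aes_sbox():
    """AES S-box built from its definition: GF(2^8) inverse, then affine map."""
    def mul(a, b):
        r = 0
        for _ in range(8):
            if b & 1:
                r ^= a
            b >>= 1
            a = (a << 1) ^ (0x11B if a & 0x80 else 0)  # reduce modulo x^8+x^4+x^3+x+1
        return r

    def rotl(x, s):
        return ((x << s) | (x >> (8 - s))) & 0xFF

    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if mul(x, y) == 1), 0)  # 0 maps to 0
        box.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63)
    return np.array(box, dtype=np.uint8)

# Hamming weight of every byte value
HW = np.array([bin(v).count("1") for v in range(256)])

def cpa_first_order(traces, plaintexts, sbox):
    """Return |Pearson correlation| with shape (256 key guesses, n_samples)."""
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    scores = np.empty((256, traces.shape[1]))
    for k in range(256):
        h = HW[sbox[plaintexts ^ k]].astype(float)  # predicted leakage for guess k
        h -= h.mean()
        scores[k] = np.abs(h @ t) / (np.sqrt((h ** 2).sum()) * t_norm)
    return scores

# Synthetic demo (assumed parameters): one sample leaks HW(Sbox(p ^ k)) plus noise
rng = np.random.default_rng(0)
sbox = aes_sbox()
true_key = 0x2B
n_traces, n_samples, leak_idx = 2000, 50, 20
plaintexts = rng.integers(0, 256, n_traces, dtype=np.uint8)
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, leak_idx] += HW[sbox[plaintexts ^ true_key]]

scores = cpa_first_order(traces, plaintexts, sbox)
best_key = int(scores.max(axis=1).argmax())  # best candidate over all samples
print(hex(best_key))
```

On unmasked synthetic data the correct key byte stands out clearly. On properly masked traces the same statistic should show no distinguished candidate, which is the property questioned in this issue.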

Best regards,

Prouff Emmanuel · Answer 1 · Thu Apr 30 2020 21:05:55 GMT+0800 (China Standard Time)

Hi
Thank you for sharing this with us. We will discuss your results among us and will come back to you asap.
Regards
Emmanuel

Prouff Emmanuel · Answer 2 · Mon May 11 2020 23:04:41 GMT+0800 (China Standard Time)

Hi,
We tried to reproduce your results with two different scripts and we did not get any successful CPA attack on our ASCAD traces. Please find attached a notebook script and the associated Python file. You can save them in the ASCAD repository and launch the notebook directly (you just need to change the data path at the beginning of the notebook file). Could you please confirm that you can reproduce our tests and that you agree with us on them?
A possibility to explain your results (CPA success) might be that your generation of the npy files is erroneous and (for some reason) induces a bias. As we didn't have them, we could not investigate more on that aspect. Could you please adapt your scripts to directly work on the given h5 files (or check your generation scripts to validate that nothing went wrong)?
Regards,
ASCAD TEAM

ASCAD_CPA.zip

skot · Answer 3 · Thu May 14 2020 14:23:55 GMT+0800 (China Standard Time)

Hi,
Thank you for sharing the data.
As you mentioned, we were able to reproduce your result: with your script, the CPA did not succeed.
On the other hand, we found two differences between your method and ours that may affect the CPA results:

1. You used the "sklearn.preprocessing.normalize" method, which performs L2 normalization.
2. The time at which you calculated the correlation coefficient was not fixed, i.e. different timings were used for each key candidate.
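The two differences can be illustrated with a small sketch. It does not reproduce either party's script: `l2_normalize` mirrors in plain NumPy what a row-wise L2 normalization does, and `rank_keys` is a hypothetical helper contrasting a fixed time sample with a per-candidate maximum over time.

```python
import numpy as np

def l2_normalize(traces):
    """Scale each trace (row) to unit Euclidean norm."""
    norms = np.linalg.norm(traces, axis=1, keepdims=True)
    return traces / np.where(norms == 0, 1.0, norms)  # leave all-zero rows untouched

def rank_keys(scores, fixed_idx=None):
    """Rank key candidates from |correlation| scores of shape (256, n_samples).

    fixed_idx=None : each candidate is scored at its own best sample
                     (the timing differs from one candidate to the next).
    fixed_idx=i    : every candidate is scored at the same sample i.
    """
    column = scores.max(axis=1) if fixed_idx is None else scores[:, fixed_idx]
    return np.argsort(column)[::-1]  # best candidate first

# Toy scores: candidate 5 peaks at sample 1, candidate 7 peaks at sample 3
toy = np.zeros((256, 4))
toy[5, 1] = 0.90
toy[7, 3] = 0.95
print(rank_keys(toy)[0], rank_keys(toy, fixed_idx=1)[0])
```

The toy scores show that the two ranking rules can disagree on the best candidate, so the choice of rule needs to be stated when comparing attack results.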

Taking these differences into account, we modified your script and were able to reproduce our previously submitted result.
In addition, to eliminate the effect of the normalization step, which is not usually part of a CPA procedure, we also ran the attack without normalization.
As a result, we found that a common first-order CPA without any normalization successfully recovers the secret key in the frequency domain.
This result also suggests that there may be first-order leakage in the traces.
For more details, please find the attached notebook file.
ASCAD_CPA_rep.zip

Best regards,

Prouff Emmanuel · Answer 4 · Thu May 14 2020 21:51:09 GMT+0800 (China Standard Time)

Hi,
Thank you for the updated script. We confirm your results. It seems that your normalization step combines the information of several leakage points in a way that reveals information (without your normalization step, the attack fails even when considering a single time sample - index 188 - during the attack). For the moment, we have not succeeded in interpreting this, but we plan to investigate (if you have any ideas, they are welcome :-) ).
About the attack in the frequency domain, the results are (slightly) less surprising since the FFT combines several time samples during the transformation (and thus combines samples related to the masks and to the masked data). Since the combination is essentially linear, however, it remains unclear why this reveals a leakage. This is also something that should be investigated.
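One possible angle on the frequency-domain question, offered only as a sketch: the DFT itself is linear, but the magnitude or power spectrum is not, because the power in each bin expands into products of pairs of time samples. Whether the attack script works on magnitudes is an assumption here; the thread does not say. The helper `power_spectrum` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=64)
b = rng.normal(size=64)

# The complex DFT is linear: F(a + b) == F(a) + F(b) up to rounding
lin_gap = np.abs(np.fft.rfft(a + b) - (np.fft.rfft(a) + np.fft.rfft(b))).max()

# The magnitude spectrum is not: |F(a + b)| differs from |F(a)| + |F(b)|
mag_gap = np.abs(
    np.abs(np.fft.rfft(a + b)) - (np.abs(np.fft.rfft(a)) + np.abs(np.fft.rfft(b)))
).max()

def power_spectrum(traces):
    """Per-trace power spectrum; each bin is a sum of products x[i] * x[j]."""
    return np.abs(np.fft.rfft(traces, axis=1)) ** 2

# Bin 0 makes the quadratic structure explicit: it equals (sum of samples)^2
dc_power = power_spectrum(a[None, :])[0, 0]
print(lin_gap, mag_gap, dc_power, a.sum() ** 2)
```

If the spectrum is used through its magnitude, a sample tied to the mask and a sample tied to the masked data end up multiplied together, which is the kind of combination a second-order attack relies on.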
Best regards,

skot · Answer 5 · Wed May 27 2020 16:35:33 GMT+0800 (China Standard Time)

Hi,

Thank you for your confirmation and time.
We agree that normalization and the FFT involve several points, as you pointed out, but this seems to be a multivariate (1st-order) analysis rather than a higher-order analysis.
Since normalization makes the result hard to interpret, as you mentioned, we did an additional investigation with pre-processing commonly used for 1st-order SCAs. The CPAs were also successful after removing the DC component or after high-pass filtering.
Could you please see and confirm the attached file?
ASCAD_CPA_DC_HPF.zip
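For reference, the two pre-processing steps named above can be sketched as follows. This is not the code in the attached archive: the per-trace mean subtraction and the FFT-based filter with a hypothetical `cutoff_bin` parameter are assumptions about what such steps typically look like.

```python
import numpy as np

def remove_dc(traces):
    """Subtract each trace's own mean (its DC component)."""
    return traces - traces.mean(axis=1, keepdims=True)

def highpass_fft(traces, cutoff_bin):
    """Zero all frequency bins below cutoff_bin, then return to the time domain."""
    spectrum = np.fft.rfft(traces, axis=1)
    spectrum[:, :cutoff_bin] = 0
    return np.fft.irfft(spectrum, n=traces.shape[1], axis=1)

rng = np.random.default_rng(4)
traces = rng.normal(5.0, 1.0, (10, 700))  # synthetic traces with an offset
no_dc = remove_dc(traces)
filtered = highpass_fft(traces, cutoff_bin=1)  # removing bin 0 only is DC removal
print(np.abs(no_dc - filtered).max())
```

Both operations are linear in the samples of a single trace, so every output sample becomes a linear mix of all input samples of that trace.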

Best regards,
Kotaro SAITO, Tohoku University

Prouff Emmanuel · Answer 6 · Wed May 27 2020 17:16:56 GMT+0800 (China Standard Time)

Hi,
Thank you again for sharing your studies with us. We also did some investigations to understand your previous results with the max/min normalization. You can find attached a notebook summing up our tests. Essentially, it seems that the leakage point at index 188 contains information on both the masked sbox output and the mask. This information is not detected by a first-order analysis (i.e. an analysis exploiting only the leakage mean) but is revealed when the second-order moment appears in the statistical test. This is exactly what happens when you divide the leakage at point 188 by the max of the trace. Indeed, this max is itself frequently reached at point 188. Hence, when the value at point 188 exceeds some threshold, your normalization returns 1; otherwise it returns the leakage at point 188 divided by some other value (which can be considered random for the purpose of this explanation). When expanded, this kind of prediction function (1 if trace[188] > some threshold, random otherwise) contains the second-order statistical moment.

To validate this hypothesis, we tested a CPA which simply squares the leakage at point 188 (after centering): it leads to an efficient attack (see box 24 in our notebook). We also validated that the "argmax only" already contains information on the unmasked sbox output (which is in line with our "threshold" interpretation). Finally, we also performed a classical multivariate second-order CPA attack to see how it compares to the univariate second-order CPA at point 188: our results show that the former (multivariate) attack is significantly more efficient.
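The centered-square test can be sketched on synthetic data as follows. This is not the notebook's code and uses no ASCAD traces: the single sample is assumed to leak HW(masked value) + HW(mask) plus Gaussian noise, and a random byte permutation stands in for the AES S-box to keep the example short.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def corr_per_key(leak, plaintexts, sbox):
    """|Pearson correlation| between one leakage value per trace and HW(sbox[p ^ k])."""
    leak = leak - leak.mean()
    out = np.empty(256)
    for k in range(256):
        h = HW[sbox[plaintexts ^ k]].astype(float)
        h -= h.mean()
        out[k] = abs(h @ leak) / np.sqrt((h @ h) * (leak @ leak))
    return out

rng = np.random.default_rng(2)
sbox = rng.permutation(256).astype(np.uint8)  # stand-in permutation, NOT the AES S-box
true_key = 0x3C
n = 20000
p = rng.integers(0, 256, n, dtype=np.uint8)   # plaintext bytes
m = rng.integers(0, 256, n, dtype=np.uint8)   # fresh random mask per trace
z = sbox[p ^ true_key]                        # unmasked sbox output

# Assumed model: one sample carries both the masked value and the mask
leak = HW[z ^ m] + HW[m] + rng.normal(0.0, 0.5, n)

first = corr_per_key(leak, p, sbox)             # uses the mean only
centered = leak - leak.mean()
second = corr_per_key(centered ** 2, p, sbox)   # brings in the second-order moment

print(first[true_key], second[true_key], hex(int(second.argmax())))
```

Under this model the mean of the sample does not depend on the unmasked value, but its variance does, which is why only the squared, centered version singles out the key.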

Regards,

Emmanuel
qtest_ascad_clean.zip

Prouff Emmanuel · Answer 7 · Wed May 27 2020 17:17:58 GMT+0800 (China Standard Time)

Hi again,
Thank you for the new investigations. We are going to have a look at them in the coming days and will come back to you asap.
Regards,
Emmanuel

skot · Answer 8 · Fri May 29 2020 12:40:15 GMT+0800 (China Standard Time)

Hi,
Thank you for your study of min-max normalization.
I think your explanation makes sense, and I understand that the point at index 188 would include a second-order moment, though it is a bit surprising that a single point contains information on both the masked sbox output and the mask.
I would really appreciate it if you could also confirm and comment on my last results.
Best regards,
Kotaro

Prouff Emmanuel · Answer 9 · Thu Jun 11 2020 00:37:59 GMT+0800 (China Standard Time)

Hi

We successfully reproduced your last results. Our feeling about them is that the observed success with 100000 traces is not very convincing. Indeed, without any knowledge of the correct key hypothesis, it is not clear whether one would decide that any key candidate shows a significantly high score. Actually, assuming that the scores follow a Gaussian distribution, the score of the correct key seems to stay at a distance from the mean score that is close to one standard deviation.

Also, we already know that classical second-order attacks succeed with fewer than one thousand traces. Hence, an unclear success (under the same assumptions on the adversary's capability) after 100000 traces does not seem very interesting.

The FFT pre-processing (already tested in other papers) may be of interest, in particular when the traces are desynchronized. But, in our opinion, its relevance in the ASCAD context still has to be argued.
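The distance described here can be computed as a simple standardized score. This is a sketch only: the hypothetical helper `key_score_distance` takes the mean and standard deviation over all 256 candidates, which is an assumption about how the comparison was made.

```python
import numpy as np

def key_score_distance(scores, key):
    """Distance of one candidate's score from the mean score, in standard deviations."""
    scores = np.asarray(scores, dtype=float)
    return (scores[key] - scores.mean()) / scores.std()

# Toy example: a candidate exactly one standard deviation above the mean
toy_scores = np.array([0.0, 0.0, 2.0, 2.0])
print(key_score_distance(toy_scores, 2))
```

A value near 1 means the candidate is not distinguishable from the bulk of the other candidates without already knowing which one is correct.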

Best regards

skot · Answer 10 · Fri Jul 03 2020 17:46:23 GMT+0800 (China Standard Time)

Hi,
Thank you for confirming our result.
Unfortunately, we cannot investigate what happens when more than 100,000 waveforms are used for the 1st-order attack, so we do not know whether the result would become convincing with more waveforms.
But, through this discussion, I was able to deepen my understanding of the characteristics of the ASCAD database.
Thank you again for your time and consideration.
Best regards,
Kotaro

Ngoc-Tuan Do · Answer 11 · Thu Jun 10 2021 23:18:07 GMT+0800 (China Standard Time)

Hi,
I am trying a second-order CPA on the ASCAD database. However, none of my experiments has been successful. I have combined the leakage of the mask and the leakage of the masked Sbox output. I have also tried pre-processing with the absolute differences of all pairs of points in a power trace. I don't know why it fails. If you don't mind, could you please give me an example of a second-order CPA on ASCAD? Thank you very much.
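A bivariate second-order CPA of the kind asked about can be sketched on synthetic data as below. It is not an ASCAD example: the two sample positions are simulated directly, since the thread does not give the ASCAD indices of the mask and masked-output leakage. The Hamming-weight model, the noise level, and the random permutation standing in for the AES S-box are all assumptions. The centered product is used as the combining function instead of the absolute difference.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def centered_product(x, y):
    """Combine two leakage samples so that their joint dependence shows in the mean."""
    return (x - x.mean()) * (y - y.mean())

def cpa_scores(values, plaintexts, sbox):
    """|Pearson correlation| of the combined leakage with HW(sbox[p ^ k]) per guess."""
    v = values - values.mean()
    scores = np.empty(256)
    for k in range(256):
        h = HW[sbox[plaintexts ^ k]].astype(float)
        h -= h.mean()
        scores[k] = abs(h @ v) / np.sqrt((h @ h) * (v @ v))
    return scores

rng = np.random.default_rng(3)
sbox = rng.permutation(256).astype(np.uint8)  # stand-in permutation, NOT the AES S-box
true_key = 0xA7
n = 5000
p = rng.integers(0, 256, n, dtype=np.uint8)
m = rng.integers(0, 256, n, dtype=np.uint8)
z = sbox[p ^ true_key]

t_mask = HW[m] + rng.normal(0.0, 0.5, n)        # sample leaking the mask
t_masked = HW[z ^ m] + rng.normal(0.0, 0.5, n)  # sample leaking the masked output

scores = cpa_scores(centered_product(t_mask, t_masked), p, sbox)
best_key = int(scores.argmax())
print(hex(best_key))
```

On real traces the two sample positions are not known in advance, so the same combination is usually applied to every pair of samples inside a window and the best-scoring pair is kept.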