risc0 / risc0

RISC Zero is a zero-knowledge verifiable general computing platform based on zk-STARKs and the RISC-V microarchitecture.

Home Page: https://risczero.com

"Polluting" the trace

pgrinaway opened this issue · comments

For various reasons, we are interested in knowing how many columns of the trace (perhaps as a lower bound) would be "polluted" by, that is, contain, a value that was used in the VM. Clearly, the compiler will do various optimizations that make it difficult to tell where the value will appear. But, for instance, if I read in a private input and then commit it, it surely appears in at least 1 column, right? Is there any way for me to dig a little deeper into this and examine the trace?

What do you mean by "if I read in a private input, then commit it"? For me, this reads like "I read in a private input and commit it by writing the private input to the public journal" but I don't think this is what you mean @pgrinaway. Could you clarify?

Sorry, I think I wrote that a little too fast. Essentially I did mean what you said--I am just trying to guarantee that the value appears in the trace somewhere, and that the verifier can check that it appears.

Yes, but it will appear in the trace even if you don't write it to the journal. The trace keeps track of the entire state of the RISC-V emulator, and the program inputs are accessed through system calls (using the ecall instruction).

To be clear, the verifier won't see the execution trace. If a verifier wants to check on some value, then the guest needs to write this value to the journal using env::commit. This makes the value public.
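As a rough illustration of that split, here is a toy model in plain Rust that uses no RISC Zero crates. `ToyGuest`, `read_private`, and `commit` are hypothetical names invented for this sketch; of the real API, only env::commit is mentioned in this thread. The model captures just one idea: the prover-side state holds everything, while the journal holds only what was explicitly committed.

```rust
/// Toy stand-in for a guest program's view of the world. Hypothetical: this is
/// NOT the RISC Zero API, only a model of "private state vs. public journal".
struct ToyGuest {
    private_inputs: Vec<u32>, // known to the prover only
    journal: Vec<u32>,        // public: the only data a verifier gets to read
}

impl ToyGuest {
    fn new(private_inputs: Vec<u32>) -> Self {
        ToyGuest { private_inputs, journal: Vec::new() }
    }

    /// Models reading a private input; nothing becomes public here.
    fn read_private(&self, index: usize) -> u32 {
        self.private_inputs[index]
    }

    /// Models committing a value: it is appended to the public journal.
    fn commit(&mut self, value: u32) {
        self.journal.push(value);
    }
}

fn main() {
    let mut guest = ToyGuest::new(vec![41, 7]);
    let secret = guest.read_private(0);
    let _never_committed = guest.read_private(1); // used by the prover, never published
    guest.commit(secret + 1);
    println!("public journal: {:?}", guest.journal);
}
```

In this model, the second input takes part in the execution but never reaches the journal, which mirrors the point above: a value being present in the trace does not make it visible to the verifier.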

The remaining question for me is whether it is possible to easily know how many columns of the trace were 'touched' by this value. I know this is affected by many things, including the compiler used to generate the ELF binary, but could one establish a lower bound higher than 1 column?

I don't think it's easy to know how many columns were touched by this value as it's dependent on the circuit. I think this will become clear when we publish our circuit code in the future.

Also, I'm curious about why you are asking this... Is there something security-related that you're wondering about?

@pdg744 @jbruestle is this something you can respond to?

I guess I don't understand the purpose behind the question. In general, the trace is only ever held by the prover, and the resulting proof is zero knowledge with respect to the trace (i.e. the proof doesn't reveal anything in the trace), so I'm not sure why it matters how many times a value appears in a trace or even if it appears at all (since clearly optimizations may modify the actual data representation). I suppose if you really wanted to know how a given input influences a trace, you could prove multiple times with different values for the input and do a diff of the traces, but I'm still unsure why this would be of any use.
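The diffing idea can be sketched on a toy trace. Everything here is an assumption made for illustration: `toy_trace` is a made-up three-column machine, not the RISC Zero circuit or its trace format, and `differing_columns` assumes both traces have the same number of rows and columns. Note that a diff over two inputs can only give a lower bound, since a column may depend on the input and still happen to match for the two values chosen.

```rust
/// A toy "trace": each row is one cycle, each column one register-like cell.
type Trace = Vec<Vec<u32>>;

/// Hypothetical toy machine: column 0 holds the input, column 1 a running sum
/// of the input, and column 2 a cycle counter that never depends on the input.
fn toy_trace(input: u32, cycles: u32) -> Trace {
    let mut sum = 0u32;
    (0..cycles)
        .map(|cycle| {
            sum = sum.wrapping_add(input);
            vec![input, sum, cycle]
        })
        .collect()
}

/// Returns the indices of columns that differ in at least one row.
/// Assumes both traces have the same shape.
fn differing_columns(a: &Trace, b: &Trace) -> Vec<usize> {
    let width = a.first().map_or(0, |row| row.len());
    (0..width)
        .filter(|&col| a.iter().zip(b.iter()).any(|(ra, rb)| ra[col] != rb[col]))
        .collect()
}

fn main() {
    // Run the same toy program on two different inputs and diff the traces.
    let touched = differing_columns(&toy_trace(3, 4), &toy_trace(5, 4));
    println!("columns influenced by the input: {:?}", touched);
}
```

For this toy machine the diff reports columns 0 and 1 and leaves the cycle counter out. Repeating the diff over more input pairs and taking the union of the results would tighten the lower bound.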

Sorry for the delayed response, and thanks for all the answers! I will write up a few more details as to the actual goal here--I don't think I have any direct security concerns about the proof itself. But I do have one more code question, if you don't mind: is there any way to get a Receipt inside the guest? That is, I'd like to parse it inside the VM to prove things about it (besides its verification and input/output).

Thanks again!

@pgrinaway here's a great starting place: https://www.risczero.com/blog/proof-composition. Seeing that this is a non-issue, I'm going to close this. We also welcome questions on our Discord, https://discord.com/invite/risczero, so feel free to ask there as well. Closing.