facebook / winterfell

A STARK prover and verifier for arbitrary computations

Simplifying constraint composition polynomial

Al-Kindi-0 opened this issue · comments

In Winterfell, the constraint composition polynomial is constructed using the following formula

$$ \sum_{i=0}^{k-1} \left(\alpha_i + \beta_i \cdot X^{d_i}\right) \cdot \frac{C_i(X)}{Z_{H_i}(X)} $$

where:

  1. $(\alpha_i, \beta_i)$ is a per-constraint random tuple.
  2. $d_i$ is the so-called degree adjustment factor, needed to make all terms in the above expression have the same degree.
  3. $C_i$ is the $i$-th constraint.
  4. $Z_{H_i}$ is the vanishing polynomial of the subgroup $H_i$ on which the constraint $C_i$ should hold.

In fact, in Winterfell the constraints are grouped by shared $d_i$ in order to optimize this computation.
The use of degree adjustments made sense in earlier versions of the protocol, i.e., the original ALI protocol. Moreover, as noted here on page 16, we can compute the constraint composition polynomial as

$$ \mathbb{C}(X) := \sum_{i=0}^{k-1} \delta_i \cdot \frac{C_i(X)}{Z_{H_i}(X)} $$

For all sensible choices of security parameters, choosing $\delta_i := \delta^i$ does not degrade soundness, as this is usually not the dominant term in the soundness error.
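
For concreteness, here is a minimal sketch (not Winterfell's actual API; the modulus, function names, and values are illustrative assumptions only) of batching the constraint quotients with powers of a single random challenge $\delta$, instead of drawing a fresh $(\alpha_i, \beta_i)$ pair and a degree adjustment for every constraint:

```rust
// A minimal sketch (not Winterfell's actual API) of batching constraint
// quotients with powers of a single challenge delta, instead of per-constraint
// (alpha_i, beta_i) pairs plus degree adjustments. Arithmetic is over a toy
// prime field; the modulus below is the 64-bit "Goldilocks" prime, used here
// purely as an example.
const P: u128 = 0xffff_ffff_0000_0001;

fn add(a: u128, b: u128) -> u128 {
    (a + b) % P
}

fn mul(a: u128, b: u128) -> u128 {
    (a * b) % P
}

/// Folds the evaluations of the quotients C_i(x) / Z_{H_i}(x) at one point x
/// of the evaluation domain into a single composition value, using
/// delta_i := delta^i as the batching coefficients.
fn compose_at_point(quotient_evals: &[u128], delta: u128) -> u128 {
    let mut acc = 0;
    let mut delta_pow = 1; // delta^0
    for &q in quotient_evals {
        acc = add(acc, mul(delta_pow, q));
        delta_pow = mul(delta_pow, delta);
    }
    acc
}

fn main() {
    // Made-up quotient evaluations for three constraints at some domain point.
    let quotients = [7u128, 11, 13];
    let delta = 5; // single random challenge drawn by the verifier
    println!("composition value: {}", compose_at_point(&quotients, delta));
}
```

Note that powers of a single challenge replace both the per-constraint $(\alpha_i, \beta_i)$ pairs and the degree-adjustment multipliers $X^{d_i}$.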

Related to this is how $\mathbb{C}(X)$ is divided into d := max_degree - 1 polynomials. On page 15, the following decomposition is proposed:

$$ \mathbb{C}(X) = h_0(X) + X^{|H|}\cdot h_1(X) + \cdots + X^{(d-1)\cdot|H|}\cdot h_{d-1}(X) $$

where $|H|$ is the length of the trace.
This decomposition has the advantage of giving a slightly better soundness error, as well as avoiding quotients with denominator $(X - z^d)$.
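
As an illustration, here is a minimal sketch (not the existing Winterfell code) of this decomposition: the coefficient vector of $\mathbb{C}(X)$ is simply cut into consecutive chunks of length $|H|$, each chunk giving the coefficients of one $h_j(X)$.

```rust
// A minimal sketch (not the existing Winterfell code) of splitting C(X) into
// h_0(X), ..., h_{d-1}(X): the coefficient vector of C(X), in low-to-high
// order, is cut into consecutive chunks of length |H|.
fn decompose(coeffs: &[u128], trace_len: usize) -> Vec<Vec<u128>> {
    coeffs.chunks(trace_len).map(|c| c.to_vec()).collect()
}

fn main() {
    // C(X) of degree 7 over a trace domain of size |H| = 4 splits into two parts:
    // C(X) = h_0(X) + X^4 * h_1(X).
    let coeffs: Vec<u128> = (1..=8).collect();
    let parts = decompose(&coeffs, 4);
    assert_eq!(parts, vec![vec![1, 2, 3, 4], vec![5, 6, 7, 8]]);
    println!("{:?}", parts);
}
```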
This issue aims to discuss the merits of introducing such changes in the Winterfell setting, as well as the best approach to implementing them if we decide to go this route.

This is very interesting! If we implement this, it would simplify the code base quite a bit.

I do have a question though: let's say we have a set of constraints where some constraints are of degree 2 and others are of degree 8. Currently, when building $\mathbb{C}(X)$, we normalize all degrees and then prove that $\mathbb{C}(X)$ has degree less than or equal to $8 \cdot (n - 1)$ where $n$ is the trace length.

Under the proposed approach, there would be no normalization and thus $\mathbb{C}(X)$ would be a combination of various constraints, some of which have degree 2 and others degree 8. But in the end, we'd still be proving that $\mathbb{C}(X)$ has degree less than or equal to $8 \cdot (n - 1)$.

Now, let's say we take the above situation, and remove all constraints of degree 8. So, we have $\mathbb{C}(X)$ which combines constraints only of degree 2, but we still prove only that it has degree less than or equal to $8 \cdot (n - 1)$. Does this still work? Seems like it should, but if it does - could we prove that $\mathbb{C}(X)$ has degree smaller than or equal to $16 \cdot (n - 1)$, $32 \cdot (n - 1)$ etc. and be satisfied? And if we can go to 16, 32 etc., at which point would we start running into soundness issues?

In general, the protocol will show that $\mathbb{C}(X)$ is of degree $\leq d\cdot (k-1)$, where $d$ is the maximum degree over all constraints and $k$ is the trace length. So the main question is how raising $d$ affects the soundness of the protocol.
The answer is that the effect shows up at the IOP level, in a manner similar to how the degree of a polynomial affects the soundness of the Schwartz-Zippel lemma. More precisely, the soundness error of the protocol is less than

$$ L^{+}\cdot\left(\frac{C}{|\mathbb{F}|} + \frac{d\cdot(k^+ - 1) + (k-1)}{|\mathbb{F}| - |D\cup H|}\right) + \epsilon_{FRI} $$

where:

  1. $H$ is the trace domain of size $k$
  2. $D$ is the LDE domain of size $n$
  3. $k^+ := k + 2$
  5. $L^+ := \frac{m+0.5}{\sqrt{\rho^+}}$, where $m\geq 3$ is the Johnson proximity parameter, which for simplicity can be taken to be equal to $3$, and $\rho^+ := \frac{k^+}{n}$
  5. $\mathbb{F}$ is the extension field.
  6. $\epsilon_{FRI}$ is the soundness error bound for FRI run with proximity parameter $\theta^+ := 1 - \alpha^+ := 1 - (1 + \frac{1}{2m})\cdot \sqrt{\rho^+}$
  7. $C$ is set to $1$ in case independent randomness is used for batching the different constraints; otherwise, it is set to the number of constraints.

So, as can be seen, the soundness error bound is essentially the ratio of the maximal degree (i.e., complexity) of the constraints to the size of the field, with the adjustment through $L^+$ required for working in the list-decoding regime. In fact, in the unique-decoding regime the protocol can be argued using the Schwartz-Zippel lemma.
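
To make the effect of $d$ concrete, here is a rough instantiation of the above bound under assumed, purely illustrative parameters (not Winterfell defaults): $|\mathbb{F}| \approx 2^{128}$ (e.g., a quadratic extension of a 64-bit field), trace length $k = 2^{20}$, blowup factor $8$ so that $n = 2^{23}$, maximum constraint degree $d = 8$, $m = 3$, and independent batching randomness so that $C = 1$. Then

$$ \rho^+ \approx 2^{-3}, \qquad L^+ \approx \frac{3.5}{2^{-1.5}} \approx 9.9, \qquad \frac{d\cdot(k^+ - 1) + (k - 1)}{|\mathbb{F}| - |D\cup H|} \approx \frac{9\cdot 2^{20}}{2^{128}} \approx 2^{-104.8}, $$

so the non-FRI part of the bound is roughly $9.9 \cdot 2^{-104.8} \approx 2^{-101.5}$ (the $C/|\mathbb{F}|$ term is negligible). Doubling $d$ from $8$ to $16$ only changes the numerator from about $9\cdot 2^{20}$ to about $17\cdot 2^{20}$, i.e., it costs a bit less than one bit of soundness per doubling, before accounting for $\epsilon_{FRI}$.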

I am wondering about the further costs, if any, of avoiding the division by each $Z_{H_i}$ and instead dividing once by $Z_H$. This means that we would instead be computing

$$ \mathbb{C}(X) := \frac{ \sum_{i=0}^{k-1} \delta_i \cdot C_i(X) }{Z_{H}(X)} $$

where the $C_i$ now include selector polynomials in addition to the actual constraints.
I think this change simplifies things and makes potential optimizations (e.g. common sub-expression elimination) easier.
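
Written out, this is just a rewriting of each quotient using a selector polynomial $S_i(X) := Z_H(X)/Z_{H_i}(X)$ for the subgroup $H_i \subseteq H$ on which the constraint should hold (a sketch of the intended identity, not a statement about the current code):

$$ \frac{C_i(X)}{Z_{H_i}(X)} = \frac{C_i(X)\cdot S_i(X)}{Z_{H}(X)}, \qquad S_i(X) := \frac{Z_{H}(X)}{Z_{H_i}(X)}. $$

For instance, assuming $H$ is a multiplicative subgroup generated by $g$ with $Z_H(X) = X^{|H|} - 1$, a constraint enforced at a single point $g^j$ has $Z_{H_i}(X) = X - g^j$, so its selector $S_i(X) = \frac{X^{|H|} - 1}{X - g^j}$ already has degree $|H| - 1$.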

Actually, this would be more costly, as one would need to compute Lagrange selector polynomials with quite large powers. I wonder, however, if we could precompute such quantities once and for all.