liyishuai / coq-http2

Fixpoint and Parsing

olekgierczak opened this issue

I'm stuck implementing the integer decoding in HPACK using a monadic parser. The pseudocode, decoding an integer I with an N bit prefix, is as follows:

decode I from the next N bits
if I < 2^N - 1, return I
else
    M = 0
    repeat
        B = next octet
        I = I + (B & 127) * 2^M
        M = M + 7
    while B & 128 == 128
    return I

The important thing to note is that the function essentially recurses until it reads a byte whose most significant bit is 0. How can I show that either this will happen or the parser will run out of bytes to read?

Currently, I'd like to write the following:

Fixpoint decode_integer_h {m:Tycon} `{Monad m} `{MError HPACKError m}
           `{MParser byte m} (M:N) : m N :=
  a <- get_byte ;;
  let B := (N_of_ascii a) in
  let I := (BinNatDef.N.land B 127) * 2^M in
  if N.land B 128 =? 128
  then e <- decode_integer_h (M + 7) ;;
           ret (I + e)
  else ret I.

Definition decode_integer {m:Tycon} `{Monad m} `{MError HPACKError m}
           `{MParser byte m} (prefix:N) (n:N) : m N :=
  if prefix <? 2^n - 1 then ret prefix
  else a <- decode_integer_h 0 ;;
          ret (prefix + a).

But Coq can't guess the decreasing argument for decode_integer_h.

Sorry, I thought I had fixed it, but I was wrong.

A temporary workaround is to axiomatize a fixpoint combinator:

Class MonadFix (m : Tycon) := {
  mfix : forall t u, ((t -> m u) -> (t -> m u)) -> t -> m u;
}.

This can actually be instantiated with a parser monad carrying some fuel, giving up when it runs out.
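A minimal sketch of what such a fuel-carrying instance could look like, assuming a simplified parser type over a list of octets (represented as N values) instead of the repository's Tycon/MParser classes. The names parser, mfix_fuel and decode_body are hypothetical, and the fuel values in the examples are arbitrary:

```coq
From Coq Require Import NArith List.
Import ListNotations.
Open Scope N_scope.

(* Hypothetical simplified parser: consumes octets, may fail with None. *)
Definition parser (u : Type) : Type := list N -> option (u * list N).

(* Fuel-bounded fixpoint combinator: structurally recursive on [fuel],
   gives up with None once the fuel is exhausted. *)
Fixpoint mfix_fuel {t u : Type} (fuel : nat)
         (f : (t -> parser u) -> t -> parser u) (x : t) (bs : list N)
  : option (u * list N) :=
  match fuel with
  | O => None
  | S fuel' => f (mfix_fuel fuel' f) x bs
  end.

(* Non-recursive body of the continuation decoder; [self] stands for
   the recursive call. *)
Definition decode_body (self : N -> parser N) (M : N) : parser N :=
  fun bs =>
    match bs with
    | [] => None
    | b :: rest =>
      let lo := N.land b 127 * 2 ^ M in
      if N.land b 128 =? 128
      then match self (M + 7) rest with
           | Some (e, rest') => Some (lo + e, rest')
           | None => None
           end
      else Some (lo, rest)
    end.

(* Continuation octets 154, 10 decode to 26 + 10 * 2^7 = 1306. *)
Example decode_tail :
  mfix_fuel 8 decode_body 0 [154; 10; 99] = Some (1306, [99]).
Proof. reflexivity. Qed.

(* With too little fuel the parser gives up. *)
Example decode_out_of_fuel :
  mfix_fuel 1 decode_body 0 [154; 10] = None.
Proof. reflexivity. Qed.
```

Because mfix_fuel recurses on the fuel rather than on the parser's input, Coq accepts it without knowing anything about what the body consumes.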

Another, more satisfactory variant can be found in the paper "agdarsec: total parser combinators". It essentially requires that the fixpoint body always consumes some input or throws an error, but it will probably require a rewrite of the Parser module.

Is there any other way to bound the length of the message here by exploiting some other metadata, or is there really an infinite input that can make the parser loop indefinitely in theory?

Thanks! I think the fixpoint combinator is exactly what we need.

The relevant section on encoding/decoding integers in HPACK is here.

A highlight is: "This integer representation allows for values of indefinite size... Integer encodings that exceed implementation limits -- in value or octet length -- MUST be treated as decoding errors. Different limits can be set for each of the different uses of integers, based on implementation constraints." I'm not sure exactly how to interpret this, specifically the "indefinite size", but it seems we could choose a fuel value.

That said, decoding seems to work as follows: header block fragments are collected into a single header block (a list of encoded header field representations), and then the entire header block is decoded sequentially, one header field representation at a time. This would imply that this parser could never be run (at least with respect to HTTP2) on an infinite input. The relevant HTTP2 spec is here.

So in short, if my interpretation of the spec for HPACK is right, we can choose a fuel value that we like and use MonadFix, or even just hardcode the fuel in the implementation of decode_integer. Or, if my interpretation of the way HTTP2 header blocks work is right, we can maybe pass in a completely accurate fuel bound (calculated once the header block has been filled) that would be sufficient in all cases.
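As a rough sketch of the second option, assuming the header block is available as a plain list of octets (N values) rather than through the MParser class, the number of remaining octets can serve as the fuel. Each recursive call consumes one octet and one unit of fuel, so the fuel cannot run out before the input does. The list-based signatures below are an assumption for illustration, not the repository's actual interface:

```coq
From Coq Require Import NArith List.
Import ListNotations.
Open Scope N_scope.

(* Continuation octets; recursion is structural on [fuel]. *)
Fixpoint decode_integer_h (fuel : nat) (M : N) (bs : list N)
  : option (N * list N) :=
  match fuel with
  | O => None
  | S fuel' =>
    match bs with
    | [] => None
    | b :: rest =>
      let lo := N.land b 127 * 2 ^ M in
      if N.land b 128 =? 128
      then match decode_integer_h fuel' (M + 7) rest with
           | Some (e, rest') => Some (lo + e, rest')
           | None => None
           end
      else Some (lo, rest)
    end
  end.

(* [prefix] is the value already read from the n-bit prefix. *)
Definition decode_integer (prefix n : N) (bs : list N)
  : option (N * list N) :=
  if prefix <? 2 ^ n - 1 then Some (prefix, bs)
  else match decode_integer_h (length bs) 0 bs with
       | Some (a, rest) => Some (prefix + a, rest)
       | None => None
       end.

(* 1337 with a 5-bit prefix: prefix value 31, then octets 154 and 10. *)
Example decode_1337 : decode_integer 31 5 [154; 10] = Some (1337, []).
Proof. reflexivity. Qed.

(* A value that fits in the prefix consumes no further octets. *)
Example decode_10 : decode_integer 10 5 [42] = Some (10, [42]).
Proof. reflexivity. Qed.

(* Truncated input: the continuation bit is set but no octet follows. *)
Example decode_truncated : decode_integer 31 5 [154] = None.
Proof. reflexivity. Qed.
```

A hardcoded limit would instead replace length bs with a fixed constant, which also rejects encodings that exceed a chosen octet length.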

Section 7.4 on implementation limits seems to agree with your interpretation, and similarly we can set limits on the header block size. Indeed, fuel will work fine for this.

Most uses of integers in HPACK are as indices into a table, which naturally bounds their size, but I don't see limits on header field names or values.

Will you also run into the same issue (but solved the same way) when parsing the whole header block, since it is not explicit how many fields there are?

I think this will be an issue, assuming that a header block fragment can contain more than one header field representation. Header block fragments are explained here without any real specifics, but it leads me to believe that fragment boundaries can fall at arbitrary octets in the header block, and therefore a header block fragment can contain anywhere from 0 to all of the headers.

Also, I attempted to implement the decode_integer_h function with fuel, but I'm getting an obligation I just can't prove (specifically obligation 2). I have a proof that seems to work, but it doesn't check, and I can't make sense of the error message. I'm pushing the failed attempt now. Does anybody have a chance to take a look at it?

I'm around (also on Slack).