Cannot parse non-alphanumeric characters when not encoded
exiaohao opened this issue · comments
envoy (which uses http-parser as its parser) cannot correctly parse non-alphanumeric characters that are not URL-encoded (e.g. Chinese characters) when processing HTTP requests.
I know the HTTP standard does not allow raw multi-byte characters in requests; they must be URL-encoded. But we cannot control our clients' behavior or force them to follow the standard. Some people even ask, unreasonably, why envoy and http-parser can't be as tolerant as nginx.
Is there any way to make envoy or http-parser accept such requests?
related issue in envoy: envoyproxy/envoy#4854
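For reference, the URL encoding that the standard expects from clients can be sketched as follows. This is only an illustration: `percent_encode_path` is a hypothetical helper, not part of http-parser or envoy, and it assumes the input is a NUL-terminated UTF-8 path in which `/` should be left untouched.

```c
#include <stddef.h>

/* Hypothetical helper (not from http-parser or envoy): percent-encode every
 * byte that is not an RFC 3986 "unreserved" character or '/'.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or -1 if the output buffer is too small. */
static int percent_encode_path(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (out_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        int keep = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                   (*p >= '0' && *p <= '9') ||
                   *p == '-' || *p == '.' || *p == '_' || *p == '~' ||
                   *p == '/';
        if (keep) {
            if (o + 1 >= out_size)      /* leave room for the NUL */
                return -1;
            out[o++] = (char)*p;
        } else {
            if (o + 3 >= out_size)      /* "%XX" plus the NUL */
                return -1;
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Each byte of a multi-byte character is encoded separately, so a three-byte UTF-8 character becomes three `%XX` triplets, all of them plain ASCII.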
Thank you for reporting this!
Unfortunately, I'm afraid that http-parser does the right thing by rejecting such invalid requests.
If you really wanted to do this, you could float a patch on top of http-parser that extends the tokens
array in http_parser.c to accept bytes in the range 128-255. But like Fedor said, the current behavior is not a bug.
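The general idea of such a patch can be sketched with a small lookup table. This is not the actual tokens array or any other code from http_parser.c; `accepted`, `init_accepted`, and `first_rejected` are made-up names, and the set of accepted ASCII characters is a simplified assumption chosen only to show the effect of also accepting bytes 128-255.

```c
#include <stddef.h>

/* Illustrative table only, NOT the real array from http_parser.c.
 * accepted[b] is nonzero when byte value b is allowed. */
static unsigned char accepted[256];

/* Fill the table. With allow_high_bytes == 0 only a simplified ASCII set
 * is accepted; with a nonzero value, bytes 128-255 are accepted as well. */
static void init_accepted(int allow_high_bytes)
{
    for (int c = 0; c < 256; c++) {
        int alnum = (c >= '0' && c <= '9') ||
                    (c >= 'A' && c <= 'Z') ||
                    (c >= 'a' && c <= 'z');
        int punct = c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
        int high  = allow_high_bytes && c >= 128;
        accepted[c] = (unsigned char)(alnum || punct || high);
    }
}

/* Return the index of the first rejected byte, or -1 if every byte in
 * s[0..len) is accepted. */
static long first_rejected(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!accepted[(unsigned char)s[i]])
            return (long)i;
    }
    return -1;
}
```

With the strict table, a path containing raw UTF-8 bytes is rejected at the first byte above 127; with the extended table the same input passes. The trade-off is that the parser then forwards bytes the standard does not allow, so anything downstream has to cope with them.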
@bnoordhuis Thanks for your help. You're right, it really isn't a bug. I'll try the patch.