Percent signs are not percent encoded
matt-phylum opened this issue · comments
Overview
pkg:brew/openssl%25401.1@1.1.1w parses correctly but then incorrectly serializes as pkg:brew/openssl%401.1@1.1.1w (when that output is parsed again, the name openssl%401.1 turns into openssl@1.1).
Expected Behavior
Percent signs in input should be percent encoded so that serializing the PURL and parsing it again produces the same result.
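To make the expected round trip concrete, here is a minimal std-only sketch. It does not use the percent_encoding crate or either PURL implementation; encode_name, decode_name, and the unreserved-character set are hypothetical stand-ins chosen for illustration. The encode_percent flag toggles whether % is part of the encode set.

```rust
/// Hypothetical helper: percent-encodes every byte that is not an ASCII
/// letter, digit, or one of `-._~`. `%` is only encoded when
/// `encode_percent` is true, which mimics the two behaviors in this issue.
fn encode_name(input: &str, encode_percent: bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        let unreserved = b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~');
        if unreserved || (b == b'%' && !encode_percent) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Value of a single ASCII hex digit, or None if the byte is not one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical helper: decodes `%XX` sequences. A `%` that is not
/// followed by two hex digits is kept literally.
fn decode_name(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

With encode_percent set to true, the name openssl%401.1 serializes to openssl%25401.1 and decodes back to itself. With it set to false, the serialized form is openssl%401.1, which decodes to openssl@1.1, the behavior reported above.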
Additional Context
althonos/packageurl.rs has the same problem, probably because both implementations use the same crate for encoding. It's surprising that the percent_encoding crate does not automatically percent encode percent signs, since leaving the percent signs in a string unencoded while encoding the rest can change the meaning of the string. servo/rust-url#822 says this is the expected behavior, but then how it's supposed to work is unclear.
The URL spec says:
Of the possible values for the percentEncodeSet argument only two end up encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument — which happen to be used by the URL parser — leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented.
Does that mean when serializing you're expected to percent encode everything twice? Once just for percent signs and then once for everything else? What is the difference between that and having percent in the percent encode set to begin with? Is it because the URL spec works with partially decoded strings so those encodings happen at different times?
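As a rough way to explore the "encode twice" question, the std-only sketch below compares the two strategies on a fully decoded string. It is not the percent_encoding crate's API and not a statement of what the URL spec intends; encode_with, is_unreserved, two_pass, and one_pass are hypothetical names, and the unreserved set is an assumption made for the example.

```rust
/// Hypothetical helper: bytes that are never encoded in this sketch.
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Percent-encodes each byte for which `should_encode` returns true.
/// Non-ASCII bytes are always encoded so the output stays plain ASCII.
fn encode_with(input: &str, should_encode: impl Fn(u8) -> bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        if !b.is_ascii() || should_encode(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Two passes: first encode `%`, then encode everything else with a set
/// that leaves `%` untouched (so the `%` signs written by either pass
/// are not encoded again).
fn two_pass(raw: &str) -> String {
    let percent_escaped = encode_with(raw, |b| b == b'%');
    encode_with(&percent_escaped, |b| !is_unreserved(b) && b != b'%')
}

/// One pass with `%` included in the encode set.
fn one_pass(raw: &str) -> String {
    encode_with(raw, |b| !is_unreserved(b))
}
```

For a raw, fully decoded input such as openssl%401.1, both functions should produce openssl%25401.1, so under these assumptions the two approaches are equivalent. A difference would only show up when the input already contains %XX sequences that are meant to be preserved: a set without % leaves them alone, while a set with % turns them into %25XX. That may be why the sets used by the URL parser exclude %, but this is a guess rather than something the spec text quoted above confirms.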