phylum-dev / purl

Package URL implementation for Rust

Percent signs are not percent encoded

matt-phylum opened this issue

Overview

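pkg:brew/openssl%25401.1@1.1.1w parses correctly (the name is openssl%401.1), but it then serializes incorrectly as pkg:brew/openssl%401.1@1.1.1w. Parsing that output yields the name openssl@1.1, so openssl%401.1 turns into openssl@1.1 after one round trip.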

Expected Behavior

Percent signs in input should be percent encoded so that serializing the PURL and parsing it again produces the same result.
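The round-trip property can be illustrated with a minimal, standard-library-only sketch. The encode and decode helpers below are hypothetical stand-ins written for this example; they are not the purl or percent_encoding APIs. The encode set is passed as a closure so that the effect of including or omitting the percent sign is visible.

```rust
/// Hypothetical helper: percent-encode every byte selected by `must_encode`.
/// Non-ASCII bytes are always encoded so the output stays valid ASCII.
fn encode(input: &str, must_encode: impl Fn(u8) -> bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        if !b.is_ascii() || must_encode(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Hypothetical helper: decode every well-formed %XX sequence and copy
/// everything else through unchanged.
fn decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let hex = |b: u8| (b as char).to_digit(16);
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex(bytes[i + 1]), hex(bytes[i + 2])) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// True when encoding and then decoding `name` gives `name` back.
fn roundtrips(name: &str, must_encode: impl Fn(u8) -> bool) -> bool {
    decode(&encode(name, must_encode)) == name
}
```

With the parsed name openssl%401.1, roundtrips(name, |b| b == b'@') is false because the encoded form is still openssl%401.1, which decodes to openssl@1.1. Adding the percent sign to the set, as in roundtrips(name, |b| b == b'@' || b == b'%'), produces openssl%25401.1 and the round trip succeeds.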

Additional Context

althonos/packageurl.rs has the same problem, probably because both implementations use the same crate for encoding. It's surprising that the percent_encoding crate does not automatically percent encode percent signs, since leaving them unencoded can change the meaning of the encoded string. servo/rust-url#822 says this is the expected behavior, but it is unclear how it's then supposed to work.

The URL spec says:

Of the possible values for the percentEncodeSet argument only two end up encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument — which happen to be used by the URL parser — leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented.

Does that mean when serializing you're expected to percent encode everything twice? Once just for percent signs and then once for everything else? What is the difference between that and having percent in the percent encode set to begin with? Is it because the URL spec works with partially decoded strings so those encodings happen at different times?
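As a rough way to explore these questions, the sketch below compares the two strategies on a fully decoded string. It assumes a hypothetical escape helper and an illustrative encode set (at sign and space) chosen only for this example; neither comes from the URL spec or the percent_encoding crate.

```rust
/// Hypothetical helper: escape each byte selected by `must_encode` as %XX.
/// Non-ASCII bytes are always escaped.
fn escape(input: &str, must_encode: impl Fn(u8) -> bool) -> String {
    input
        .bytes()
        .map(|b| {
            if !b.is_ascii() || must_encode(b) {
                format!("%{:02X}", b)
            } else {
                (b as char).to_string()
            }
        })
        .collect()
}

/// Strategy 1: escape percent signs first, then everything else with a
/// set that does not contain the percent sign.
fn two_pass(input: &str) -> String {
    let percents_escaped = escape(input, |b| b == b'%');
    escape(&percents_escaped, |b| b == b'@' || b == b' ')
}

/// Strategy 2: a single pass with the percent sign in the encode set.
fn one_pass(input: &str) -> String {
    escape(input, |b| b == b'%' || b == b'@' || b == b' ')
}
```

In this sketch, two_pass and one_pass both turn openssl%401.1 into openssl%25401.1, so for a fully decoded string the two strategies give the same output. The set without the percent sign behaves differently only in that it leaves existing escapes alone: escape("openssl%401.1", |b| b == b'@') returns the input unchanged. That is presumably what a parser wants when its input may already be percent encoded, and it would be consistent with the spec's remark that the percent sign "needs to be percent-encoded first".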