troch / path-parser

A small utility to parse paths.

Percent encoding support for URL path

alextes opened this issue

It seems there is no support for percent encoding: neither the unencoded nor the encoded form of a path is accepted, e.g.

const Path = require('path-parser');
new Path('🦄');
Error: Could not parse path '%F0%9F%A6%84'
    at tokenise (/Users/alextes/code/notepad/js/node_modules/path-parser/dist/cjs/path-parser.js:95:15)
    at new Path (/Users/alextes/code/notepad/js/node_modules/path-parser/dist/cjs/path-parser.js:166:23)

Since percent encoding is allowed in URLs and common in translated routes, would you appreciate a PR that supports recognizing % as part of a fragment?

Hmm 🤔, just noticed the passing test for exactly this feature. Let's see if that's a recent change 😄.

So with router5/router5#63 this got implemented for parameters. This would be about supporting it in the fragment.

It may be good to point out that what you call a fragment is what the RFC calls a path. That is a little confusing, as there are also URL fragments. From the RFC:

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

I'd suggest updating that to match the RFC which talks about components 😁 .

Hi @alextes, do you mean % in the path you define? i.e. /my%path/:id

@troch my example broke for some reason 🤔. Anyway, I put the unicorn emoji back.

To put it into words: if a route contains Cyrillic or Greek characters, these Unicode characters are not encoded when a route is created, nor is it possible to encode them yourself. The '%' character does not seem to be a valid character for a 'fragment', although it should be accepted in URL paths, which you call fragments.

Ideally, router5 would do the encoding so the dev does not have to worry about it 😁 .

I don't think unicode characters should be encoded, but just left as is? (given the browsers your application targets have support for them). Here the fix is to allow their use both in paths and URL params so they can be respectively parsed and matched.

I don't think browsers do. I could look up the relevant section in the spec, but maybe the following is enough. Although browsers display Unicode in the address bar, if you were to add '🦄' as a query param, for example, window.location.href would show it percent-encoded. So if I understand the workings of the router correctly, it has to match the registered Unicode routes by their percent encoding, either by encoding Unicode itself or by refusing Unicode and having the dev encode when registering routes. In other words, I don't think Unicode URLs are okay according to the spec.
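To make the browser behaviour concrete, here is a minimal sketch using only standard built-ins (encodeURIComponent and the WHATWG URL class, which is global in modern browsers and in Node.js). The example.com URL is just a placeholder, and nothing here touches path-parser itself.

```javascript
// Percent-encode the unicorn emoji: each UTF-8 byte becomes %XX.
const raw = '🦄';
const encoded = encodeURIComponent(raw);
console.log(encoded); // '%F0%9F%A6%84'

// The URL parser serialises both the path and the query percent-encoded,
// even though the address bar may display the raw character.
const url = new URL('http://example.com/' + raw + '?q=' + raw);
console.log(url.pathname); // '/%F0%9F%A6%84'
console.log(url.search); // '?q=%F0%9F%A6%84'

// Decoding the serialised path gives the original character back.
const roundTrip = decodeURIComponent(url.pathname.slice(1));
console.log(roundTrip === raw); // true
```

So a router that reads the serialised URL would see the %XX form, which is why '%' needs to be accepted when matching.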

Reading your response again, I think we might already be in agreement for step one.
If we add % to the fragment and URL-parameter rules, we at least cover the case where the browser encodes; it just means the dev has to encode when registering routes. I'm of course happy to put in a PR with that change 😄.
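As a rough sketch of what "the dev has to encode when registering routes" could look like: encodeRoutePath below is a hypothetical helper, not part of path-parser or router5. It assumes parameters are written as :name and ignores any other route syntax (query or splat parameters, for instance).

```javascript
// Hypothetical helper (not a path-parser API): percent-encode the static
// segments of a route path and leave ':param' segments untouched.
function encodeRoutePath(path) {
  return path
    .split('/')
    .map((segment) =>
      segment.startsWith(':') ? segment : encodeURIComponent(segment)
    )
    .join('/');
}

console.log(encodeRoutePath('/🦄/:id')); // '/%F0%9F%A6%84/:id'
```

Whether the encoded result is then accepted by new Path(...) depends on '%' being allowed in the fragment rule, which is the change discussed here.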

> I'm of course happy to put in a PR with that change 😄.

If you have time, definitely 👍