rize / UriTemplate

PHP URI Template (RFC 6570) supports both URI expansion & extraction

UnNamed ignores the dot (.) when there is a file extension in the path

corleonis opened this issue · comments

I have a problem parsing a URI path with a file extension at the end. My URLs are as follows.

I have an endpoint, https://site.com/v1/company/us/123456.json, that maps to the following Swagger specification: /v1/company/{country}/{companyId}.{format} where format can be either XML or JSON.

My problem is that the code treats the path as if there were no file extension and sets companyId=123456.json, because the . is deemed a valid character and not a separator, as is the case with /

So my question is: is there any way around this problem?
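To make the symptom concrete, here is a minimal sketch of what happens when a simple variable is matched greedily against a character class that contains ".". The pattern below is an assumption for illustration only; it is not copied from the library.

```php
<?php
// Assumption: a simple {var} is matched greedily against a class that
// includes ".", similar in spirit to (but not taken from) the library.
$simpleVar = '/^[A-Za-z0-9\-._~%]+/';

// What is left of the path after "/v1/company/us/" has been consumed
$rest = '123456.json';
preg_match($simpleVar, $rest, $m);

$companyId = $m[0];                         // the whole segment, extension included
$remaining = substr($rest, strlen($m[0]));  // nothing is left for {format}

var_dump($companyId, $remaining);
```

Because the greedy match swallows the dot, there is nothing left for the extension variable to match.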

Hi @corleonis, The template spec supports dot-prefix e.g. {.format} (https://tools.ietf.org/html/rfc6570#section-3.2.5). So, maybe you need to rewrite your template to this /v1/company/{country}/{companyId}{.format}.

OK, I wrote a test for that. I didn't want to open a new pull request, but here it is:

use Rize\UriTemplate;

class ExtractorTest extends PHPUnit_Framework_TestCase
{
    public function testParseTemplateWithLiteral()
    {
        // will pass
        $uri = new UriTemplate('http://www.example.com/v1/company/', []);
        $params = $uri->extract('/{countryCode}/{registrationNumber}/test{.format}', '/gb/0123456/test.json');
        static::assertEquals(['countryCode' => 'gb', 'registrationNumber' => '0123456', 'format' => 'json'], $params);
    }

    /**
     * @depends testParseTemplateWithLiteral
     */
    public function testParseTemplateWithTwoVariablesAndDotBetween()
    {
        // will fail
        $uri = new UriTemplate('http://www.example.com/v1/company/', []);
        $params = $uri->extract('/{countryCode}/{registrationNumber}{.format}', '/gb/0123456.json');
        static::assertEquals(['countryCode' => 'gb', 'registrationNumber' => '0123456', 'format' => 'json'], $params);
    }
}

The first one passes, which matches what I have observed in my code, where some endpoints have a similar structure. But once you have a {var1}{.var2} template, everything explodes: var1 is of type UnNamed and captures the whole string 0123456.json, because . is a valid path character according to the regex in Abstraction::$pathRegex

Hey @rezigned, have you had a chance to check the tests? It might be a misconfiguration, but I can't see that I've missed anything specific based on the docs. Any help would be much appreciated.

Hi @corleonis, I re-read the spec and found that this is actually correct behavior.
. is part of the unreserved set (i.e. it is passed through without percent-encoding when expanded).
For a simple var, e.g.

{var}{.ext} and $var = "test.test"; $ext = "json"

the output will be test.test.json.

So, when we parse (extract) it back, there's no way to differentiate between a simple var and a dot-label expansion (since a simple var also accepts the . char).
Basically, the spec wasn't designed with extraction in mind.
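The ambiguity can be reproduced without the library. The sketch below uses two hypothetical helpers, expandSimple and expandLabel, as simplified stand-ins for RFC 6570 simple and label expansion; they rely on PHP's rawurlencode(), which leaves unreserved characters such as "." untouched.

```php
<?php
// Hypothetical stand-ins for {var} and {.ext} expansion (not the library's API).
function expandSimple(string $value): string
{
    return rawurlencode($value);        // "." is unreserved, so it is kept as-is
}

function expandLabel(string $value): string
{
    return '.' . rawurlencode($value);  // label expansion prefixes the value with "."
}

// Template {var}{.ext} expanded with two different sets of values
$a = expandSimple('test.test') . expandLabel('json');  // var = "test.test", ext = "json"
$b = expandSimple('test') . expandLabel('test.json');  // var = "test", ext = "test.json"

var_dump($a, $b, $a === $b);
```

Both inputs expand to the same string, so an extractor that sees only the output cannot tell which pair of values produced it.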

Solution: I can think of a workaround for this. For extraction, I could add one parameter that makes a simple var ignore the unreserved set or specific chars. So {var} would then accept anything except ".", and so on.

$uri->extract('/{countryCode}/{registrationNumber}{.format}', '/gb/0123456.json', false, ".");
>> 

If that sounds good, I'll implement it. (I have been very busy lately; I'll try to finish it within this week, or if you can help, that would be good.)

Hey, I saw the RFC 3986 spec and was wondering if there was a workaround, and your proposed solution sounds good to me. The only thing that might be useful in the generic case is for this new parameter to accept an array, so people could ignore one or many chars: "." -> [".", ","]. It's up to you to see what fits in the current model.

This is some interesting discussion here and I'd like to chime in and share my view 👍

@rezigned While I agree that the additional parameter may sound like a good location solution, I'm not sure if this is going to help in the long run. @corleonis already raises a valid point of how to specify multiple characters and/or limiting this to certain characters only.

Also, what if the consumer wants to exclude certain characters only for certain placeholders, like only the "registrationNumber" in the above example, but permit them in the "countryCode"?

Is much gained by adding this parameter?

@corleonis Have you verified your second test case above works with strict mode? Arguably, both test cases seem to work as documented (which may or may not be expected). However, perhaps this may need some special care for strict mode where one may assume the second example should indeed match the extension?

Just my two cents :-)

@corleonis Yes, I've planned to support multiple chars (maybe I didn't make that clear enough), modeled on the trim function, where you can pass multiple chars as the 2nd argument in the form of "\n\t\0" etc. I could make it support array params like your suggestion too :)

@clue Thank you for your feedback. I've been thinking about the pros and cons of adding this solution. Your point is correct: this solution would be enabled either for all vars or for none at all (you cannot opt in per variable).

I also have a new solution that I'd like to discuss here, and I'd like to hear everyone's comments.

Solution 2. Using lookahead to determine the extraction.
For a template {var}{.ext}.
The extraction for {var} looks at the next expression's operator (in this case it's .) and excludes that character from its own match (e.g. {var} will now ignore .)

Cons: The process is hidden (implicit) and might confuse users (not sure whether adding documentation will help with this or not)
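A rough sketch of the lookahead idea, using plain regular expressions rather than the library's internals. The helper simpleVarPattern and the character classes are hypothetical and only illustrate the rule "exclude the next expression's operator from the simple var".

```php
<?php
// Hypothetical helper (not the library's API): build the regex fragment for a
// simple {var}, dropping the operator char of the expression that follows it.
function simpleVarPattern(?string $nextOperator): string
{
    $allowed = 'A-Za-z0-9\-._~%';
    if ($nextOperator === '.') {
        // Remove "." so that the following {.ext} gets to claim it
        $allowed = 'A-Za-z0-9\-_~%';
    }
    return '[' . $allowed . ']+';
}

// {var}{.ext}: {var} is followed by a dot-label expression
$withLookahead = '#^(?<var>' . simpleVarPattern('.') . ')\.(?<ext>[A-Za-z0-9\-_~%]+)$#';
preg_match($withLookahead, '0123456.json', $m);

// {var} alone: no following operator, so "." is still accepted
$withoutLookahead = '#^(?<var>' . simpleVarPattern(null) . ')$#';
preg_match($withoutLookahead, 'test.json', $n);

var_dump($m['var'], $m['ext'], $n['var']);
```

With the lookahead applied, the first match splits the segment at the dot, while a lone {var} keeps matching the full test.json. A value that itself contains dots would no longer round-trip under this rule, which is the trade-off of making the behavior implicit.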

Solution 2. Using lookahead to determine the extraction.

I'll leave implementation details to you, so all I can say is this sounds sane from a consumer perspective for this particular case 👍

We should probably work out in which cases we apply this implicit extraction, i.e. only {var} (not followed by {.ext}) will still match test.json. Does a similar parsing rule apply to other characters as well?

only {var} (not followed by {.ext}) will still match test.json

Yes, no implicit lookahead happens here. Basically, it will only apply when a simple var {var} is followed by another operator expression, e.g. {.var}

Does a similar parsing rule apply to other characters as well?

Other chars (operators), e.g. + # / ; ?, are not part of the unreserved set, so they get pct-encoded and don't have the same problem as . (which is passed through raw), e.g.

$var = '+ / .'; and {var} will get expanded to "%2B%20%2F%20." (the spaces are pct-encoded too). So there's no problem when we parse it back, because the operator chars get encoded, unlike ".".
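This can be checked quickly with PHP's rawurlencode(), used here as an approximation of simple expansion (an assumption; the library's own encoder may differ in details):

```php
<?php
// rawurlencode() percent-encodes everything outside the unreserved set,
// so "+", "/" and the spaces are encoded while "." passes through raw.
$var = '+ / .';
$expanded = rawurlencode($var);

var_dump($expanded);

// The operator characters only survive in encoded form, so they cannot be
// mistaken for the start of a following expression during extraction.
$hasRawPlus  = strpos($expanded, '+') !== false;
$hasRawSlash = strpos($expanded, '/') !== false;
$hasRawDot   = strpos($expanded, '.') !== false;
```

Only the dot appears unencoded in the result, which is why it is the one operator character that needs special handling.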

The process is hidden (implicit) and might cause confusion to user (not sure if adding document will help this or not)
[…]
Basically it will only apply when simple var {var} is followed by other operators e.g. {.var}

This approach and its suggested documentation like above make perfect sense to me 👍

@rezigned Hey there, I just wanted to check if you had a chance to look at a solution for the forward lookup. I appreciate you might be busy with other things, and if you can provide some tips on roughly how you see this being implemented, I can try to do a quick patch.

@rezigned @clue I've opened a pull request if you'd like to have a quick look #9

This is just to move things along. It's tested and works with multiple variables separated by dots. I am open to any feedback and/or changes you may want to propose.

@corleonis Sorry for the late reply, I'm reviewing it.

Add 0.3.1 tag. Thank you!

Yay! Yeah, it's been a pleasure :)