Relationship data is parsed even if parser is not parsing the data

Question


lindyhopchris opened this issue 5 years ago · comments

Christopher Gammie commented 5 years ago

So I've identified the root cause of #236. I'm opening this as a new issue to wipe the slate clean and document what the problem is. I'm trying to be as constructive as possible here by proposing a solution to this root cause.

One approach to JSON API that server-side libraries take is to only ever have the data member of a relationship in the encoded document if the related resource appears in the included member of the document. This is our use case that we cannot change, and it is a common implementation of the spec.

We implemented this in v1 via the schema's getRelationships() method, using a combination of a closure for the data and the $includeRelationships argument, i.e.:

    public function getRelationships($resource, $isPrimary, array $includeRelationships)
    {
        return [
            'author' => [
                self::SHOW_SELF => true,
                self::SHOW_RELATED => true,
                self::SHOW_DATA => isset($includeRelationships['author']),
                self::DATA => function () use ($resource) {
                    return $resource->getAuthor();
                },
            ],
        ];
    }

What we're doing here is detecting whether the parser will parse the data and, importantly, only incurring the cost of materialising the related resource if it is definitely going to be included.
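To make the cost being avoided concrete, here is a minimal self-contained sketch rather than the library's own code. `Post`, `relationshipsFor()` and `encodeRelationship()` are hypothetical names, and the plain string keys `showData` and `data` stand in for the schema constants.

```php
<?php

// Hypothetical resource whose related author is expensive to load.
class Post
{
    /** @var int Number of times the author was materialised. */
    public $authorLoads = 0;

    public function getAuthor(): array
    {
        $this->authorLoads++; // stands in for a database query

        return ['type' => 'people', 'id' => '9'];
    }
}

// Mirrors the v1 schema above: 'showData' is decided from the include paths,
// and 'data' is a closure so nothing is loaded up front.
function relationshipsFor(Post $post, array $includeRelationships): array
{
    return [
        'author' => [
            'showData' => isset($includeRelationships['author']),
            'data'     => function () use ($post) {
                return $post->getAuthor();
            },
        ],
    ];
}

// Hypothetical encoder step: the closure runs only when data will be shown.
function encodeRelationship(array $description): array
{
    $encoded = ['links' => ['related' => '/posts/1/author']];

    if ($description['showData'] === true) {
        $encoded['data'] = \call_user_func($description['data']);
    }

    return $encoded;
}

$post = new Post();

$withoutInclude = encodeRelationship(relationshipsFor($post, [])['author']);
$loadsAfterFirst = $post->authorLoads; // the author has not been loaded

$withInclude = encodeRelationship(relationshipsFor($post, ['author' => true])['author']);
$loadsAfterSecond = $post->authorLoads; // the author has been loaded once
```

In this sketch the related resource is only fetched on the second call, which is the behaviour the v1 schema relies on.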

In v3 we can no longer detect this from within the schema, because the method is now only given the resource. However, the schema does not strictly need to detect it, because the parser already works out whether it is going to parse the data.

What we need to do in v3 is return this from our schema:

    public function getRelationships($resource)
    {
        return [
            'author' => [
                self::RELATIONSHIP_LINK_SELF => true,
                self::RELATIONSHIP_LINK_RELATED => true,
                self::DATA => function () use ($resource) {
                    return $resource->getAuthor();
                },
            ],
        ];
    }

And then only invoke the callback if the parser decides it is going to parse the relationship data. I.e. if $isShouldParse is true on this line:
https://github.com/neomerx/json-api/blob/develop/src/Parser/Parser.php#L225

This is not currently possible because the relationship data is parsed before that line, i.e. at:
https://github.com/neomerx/json-api/blob/develop/src/Parser/Parser.php#L221

Because this line here invokes the callback immediately, even if the data will not be parsed:
https://github.com/neomerx/json-api/blob/develop/src/Parser/RelationshipData/ParseRelationshipDataTrait.php#L102

I think this can be fixed by some sort of extension that:

1. Allows the relationship data callback to be invoked later and parsed at that point.
2. Toggles hasData on the relationship to false if $isShouldParse is false.

(1) can be solved by adding a RelationshipDataIsCallable class to your package along these lines:

<?php

namespace Neomerx\JsonApi\Parser\RelationshipData;

use Neomerx\JsonApi\Contracts\Factories\FactoryInterface;
use Neomerx\JsonApi\Contracts\Parser\IdentifierInterface;
use Neomerx\JsonApi\Contracts\Parser\PositionInterface;
use Neomerx\JsonApi\Contracts\Parser\RelationshipDataInterface;
use Neomerx\JsonApi\Contracts\Parser\ResourceInterface;
use Neomerx\JsonApi\Contracts\Schema\SchemaContainerInterface;

class RelationshipDataIsCallable implements RelationshipDataInterface
{

    use ParseRelationshipDataTrait;

    /**
     * @var FactoryInterface
     */
    private $factory;

    /**
     * @var SchemaContainerInterface
     */
    private $container;

    /**
     * @var PositionInterface
     */
    private $position;

    /**
     * @var callable
     */
    private $callback;

    /**
     * @var RelationshipDataInterface|null
     */
    private $data;

    /**
     * RelationshipDataIsCallable constructor.
     *
     * @param FactoryInterface $factory
     * @param SchemaContainerInterface $container
     * @param PositionInterface $position
     * @param callable $callback
     */
    public function __construct(
        FactoryInterface $factory,
        SchemaContainerInterface $container,
        PositionInterface $position,
        callable $callback
    ) {
        $this->factory = $factory;
        $this->container = $container;
        $this->position = $position;
        $this->callback = $callback;
    }

    /**
     * @inheritDoc
     */
    public function isCollection(): bool
    {
        return $this->getData()->isCollection();
    }

    /**
     * @inheritDoc
     */
    public function isNull(): bool
    {
        return $this->getData()->isNull();
    }

    /**
     * @inheritDoc
     */
    public function isResource(): bool
    {
        return $this->getData()->isResource();
    }

    /**
     * @inheritDoc
     */
    public function isIdentifier(): bool
    {
        return $this->getData()->isIdentifier();
    }

    /**
     * @inheritDoc
     */
    public function getIdentifier(): IdentifierInterface
    {
        return $this->getData()->getIdentifier();
    }

    /**
     * @inheritDoc
     */
    public function getIdentifiers(): iterable
    {
        return $this->getData()->getIdentifiers();
    }

    /**
     * @inheritDoc
     */
    public function getResource(): ResourceInterface
    {
        return $this->getData()->getResource();
    }

    /**
     * @inheritDoc
     */
    public function getResources(): iterable
    {
        return $this->getData()->getResources();
    }

    /**
     * @return RelationshipDataInterface
     */
    private function getData(): RelationshipDataInterface
    {
        if ($this->data) {
            return $this->data;
        }

        return $this->data = $this->parseData(
            $this->factory,
            $this->container,
            $this->position,
            \call_user_func($this->callback)
        );
    }

}

Then change the line in ParseRelationshipDataTrait::parseData() to create that via the factory for a callable instead of immediately invoking the callback. I.e. it allows the invoking of the callback to be delayed. IMHO it makes sense to add this class to your package, because a callback is something you already support. Plus it means the class is created by the factory, which allows extensions to override the handling of a callback if they want to.

(2) is trickier because we need to hook into the parsing. In the parser we'd need to do something like this:

    $isShouldParse = $this->isPathRequested($relationship->getPosition()->getPath());

    if ($isShouldParse === true && $relationship->hasData() === true) {
        // ...parse as it does at the moment
    } elseif ($isShouldParse === false && $relationship instanceof LazyRelationshipInterface) {
        // Or something else: the point is that we need to tell the relationship it
        // will not be parsed, which allows it to clear its data and set `hasData` to `false`.
        $relationship->willNotParse();
    }

The problem for extending is that this code is part of Parser::parseResource(), which is private, and we only need to override a small part of it, not the whole method.
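One way to picture the kind of hook that would avoid copying the whole method is a protected template method. The sketch below is an assumption for illustration only: `BaseParser`, `LazyAwareParser`, `LazyRelationship` and `onRelationshipSkipped()` are hypothetical and do not exist in the package.

```php
<?php

// Hypothetical relationship that can be told it will not be parsed.
class LazyRelationship
{
    private $hasData = true;

    public function hasData(): bool
    {
        return $this->hasData;
    }

    public function willNotParse(): void
    {
        $this->hasData = false; // drop the data so the callback never runs
    }
}

// Hypothetical parser: the traversal stays in one place and exposes a
// single protected hook for the "path not requested" branch.
class BaseParser
{
    private $requestedPaths;

    public function __construct(array $requestedPaths)
    {
        $this->requestedPaths = $requestedPaths;
    }

    public function parseRelationship(string $path, LazyRelationship $relationship): string
    {
        $isShouldParse = \in_array($path, $this->requestedPaths, true);

        if ($isShouldParse === true && $relationship->hasData() === true) {
            return 'parsed';
        }

        if ($isShouldParse === false) {
            $this->onRelationshipSkipped($relationship);
        }

        return 'skipped';
    }

    // Default behaviour: do nothing.
    protected function onRelationshipSkipped(LazyRelationship $relationship): void
    {
    }
}

// An extension overrides only the hook, not the traversal.
class LazyAwareParser extends BaseParser
{
    protected function onRelationshipSkipped(LazyRelationship $relationship): void
    {
        $relationship->willNotParse();
    }
}

$parser = new LazyAwareParser(['author']);

$author = new LazyRelationship();
$comments = new LazyRelationship();

$authorResult = $parser->parseRelationship('author', $author);
$commentsResult = $parser->parseRelationship('comments', $comments);
```

With a hook of this shape, an extension would carry a few lines of its own code instead of a copy of the parsing method.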

Overall I think this is a better solution than what we currently do in v1... it's just a matter of how we can hook into your package to implement it as an extension.

So my question is, how would you suggest we do an extension that allows us to incorporate the above?

neomerx · Answer 1 · Sat May 04 2019 17:38:41 GMT+0800 (China Standard Time)

I think I need to draw your attention to some of the nuances of what the data is and how it could and should be processed. You need to understand them before pushing a 'solution'.

Suppose we want to encode a person and a few of his/her comments. Let's consider a couple of possible JSON API documents and ask what kind of data we need in relationships. Spoiler: it's more than either nothing or the full resource.

No Data in Relationship

{
    "data": {
        "type" : "people",
        "id"   : "9",
        "attributes" : {
            "first_name" : "Dan",
            "last_name"  : "Gebhardt"
        },
        "relationships" : {
            "comments" : {
                "links" : {
                    "self"    : "http://example.com/people/9/relationships/comments",
                    "related" : "http://example.com/people/9/comments"
                }
            }
        },
        "links" : {
            "self" : "http://example.com/people/9"
        }
    }
}

No data in relationship needed. Currently it corresponds to self::RELATIONSHIP_DATA omitted.

Some Data in non-included Relationship

{
    "data": {
        "type" : "people",
        "id"   : "9",
        "attributes" : {
            "first_name" : "Dan",
            "last_name"  : "Gebhardt"
        },
        "relationships" : {
            "comments" : {
                "data": [
                    { "type" : "comments", "id" : "1" },
                    { "type" : "comments", "id" : "2" }
                ],
                "links" : {
                    "self"    : "http://example.com/people/9/relationships/comments",
                    "related" : "http://example.com/people/9/comments"
                }
            }
        },
        "links" : {
            "self" : "http://example.com/people/9"
        }
    }
}

Identity data in the relationship is needed. Even if we do not include the relationship, we still need some data from it, though only type and id.
Thus hiding all the data behind a Closure will not work. The encoder will have to execute the closure to get that type and id even if it doesn't need the rest.

Currently, it corresponds to self::RELATIONSHIP_DATA filled with Neomerx\JsonApi\Contracts\Schema\IdentifierInterface objects or full resources.

All Data in included Relationship

Same as above plus included section with full data (type, id, attributes, relationships).

Full data in a relationship is needed.

Currently, it corresponds to self::RELATIONSHIP_DATA filled with full resources.

Christopher Gammie · Answer 2 · Sat May 04 2019 18:10:01 GMT+0800 (China Standard Time)

Yes I totally agree with those, but you're missing a scenario (which is what this issue is about): data only if related resource is included.

So the four scenarios are:

1. No data in relationship - schema does not return data.
2. Data in non-included relationship - schema returns the identifiers.
3. Data in included relationship - schema returns the related resources.
4. Data only if relationship is included (i.e. the relationship will definitely be parsed by the parser) - returned resources need to be wrapped in a closure and only parsed if the parser is parsing the relationship.

(4) corresponded to SELF::SHOW_DATA and SELF::DATA in v1/v2; in v3 it corresponds to SELF::RELATIONSHIP_DATA plus the $isShouldParse check.

I'm in total agreement with 1-3, but none of them matches how we use the encoder. So this issue is about supporting that missing scenario. It forms 100% of our use case and is supported by other server-side encoding libraries.
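The difference between the scenarios can be sketched as a single function. This is an illustration under assumptions, not the encoder's logic: the mode constants and `relationshipMember()` are hypothetical names, and resources are plain arrays.

```php
<?php

// Hypothetical mode names for the four scenarios; not library constants.
const NO_DATA = 'no-data';
const IDENTIFIERS = 'identifiers';
const RESOURCES = 'resources';
const DATA_IF_INCLUDED = 'data-if-included';

/**
 * @param string   $mode       one of the constants above
 * @param callable $loader     returns the related resources as arrays
 * @param bool     $isIncluded whether the relationship path was requested
 */
function relationshipMember(string $mode, callable $loader, bool $isIncluded): array
{
    $member = ['links' => ['related' => 'http://example.com/people/9/comments']];

    // Scenario 1, and scenario 4 when not included, never touch the loader.
    if ($mode === NO_DATA || ($mode === DATA_IF_INCLUDED && $isIncluded === false)) {
        return $member;
    }

    // Linkage only ever carries type and id, whatever the loader returned.
    // Scenarios 2 and 3 differ only in what goes into the top-level
    // included member, which this sketch leaves out.
    $member['data'] = \array_map(function (array $resource) {
        return ['type' => $resource['type'], 'id' => $resource['id']];
    }, $loader());

    return $member;
}

$calls = 0;
$loader = function () use (&$calls) {
    $calls++; // stands in for the cost of loading the related resources

    return [
        ['type' => 'comments', 'id' => '1', 'attributes' => ['body' => 'First!']],
        ['type' => 'comments', 'id' => '2', 'attributes' => ['body' => 'Second']],
    ];
};

$linksOnly = relationshipMember(DATA_IF_INCLUDED, $loader, false);
$callsWhenSkipped = $calls;

$withLinkage = relationshipMember(DATA_IF_INCLUDED, $loader, true);
$callsWhenIncluded = $calls;
```

The point of scenario 4 in this sketch is that the loader is never invoked when the relationship path was not requested.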

neomerx · Answer 3 · Sat May 04 2019 18:14:47 GMT+0800 (China Standard Time)

If you do not want to provide any identifiers in relationships but links only, you can add support for SHOW_DATA between these two lines. It might be something like

if (\array_key_exists(BaseCustomSchema::RELATIONSHIP_HAS_DATA, $description) === true &&
    $description[BaseCustomSchema::RELATIONSHIP_HAS_DATA] === false
) {
    unset($description[BaseCustomSchema::RELATIONSHIP_DATA]);
}

neomerx · Answer 4 · Sat May 04 2019 18:22:31 GMT+0800 (China Standard Time)

No data in relationship - schema does not return data.

Could be achieved either by omitting RELATIONSHIP_DATA or by RELATIONSHIP_HAS_DATA set to false as shown above.

Data in non-included relationship - schema returns the identifiers

Could be achieved with IdentifierInterface.

Data in included relationship - schema returns the related resources.

Typical usage. Just return full resource in RELATIONSHIP_DATA.

Data only if relationship is included (i.e. the relationship will definitely be parsed by the parser) - returned resources need to be wrapped in a closure and only parsed if the parser is parsing the relationship.

If you want identifiers, then it's 2) or 3), though only identifiers will be used in relationships and no resources will be added to the included section.

If you do not want identifiers and only links are OK then it could be achieved either by omitting RELATIONSHIP_DATA or by RELATIONSHIP_HAS_DATA set to false as shown above.

neomerx · Answer 5 · Sat May 04 2019 18:37:35 GMT+0800 (China Standard Time)

I've updated the sample. So you can try it yourself.

Christopher Gammie · Answer 6 · Sat May 04 2019 18:59:31 GMT+0800 (China Standard Time)

Ok great but why not add it to this package as this is a standard JSON API approach? That's the bit I don't understand... this isn't an extension, it's a standard use-case and a totally compliant interpretation of the spec. If the point of your encoder is to be JSON API compliant, then this missing scenario needs to be added in.

If you do not want to add it in, please can you explain how we extend to gain access to the $isShouldParse, as described at the start of this issue? The extension suggested by #236 is not the extension we need to implement. Rather than updating #236 extension, please can you show us how to do the extension described in this issue?

neomerx · Answer 7 · Mon May 06 2019 13:48:37 GMT+0800 (China Standard Time)

As far as I understood, you wanted to have $includeRelationships in getRelationships() and the ability to return a Closure in RELATIONSHIP_DATA, accompanied by RELATIONSHIP_HAS_DATA, which can prevent the closure from ever being executed. Can you please confirm the solution I've shown solves your migration problem?
I would appreciate it if we could agree it is a working solution and then move on to the next topic: what should and should not be included in the main code base.

Christopher Gammie · Answer 8 · Mon May 06 2019 18:32:20 GMT+0800 (China Standard Time)

Sorry to hear the news from your temporary message, and that is of course totally understandable and right that you'll have limited time on this package.

I'll close this issue for now.

neomerx · Answer 9 · Thu May 09 2019 04:42:43 GMT+0800 (China Standard Time)

@lindyhopchris Shall we continue the discussion? Do I understand correctly that the solution does solve your problem, though you think more code should be moved from the extension to the main code base?

Christopher Gammie · Answer 10 · Fri May 10 2019 16:56:48 GMT+0800 (China Standard Time)

The solution is not ideal and we would not use it for a number of reasons.

It's injecting a runtime dependency (SchemaFields) into something that we resolve out of a service container (the schemas). We treat the schemas as a service because it allows schemas to be injected with other services via constructor dependency injection.

The SchemaFields cannot be a service because we don't know where to create them from... JSON API encoding might happen outside of a HTTP request, for example when broadcasting JSON API payloads over services such as Pusher. That's just one example of many for why we can't have the schema fields injected into the schemas in the way the solution proposes.

What the SchemaFields represents is something that the encoder knows... i.e. the fields and include paths it is being asked to encode; i.e. it's a runtime dependency of the encoder that needs to be passed around to the services (schemas) that the encoder is using.
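The distinction between a service dependency and a runtime dependency can be sketched as follows. All names here (`UrlGenerator`, `EncodingContext`, `PostSchema` and its two-argument `getRelationships()`) are hypothetical and chosen for illustration; they are not the package's API or the proposed extension.

```php
<?php

// Hypothetical service injected once when the container builds the schema.
class UrlGenerator
{
    public function to(string $path): string
    {
        return 'http://example.com' . $path;
    }
}

// Hypothetical runtime value: what one particular encode call was asked for.
class EncodingContext
{
    private $includePaths;

    public function __construct(array $includePaths)
    {
        $this->includePaths = $includePaths;
    }

    public function isIncluded(string $path): bool
    {
        return \in_array($path, $this->includePaths, true);
    }
}

class PostSchema
{
    private $urls;

    // Constructor injection is reserved for long-lived services.
    public function __construct(UrlGenerator $urls)
    {
        $this->urls = $urls;
    }

    // The context arrives per call, so one schema instance can serve an
    // HTTP response and a broadcast payload with different include paths.
    public function getRelationships(array $post, EncodingContext $context): array
    {
        $author = ['links' => ['related' => $this->urls->to('/posts/' . $post['id'] . '/author')]];

        if ($context->isIncluded('author')) {
            $author['data'] = ['type' => 'people', 'id' => $post['authorId']];
        }

        return ['author' => $author];
    }
}

$schema = new PostSchema(new UrlGenerator());
$post = ['id' => '1', 'authorId' => '9'];

$httpResponse = $schema->getRelationships($post, new EncodingContext(['author']));
$broadcast = $schema->getRelationships($post, new EncodingContext([]));
```

Here the same schema instance produces both payloads, because the include paths travel with the call rather than with the object.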

We could easily write this as an extension except for one thing: there's next to no opportunity to hook into the code within the encoder and parser, because of the number of private methods. Yes, we can copy and paste the code into our own class, but we do not like doing this (for good reasons), i.e. copying hundreds of lines of code just to make one or two changes. It means that if you make any changes to the methods we've copied and pasted, we're out of sync.

That's the challenge we face. I think maybe though if you're busy, we bank this for the moment. We'll upgrade to v2 instead of v3 and then use that for a bit before deciding whether we go to v3. I think we need to write the solution that actually works for our use case and then show you what that is - as it'll demonstrate how difficult it is to add extensions to the code base. It's understandably difficult for you to write a demo extension when you cannot know the ins and outs of our use cases!

neomerx · Answer 11 · Mon Sep 23 2019 02:06:32 GMT+0800 (China Standard Time)

@lindyhopchris Hi, I'd like to return to this discussion. I've currently got 2 ideas:

1. Sending Neomerx\JsonApi\Contracts\Parser\PositionInterface as a second param to getRelationships.
2. Sending a wrapper around Neomerx\JsonApi\Contracts\Parser\PositionInterface instead of the position.

If a position (level, path, a name of the parent relationship) is available then filtering could be done as earlier.

A wrapper makes it possible to have a single custom object that implements the filtering logic, instead of possibly replicating it in every schema (e.g. super fast cached filters or something similar).
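A rough sketch of the wrapper idea, under stated assumptions: `Position` below is a hypothetical stand-in rather than the library's PositionInterface, and `RelationshipFilter` with its `shouldShowData()` method is an invented name for the custom object.

```php
<?php

// Hypothetical stand-in for a parser position; not the library's interface.
class Position
{
    private $level;
    private $path;

    public function __construct(int $level, string $path)
    {
        $this->level = $level;
        $this->path = $path;
    }

    public function getLevel(): int
    {
        return $this->level;
    }

    public function getPath(): string
    {
        return $this->path;
    }
}

// Hypothetical wrapper: filtering logic lives here once instead of being
// repeated in every schema.
class RelationshipFilter
{
    private $position;
    private $includePaths;

    public function __construct(Position $position, array $includePaths)
    {
        $this->position = $position;
        // Flip once so that each lookup is a cheap isset().
        $this->includePaths = \array_flip($includePaths);
    }

    public function shouldShowData(string $relationship): bool
    {
        $parent = $this->position->getPath();
        $path = $parent === '' ? $relationship : $parent . '.' . $relationship;

        return isset($this->includePaths[$path]);
    }
}

// A schema method that receives the wrapper as its second argument.
function getRelationships(array $post, RelationshipFilter $filter): array
{
    $author = ['links' => ['related' => '/posts/' . $post['id'] . '/author']];

    if ($filter->shouldShowData('author')) {
        $author['data'] = ['type' => 'people', 'id' => $post['authorId']];
    }

    return ['author' => $author];
}

$post = ['id' => '1', 'authorId' => '9'];
$root = new Position(1, '');

$included = getRelationships($post, new RelationshipFilter($root, ['author', 'author.comments']));
$notIncluded = getRelationships($post, new RelationshipFilter($root, ['comments']));
```

The schema only asks the wrapper a yes/no question, so any caching or path matching stays in one object.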

What do you think?

Christopher Gammie · Answer 12 · Mon Sep 23 2019 16:55:08 GMT+0800 (China Standard Time)

Ah ok, hadn't been expecting you to suggest that.

I've banked upgrading, and probably won't be doing anything around upgrading until later this year (because of other work I've got on at the mo). My plan had been to attempt to upgrade to v3 and see where I get.

Maybe it would be better for us to revisit this at the point I do the upgrade, because it's a bit difficult to comment on what approach would be required when I'm not actually working on the upgrade?

neomerx · Answer 13 · Mon Sep 23 2019 18:24:40 GMT+0800 (China Standard Time)

OK. I'm expecting you to have difficulties with accessing filters (paths, fieldsets) from schemas. Unlike the position, the encoder doesn't have the filtering info at the point where it invokes the schemas. The filters are applied later, further up the stack.
Currently, this problem could be solved by creating schemas (objects or via closures) with the filtering info and then sending the schemas to the encoder.
An elegant solution could be sending the filtering info to the schemas when EncoderInterface::withIncludedPaths or EncoderInterface::withFieldSets is called, probably via SchemaContainer.
It might require changing some interfaces, so a new major version would be needed. The changes are likely to be minimal though, so most users would migrate without any code changes.
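As a sketch of what forwarding the filtering info via the container might look like: everything below is an assumption. `SketchEncoder`, `SchemaContainer::setIncludePaths()` and `PostSchema::setIncludePaths()` are hypothetical; only the method name withIncludedPaths is taken from the comment above.

```php
<?php

// Hypothetical schema that can be told which paths were requested.
class PostSchema
{
    private $includePaths = [];

    public function setIncludePaths(array $paths): void
    {
        $this->includePaths = $paths;
    }

    public function showsData(string $relationship): bool
    {
        return \in_array($relationship, $this->includePaths, true);
    }
}

// Hypothetical container that fans the filtering info out to its schemas.
class SchemaContainer
{
    private $schemas;

    public function __construct(array $schemas)
    {
        $this->schemas = $schemas;
    }

    public function setIncludePaths(array $paths): void
    {
        foreach ($this->schemas as $schema) {
            $schema->setIncludePaths($paths);
        }
    }

    public function getSchema(string $type): PostSchema
    {
        return $this->schemas[$type];
    }
}

// Hypothetical encoder: the method name mirrors withIncludedPaths, but the
// forwarding to the container is the assumption being illustrated.
class SketchEncoder
{
    private $container;

    public function __construct(SchemaContainer $container)
    {
        $this->container = $container;
    }

    public function withIncludedPaths(array $paths): self
    {
        $this->container->setIncludePaths($paths);

        return $this;
    }
}

$container = new SchemaContainer(['posts' => new PostSchema()]);
$encoder = new SketchEncoder($container);

$before = $container->getSchema('posts')->showsData('author');
$encoder->withIncludedPaths(['author']);
$after = $container->getSchema('posts')->showsData('author');
```

One trade-off of this shape is that schemas hold per-encode state, which matters if the same schema instances are shared between encodings.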