alcaeus / mongo-php-adapter

Adapter to provide ext-mongo interface on top of mongo-php-library

$collection->save doesn't work on sharded collections (since 4.2)

EranUzan opened this issue.

Hey,

It seems that MongoCollection::save doesn't work on sharded collections since MongoDB 4.2 when the collection is not sharded by _id (which should be most of them).

It seems that the issue is that save uses only the _id field in the filter it passes to replaceOne to identify the document:

    // The filter holds _id alone, so no shard key fields are included.
    $result = $this->collection->replaceOne(
        TypeConverter::fromLegacy(['_id' => $id]),
        TypeConverter::fromLegacy($document),
        $this->convertWriteConcernOptions($options)
    );

It might be a good idea to add a new argument to MongoCollection::save so that users of the adapter can supply the filter fields (or the whole filter query) when needed.

No: the API is locked to that of the legacy MongoDB PHP driver to provide a drop-in replacement. Please note that this library provides the functionality of ext-mongo 1.6, which was released to provide compatibility with MongoDB 3.0. Looking at the source code for MongoCollection::save, we can see that the original implementation only supports updating by ID, which is what I did when providing the compatibility layer. Note that the litmus test for "is this a bug in the adapter" would be a test that passes on the verification build (there's a build that runs all tests against ext-mongo to verify test suite integrity) but fails against the adapter implementation.

My advice is to not rely on MongoCollection::save at all; there's a reason why this was excluded from the modern driver API. Instead, build your own insert/update queries and call MongoCollection::update and MongoCollection::insert as necessary.
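A minimal PHP sketch of that advice, assuming the legacy ext-mongo API that the adapter reproduces. `planSave()` is a hypothetical helper, not part of ext-mongo, the adapter, or the new driver; it only decides between `MongoCollection::insert` and `MongoCollection::update` the same way save does (no `_id` means insert, an existing `_id` means an upsert keyed on `_id`), so the decision can be checked without a server.

```php
<?php
/**
 * Hypothetical helper: decides which explicit legacy call should stand in
 * for MongoCollection::save(). Mirrors save(): no _id means insert, an
 * existing _id means a replacement upsert keyed on _id alone.
 */
function planSave(array $document): array
{
    if (!array_key_exists('_id', $document)) {
        // No identity yet: a plain insert (the driver assigns an _id).
        return ['op' => 'insert'];
    }

    // Known identity: replace the whole document, creating it if missing.
    // On a sharded collection these criteria would also need the full shard key.
    return [
        'op' => 'update',
        'criteria' => ['_id' => $document['_id']],
        'options' => ['upsert' => true],
    ];
}

// Usage with the legacy API (needs ext-mongo or the adapter; not executed here):
// $plan = planSave($doc);
// if ($plan['op'] === 'insert') {
//     $collection->insert($doc);
// } else {
//     $collection->update($plan['criteria'], $doc, $plan['options']);
// }
```

Keeping the decision separate from the calls makes it easy to extend the criteria later without touching the insert path.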

I would also like to point out that you're using a driver that was designed to work with MongoDB 3.0 and are using it with MongoDB 4.2. I can guarantee you that you will encounter more and more problems. Please note that this library was created as a drop-in solution to be able to update to a newer PHP release without rewriting your database logic first. It is not designed to be a permanent solution. My suggestion would be to invest time into getting rid of this adapter (just make sure to negotiate with your boss about a bonus for the performance improvements you'll get from that) rather than upgrading to newer versions of MongoDB. You are still running an unsupported piece of software (the legacy MongoDB driver), it's just that you're using a very inefficient PHP port of it rather than the native implementation.

@alcaeus Thank you for the feedback.

From reading the docs:

For a replace document operation that includes upsert: true and is on a sharded collection, the filter must include an equality match on the full shard key.

How can we get the collection shard key(s)?

You defined this key when you enabled sharding. You can see all indexes using the listIndexes command, but I'd have to check if the shard key is exposed differently.
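As a hedged sketch, and not something the adapter does: on a sharded cluster the shard key pattern is recorded in the `config.collections` collection, with the namespace as `_id` and the pattern under `key`. The hypothetical helper `shardKeyFieldsFromConfig()` below only turns such a document into a list of field names; fetching the document requires read access to the `config` database and is shown as a comment.

```php
<?php
/**
 * Hypothetical helper: returns the shard key field names stored in a
 * config.collections document, e.g.
 * ['_id' => 'app.orders', 'key' => ['customerId' => 1, '_id' => 1]].
 * Returns [] when there is no document (collection not sharded), when an
 * older server kept an entry flagged as dropped, or when "key" is missing.
 */
function shardKeyFieldsFromConfig(?array $configDoc): array
{
    if ($configDoc === null || !empty($configDoc['dropped'])) {
        return [];
    }
    if (!isset($configDoc['key']) || !is_array($configDoc['key'])) {
        return [];
    }
    // The pattern maps each field to 1 (ranged) or 'hashed'; keep only the names.
    return array_map('strval', array_keys($configDoc['key']));
}

// Fetching the document with the legacy API (not executed here):
// $configDoc = $client->selectDB('config')
//     ->selectCollection('collections')
//     ->findOne(['_id' => 'app.orders']);
// $fields = shardKeyFieldsFromConfig($configDoc);
```

The namespace `app.orders` and the field `customerId` are made-up examples.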

@alcaeus If we have to store the shard key(s) at the application level (e.g. in the app configuration), that couples the application to the database.

We did not find a way to locate the shard keys through the driver, so we cannot keep on backward compatibility on the save method and replace it with replaceOne method without storing the shard keys beforehand. Is that assumption correct?

I wouldn't say this is app configuration, but rather your document representation knowing what the document contains. If you identify a document by _id alone, your application needs to know that. If you identify it by _id and other fields (e.g. the shard key), your application needs to know that.

so we cannot keep on backward compatibility on the save method and replace it with replaceOne method

Yes, you can: the save method never supported sharded documents in the first place, so any document that you wrote with save will always work in replaceOne with upsert. To update a sharded document with the legacy API, use the MongoCollection::update method, passing the full shard key for $criteria. This is a requirement by MongoDB, and not a design decision made in the legacy driver, the new PHP driver, or even this library (which as I said provides the same functionality as the legacy driver, including all its flaws).
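A minimal sketch of building such `$criteria`, assuming the application already knows the shard key fields (for example from its own configuration). `shardKeyCriteria()` is a hypothetical name; it keeps `_id` and adds an equality match for every shard key field, resolving dotted paths through nested arrays.

```php
<?php
/**
 * Hypothetical helper: builds update criteria with _id (if present) plus an
 * equality match on every shard key field. Dotted names such as
 * "address.country" are resolved through nested arrays; nested objects
 * (stdClass) are not handled in this sketch.
 */
function shardKeyCriteria(array $document, array $shardKeyFields): array
{
    $criteria = [];
    if (array_key_exists('_id', $document)) {
        $criteria['_id'] = $document['_id'];
    }

    foreach ($shardKeyFields as $field) {
        $value = $document;
        foreach (explode('.', $field) as $segment) {
            // A missing field is rejected here; note that from 4.4 the server
            // also accepts documents without shard key fields.
            if (!is_array($value) || !array_key_exists($segment, $value)) {
                throw new InvalidArgumentException("Missing shard key field '$field'");
            }
            $value = $value[$segment];
        }
        $criteria[$field] = $value;
    }

    return $criteria;
}

// Usage with the legacy API (not executed here), for a collection assumed to
// be sharded on customerId:
// $collection->update(shardKeyCriteria($doc, ['customerId']), $doc, ['upsert' => true]);
```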

To further elaborate on the above, I took another look at the update command, specifically upsert on a Sharded Collection:

If you specify upsert: true, the filter q must include an equality match on the shard key.

Note that save runs an update with upsert true, so it needs the full shard key. This requirement did not change, but was adapted in 4.4 as documents may now be missing the shard key fields.

However, there is a change in 4.2 that affects replacing documents (which is what save does in both the legacy driver and this adapter):

Starting in MongoDB 4.2, when replacing a document, update attempts to target a shard, first by using the query filter. If the operation cannot target a single shard by the query filter, it then attempts to target by the replacement document.
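To make the quoted targeting order concrete, here is a toy PHP model (not server code): try the query filter first, then the replacement document. It approximates "can target a single shard" as "has a plain value for every top-level shard key field" and ignores chunk ranges, hashed keys, dotted paths, and the separate upsert rule quoted earlier. Both function names are hypothetical.

```php
<?php
/**
 * Toy model of the MongoDB 4.2 targeting order for replacement updates.
 * Simplification: a document "targets" if it has a plain value for every
 * top-level shard key field.
 */
function hasFullShardKey(array $doc, array $shardKeyFields): bool
{
    foreach ($shardKeyFields as $field) {
        if (!array_key_exists($field, $doc)) {
            return false;
        }
        $value = $doc[$field];
        if (is_array($value)) {
            foreach (array_keys($value) as $key) {
                // Any operator expression (['$gt' => 5], even ['$eq' => 5])
                // is treated as non-targeting in this toy model.
                if (is_string($key) && $key !== '' && $key[0] === '$') {
                    return false;
                }
            }
        }
    }
    return true;
}

/** Returns 'filter', 'replacement', or null when neither can target one shard. */
function replacementTargetSource(array $filter, array $replacement, array $shardKeyFields): ?string
{
    if (hasFullShardKey($filter, $shardKeyFields)) {
        return 'filter';      // first choice: the query filter
    }
    if (hasFullShardKey($replacement, $shardKeyFields)) {
        return 'replacement'; // fallback: the replacement document
    }
    return null;              // cannot target a single shard
}
```

With save's filter of `['_id' => $id]` on a collection sharded by a hypothetical `customerId` field, the model picks the replacement document, because save always sends the full document.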

The save method always amounts to a replace, so this change affects the legacy driver, but I fail to see how this would break anything as it falls back to the original behaviour.

Since I can't see what's wrong, I encourage you to come up with a test that passes when using ext-mongo on a sharded cluster (either 4.0 or 4.2), but fails when using mongo-php-adapter (on either 4.0 or 4.2). Thanks!

@alcaeus Neither save nor replaceOne can be used anymore on sharded collections, as both need the shard key(s) in the query. The driver and the adapter do not know the shard keys, so every save or replaceOne fails on sharded collections, since until now only _id was used.

One workaround is to store the shard keys in the application layer, but that couples the application to the database.

BTW, we found this issue on the MongoDB JIRA.

Hope it's clear now.

The ticket you mentioned refers to future work if the server relaxes the requirement for various commands with respect to the shard key. At this point, none of the commands this targets (updateOne, updateMany, deleteOne, deleteMany, findAndModify) saw any change in the server.

The ticket also doesn't require any driver changes, as the drivers don't know about any sharding setup or what keys are included. This affects ODMs, as these tools would normally generate queries for the user and no longer need to include the full shard key (or the _id field) once the server changes the requirement.

As mentioned before, if you noticed that this changed recently, I encourage you to show the code in question so I can investigate it. Thanks!