BeanieODM / beanie

Asynchronous Python ODM for MongoDB

Home Page:http://beanie-odm.dev/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG] Order of elements in list[Link] is not preserved

moonrail opened this issue · comments

Describe the bug
As #414 and #527 were closed without any interaction by a stale bot (yay...) here another attempt.

When using DBRefs/Links, the ordering is stable when using MongoDB directly or any of its lower level libraries.

This is however not always the case with beanie.

Upon insert/update the order is preserved, as defined.

Upon link-fetching via e.g. fetch_links or fetch_all_links, it is not.

A possible solution can be found here: #414 (comment)
This just copies all Ids to a list with the original order and sorts the fetched documents later based on this order.

To Reproduce
Examples can be found in both older issues.

Expected behavior
Using fetch_all_links and fetch_links keeps the order of items as defined as DBRefs in the database.

Additional context

Note that in addition to not preserving the order of elements, fetch_links may even change the length of the returned list by removing duplicate elements. In summary, fetch_links behaves like an $in query, not like a list fetch.

The offending code is here, in the LinkTypes.LIST branch of the construct_query function in beanie.odm.utils.find:

"$lookup": {
"from": link_info.document_class.get_motor_collection().name, # type: ignore
"localField": f"{link_info.lookup_field_name}.$id",
"foreignField": "_id",
"as": link_info.field_name,
}

I see this as more of a correctness issue than a convenience issue. As a mongodb datastructure, arrays can contain duplicates and their order is part of their semantics. Users will expect fetching an array to fetch elements in order, including duplicates. I believe there are ways to dereference an array of ids efficiently, but even if it turns out that this is a fundamentally slow operation, then I think the solution would be to warn users of the conditions under which the operation is slow rather than to break array semantics.

As a proof of concept, here is an example query that does an in-order list fetch that could be modified to replace the existing $lookup stage in construct_query (playground link: https://mongoplayground.net/p/t7TyJK1oOEI)

db.orders.aggregate([
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "inventory",
      localField: "items.$id",
      foreignField: "_id",
      as: "items"
    }
  },
  {
    $group: {
      _id: "$_id",
      items: {
        $push: {
          $first: "$items"
        }
      },
      __oldRoot: {
        $last: "$$ROOT"
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          "$__oldRoot",
          {
            items: "$items"
          }
        ]
      }
    }
  }
])

The idea is to unwind the array, perform an efficient lookup on the id, group the results into an object containing the fetched array and the old document, then merge the old document with a new document containing the fetched items.

@roman-right How do you feel about the solution proposed here? Have I missed anything or do you have other concerns that I could help address? Before submitting a PR I would want to have you on-board with the general plan.

This issue is stale because it has been open 30 days with no activity.