woylie / flop

Filtering, ordering and pagination for Ecto

[Bug] Join ruins pagination

sveredyuk opened this issue

Summary

I have an issue with pagination when I join/preload an additional association.

Steps to reproduce

I have a project schema that derives Flop.Schema:

@derive {
  Flop.Schema,
  filterable: [
    :title,
    :number,
    :status,
    :type_id,
    :priority_id,
    :current_stage_id,
    :notes,
    :start_at_date,
    :finish_at_date,
    :inserted_at_date,
    :manager_id,
    :member_account_id
  ],
  sortable: [
    :number,
    :title,
    :target_budget_amount,
    :priority_order,
    :inserted_at,
    :start_at,
    :finish_at
  ],
  default_order: %{order_by: [:number], order_directions: [:desc]},
  pagination_types: [:page, :first],
  adapter_opts: [
    join_fields: [
      member_account_id: [
        binding: :assigned_members,
        field: :account_id,
        ecto_type: :binary_id
      ],
      priority_order: [
        binding: :priority,
        field: :order,
        # :bigint is a migration/database type; the Ecto type is :integer
        ecto_type: :integer
      ]
    ]
  ]
}

I need to filter by the associated member_account_id and sort by priority_order.

My query is:

#Ecto.Query<from p0 in HQ.CRM.Project,
 left_join: a1 in assoc(p0, :assigned_members), as: :assigned_members,
 left_join: t2 in HQ.Organization.Taxon, as: :priority,
 on: t2.id == p0.priority_id,
 preload: [{:assigned_members, :account}, :manager, :priority],
 preload: [assigned_members: a1, priority: t2]>

Flop params:

page: 1,
page_size: 20,

but the response is weird.

Expected behaviour

It returns 20 items per page.

Actual behaviour

It returns 5 to 7 items per page, depending on the joined association.

Elixir/Erlang version

1.16

Flop and Ecto versions

ecto 3.11.1 (Hex package) (mix)

  • ecto_sql 3.11.1 (Hex package) (mix)
    locked at 3.11.1 (ecto_sql) ce14063a
  • flop 0.25.0 (Hex package) (mix)
    locked at 0.25.0 (flop) 7e4b01b7

Additional context

I know that this might be related to how limit/offset behaves on an Ecto query with a joined association, but how do I deal with it in Flop?

If you join on a one-to-many or many-to-many relationship, your database will return one row per relationship, so in your case, it will return one row for each project/member association. You can see what is actually returned by the database by removing the preloads and adding a select that selects the project, member, and priority in a tuple or map. And that result is what the DB applies the limit on.

The preload query function just takes the query result (which would have 20 rows, with projects appearing multiple times because of their associations) after it is returned from the DB and nests the associations (after which each project would only appear once). So if your query returns a project that has three members, before the preload you have three rows (project_a/member_a, project_a/member_b, project_a/member_c), and then the preload nests the members, which leaves you with a single row (project_a{members: [member_a, member_b, member_c]}). And that's how you end up with pages that have fewer items than the requested page size.
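To make the row counting concrete, here is a small in-memory simulation in plain Elixir (no database, Ecto or Flop involved). The projects, member names and page size are hypothetical; `Enum.take/2` stands in for SQL `LIMIT`, and `Enum.group_by/3` stands in for the nesting that preload performs after the rows come back.

```elixir
# Hypothetical in-memory stand-in for the projects table and its members.
projects = [
  %{id: 1, members: ["a", "b", "c"]},
  %{id: 2, members: ["d"]},
  %{id: 3, members: ["e", "f"]},
  %{id: 4, members: ["g"]}
]

# Joining a one-to-many association yields one row per project/member pair.
joined_rows =
  for project <- projects, member <- project.members do
    {project.id, member}
  end

# The database applies LIMIT to these joined rows, not to distinct projects.
page_size = 4
limited_rows = Enum.take(joined_rows, page_size)

# Preloading nests the members afterwards, so rows of the same project collapse.
page =
  limited_rows
  |> Enum.group_by(fn {project_id, _member} -> project_id end, fn {_project_id, member} -> member end)
  |> Enum.sort()

IO.inspect(page)
# => [{1, ["a", "b", "c"]}, {2, ["d"]}]
```

Four rows were requested, but after nesting only two projects remain, because project 1 used up three of the four rows.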

You have to ensure that your database query only returns one row per project. So this is really about SQL, and not about Ecto or Flop. Some of the things that can be useful depending on the context:

  • lateral joins
  • window queries
  • selecting from a sub query
  • selecting a list of IDs using ARRAY_AGG and selecting the resources in a separate query
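What these options have in common is that the rows are reduced to one per project before `LIMIT`/`OFFSET` is applied. A plain Elixir sketch of that ordering with hypothetical data; the grouping step stands in for whichever SQL technique you pick, it is not a Flop or Ecto API:

```elixir
# Hypothetical in-memory stand-in for projects and their members.
projects = [
  %{id: 1, members: ["a", "b", "c"]},
  %{id: 2, members: ["d"]},
  %{id: 3, members: ["e", "f"]},
  %{id: 4, members: ["g"]}
]

# The join still produces one row per project/member pair.
joined_rows =
  for project <- projects, member <- project.members do
    {project.id, member}
  end

# Collapse to one row per project first (what GROUP BY, DISTINCT, a lateral
# join or an ID subquery achieves in SQL) and only then apply the limit.
page_size = 4

page =
  joined_rows
  |> Enum.group_by(fn {project_id, _member} -> project_id end, fn {_project_id, member} -> member end)
  |> Enum.sort()
  |> Enum.take(page_size)

IO.inspect(length(page))
# => 4
```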

OK. How can I combine pagination with filtering/sorting by a nested association in Flop? I always need to join the named binding for these, but that leads to duplicates; I preload to collapse them, but that breaks pagination.

Flop only requires a named binding to be able to filter on it. You can join on a sub query that doesn't return duplicates, e.g. by using a group by or distinct clause in it. Again, this is an SQL question, not a Flop question.

I 100% agree that this is an SQL question; I just wonder how to resolve this issue without terrible workarounds. The cheat sheet example requires join_fields for filtering through an associated schema, and I am wondering if it's possible to rewrite it for a case where limit/offset pagination is respected.

In the end, Flop only adds where clauses to an existing query. So you have to write the query the same way you would if the filter conditions were hard-coded. No workarounds, just plain SQL query design. I would probably first try to join on a sub query and ensure that the sub query doesn't return more than one row per project, e.g. by adding a group by or distinct clause, as already mentioned, and I also mentioned multiple other ways to go about this. If you need more detailed help with Ecto query design or SQL, I'd suggest asking on the Elixir Forum or in other channels focused on SQL, where more people will be able to see your question.

I just wonder whether it is possible to paginate and use adapter_opts/join_fields bindings. I have to assign the bindings with left_join, but a subquery does not solve this, because the bindings are expected to be part of the main query.

I tried to do:

project_ids =
  from(t in Project)
  |> join(:left, [p], a in assoc(p, :assigned_members), as: :assigned_members)
  |> join(:left, [p], a in assoc(p, :priority), as: :priority)
  |> select([p], p.id)

from(
  p in Project,
  where: p.id in subquery(project_ids)
)
|> Flop.validate_and_run(params, for: Project)

It produces a result without duplicates, but I cannot use sorting and filtering from adapter_opts, since those bindings are not part of the main query:

adapter_opts: [
  join_fields: [
    member_account_id: [
      binding: :assigned_members,
      field: :account_id,
      ecto_type: :binary_id
    ],
    priority_order: [
      binding: :priority,
      field: :order,
      ecto_type: :integer
    ]
  ]
]

You can make columns from a sub query accessible in the parent query, for example:

members_query = 
  from(
    m in AssignedMember,
    where: parent_as(:projects).id == m.project_id,
    select: %{account_id: m.account_id}
  )

from(
  p in Project,
  as: :projects,
  left_lateral_join: m in subquery(members_query),
  as: :assigned_members
)
|> Flop.validate_and_run(params, for: Project)

Written that way, Flop can access the :account_id field on the :assigned_members binding. The example above still returns duplicates, so you'll need to add something like a group by or distinct clause to the subquery.

members_query = 
  from(
    m in AssignedMember,
    where: parent_as(:projects).id == m.project_id,
    select: %{account_id: min(m.account_id)},
    group_by: m.project_id
  )

Or you can use a left join on the association as before and add a group by clause on the parent query. The best approach depends on the nature of the data you're working with and how you plan to use the results. With the group by in the sub query as used here, you will need to load the members in a separate query.
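One consequence of the aggregated variant is worth checking against your data: with `min(m.account_id)`, each project exposes a single account ID, so a Flop filter on `member_account_id` is compared against that one value only. A hypothetical in-memory illustration, with plain Elixir standing in for the SQL (the assignment data and account names are made up):

```elixir
# Hypothetical assignments: {project_id, account_id}.
assignments = [
  {1, "acc-1"},
  {1, "acc-2"},
  {2, "acc-2"},
  {3, "acc-3"}
]

# GROUP BY project_id with MIN(account_id): one row per project, one account each.
grouped =
  assignments
  |> Enum.group_by(fn {project_id, _} -> project_id end, fn {_, account_id} -> account_id end)
  |> Enum.map(fn {project_id, account_ids} -> {project_id, Enum.min(account_ids)} end)
  |> Enum.sort()

# Filtering on the aggregated column only sees the minimum account per project.
filter_on_min =
  for {project_id, account_id} <- grouped, account_id == "acc-2", do: project_id

# Filtering on the raw assignments, then de-duplicating, sees every assignment.
filter_on_all =
  assignments
  |> Enum.filter(fn {_, account_id} -> account_id == "acc-2" end)
  |> Enum.map(fn {project_id, _} -> project_id end)
  |> Enum.uniq()

IO.inspect({filter_on_min, filter_on_all})
# => {[2], [1, 2]}
```

Project 1 is assigned to "acc-2" but is not matched through the aggregated column, because its minimum account is "acc-1".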

I'm going to close this now, since this is not the place to discuss query design.

I resolved this with 2 flops:

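# Only filter, without ordering or pagination
sub_flop =
  params
  |> Flop.validate!(for: Project)
  |> Flop.reset_order()
  # May I add Flop.reset_pagination as a new helper?
  # Setting the pagination fields to nil keeps the %Flop{} struct intact;
  # Map.delete/2 would remove keys the struct is expected to have.
  |> Map.merge(%{page: nil, page_size: nil, limit: nil, offset: nil})

# Now only order and paginate, without filters
main_flop =
  params
  |> Flop.validate!(for: Project)
  |> Flop.reset_filters()

# Distinct IDs of the projects matching the filters.
# add_assigned_members/1 joins the :assigned_members binding;
# ctx comes from the surrounding function.
project_ids =
  from(p in Project)
  |> add_assigned_members()
  |> Flop.query(sub_flop, for: Project, extra_opts: [ctx: ctx])
  |> distinct([p], p.id)
  |> select([p], p.id)

# One row per project, so ordering and pagination count projects.
# Sorting by :priority_order needs the :priority binding in this query;
# joining a belongs_to association adds at most one row per project.
from(
  p in Project,
  left_join: pr in assoc(p, :priority),
  as: :priority,
  where: p.id in subquery(project_ids)
)
|> preload(^preload)
|> Flop.validate_and_run(main_flop, for: Project)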

Tricky, but works fine, and no duplicates 🎉
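The two-query idea can be simulated in plain Elixir to see why the page stays full. The data is hypothetical; the comprehension stands in for the filtered, joined subquery (with `MapSet` playing the role of `distinct`), and `Enum.drop/2` plus `Enum.take/2` stand in for `OFFSET`/`LIMIT`. `priority_order` is stored directly on each project here as a stand-in for the one-to-one :priority join.

```elixir
# Hypothetical data: each project has a priority order and assigned account IDs.
projects = [
  %{id: 1, priority_order: 3, account_ids: ["x", "y"]},
  %{id: 2, priority_order: 1, account_ids: ["y"]},
  %{id: 3, priority_order: 2, account_ids: ["x", "y", "z"]},
  %{id: 4, priority_order: 4, account_ids: ["z"]}
]

# Phase 1, like sub_flop: filter on the joined project/account rows and keep
# only distinct project IDs. No ordering or pagination happens here.
matching_ids =
  for project <- projects,
      account_id <- project.account_ids,
      account_id in ["x", "y"],
      into: MapSet.new(),
      do: project.id

# Phase 2, like main_flop: one row per matching project, so ordering and
# OFFSET/LIMIT count projects rather than project/account pairs.
page_number = 1
page_size = 2

result =
  projects
  |> Enum.filter(&MapSet.member?(matching_ids, &1.id))
  |> Enum.sort_by(& &1.priority_order)
  |> Enum.drop((page_number - 1) * page_size)
  |> Enum.take(page_size)
  |> Enum.map(& &1.id)

IO.inspect(result)
# => [2, 3]
```

Without the distinct step, projects 1 and 3 would each appear twice among the filtered rows; after it, the page holds exactly `page_size` projects.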