comma-csv / comma

Comma is a small CSV (i.e. comma-separated values) generation extension for Ruby objects that lets you seamlessly define a CSV output format via a small DSL.

Home Page: https://github.com/comma-csv/comma

Why does sorting trigger a warning?

DannyBen opened this issue

Hi,

I just noticed this warning in my logs:

#to_comma is being used on a relation with limit or order clauses. Falling back to iterating with :each. This can cause performance issues.
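The warning implies a simple rule: iterate with find_each unless the relation has a limit or an order clause, and fall back to each otherwise. A minimal pure-Ruby sketch of that rule, assuming a hypothetical `iteration_method` helper that receives the two conditions as flags (this is not Comma's actual source, which inspects the relation itself):

```ruby
# Hypothetical sketch of the fallback rule that the warning describes.
# This is NOT Comma's source: the two flags are passed in directly instead
# of being read from an ActiveRecord relation.
FALLBACK_WARNING =
  "#to_comma is being used on a relation with limit or order clauses. " \
  "Falling back to iterating with :each. This can cause performance issues."

# Returns the iteration method to use and reports the fallback through
# +on_warning+ (stderr by default).
def iteration_method(has_limit:, has_order:, on_warning: ->(msg) { $stderr.puts(msg) })
  if has_limit || has_order
    on_warning.call(FALLBACK_WARNING)
    :each      # iterate over the relation as-is, keeping its limit and order
  else
    :find_each # batched iteration, 1,000 rows per batch by default
  end
end

# The case from this issue: sorted, but not limited.
p iteration_method(has_limit: false, has_order: true) # prints :each; the warning goes to stderr
```

Under this reading, any sorted relation takes the :each path, which is why an unlimited but ordered relation still produces the warning.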

I am using Comma in Rails, and my controller obtains a list that is not limited, but is sorted.

My questions are:

  1. Why does sorting trigger a warning? I mean, my guess is that in most circumstances one would want to sort the relation before generating a CSV; otherwise we just get whatever order the database happens to return.

  2. If using limit triggers a warning as well, how come all the examples on the wiki page use limit? And, as with sorting, why is this considered a warning at all? It is also a legitimate use case.

I cannot say much unless I see the actual call and the SQL query it makes.

Why does sorting trigger a warning? I mean, my guess is that in most circumstances one would want to sort the relation before generating a CSV; otherwise we just get whatever order the database happens to return.

Please see #92.

If using limit triggers a warning as well, how come all the examples on the wiki page use limit? And, as with sorting, why is this considered a warning at all? It is also a legitimate use case.

The limit has been there since 3e74413. I guess it is there to show that you can chain the to_comma method onto a relation.

So, if someone wants a sorted or limited CSV, how can they get one without "causing performance issues"?

I cannot say much unless I see the actual call and SQL query. If you have fewer than 1,000 rows, you can ignore the warning.

Alright, I was trying to understand the principle without asking you to review my code :)

I mean, the warning clearly states that if I am using limit or order, I should expect performance issues.
And I know I am sorting, since I want my CSV to be sorted, which naturally raises the question: how do I get a sorted CSV without a performance penalty?

But, as you requested, I am posting a simplified version of the relevant controller code and the server's STDOUT, including the SQL queries, from when the CSV request is made.

Controller

# GET /checkpoints
def index
  respond_to do |format|
    format.csv { render :csv => base_list }
    format.html { ... }
  end
end

def base_list
  Checkpoint.search(params[:search]).order(created_at: :desc).includes(:user, :batch)
end

Server STDOUT (including SQL queries)

User Load (2.6ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2  [["id", 385153371], ["LIMIT", 1]]

#to_comma is being used on a relation with limit or order clauses. Falling back to iterating with :each. This can cause performance issues.

Checkpoint Exists (2.9ms)  SELECT  1 AS one FROM "checkpoints" LIMIT $1  [["LIMIT", 1]]
Checkpoint Load (2.8ms)  SELECT  "checkpoints".* FROM "checkpoints" ORDER BY "checkpoints"."created_at" DESC LIMIT $1  [["LIMIT", 1]]
Batch Load (3.9ms)  SELECT "batches".* FROM "batches" WHERE "batches"."id" = $1  [["id", 658656312]]
Checkpoint Load (3.0ms)  SELECT "checkpoints".* FROM "checkpoints" ORDER BY "checkpoints"."created_at" DESC
User Load (4.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 385153371]]
Batch Load (3.2ms)  SELECT "batches".* FROM "batches" WHERE "batches"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8)  [["id", 658656312], ["id", 658656311], ["id",658656310], ["id", 658656309], ["id", 658656306], ["id", 658656305], ["id", 658656304], ["id", 658656303]]

Thank you for showing your code.

So, if someone wants a sorted or limited CSV, how can they get one without "causing performance issues"?

You can ignore the warning if you have fewer than 1,000 rows. By default, #find_each fetches rows in batches of 1,000, so if you have fewer rows than that, there is no difference.
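The 1,000-row figure comes from find_each's default batch size. Below it, a batched scan fetches every row in a single batch, so it costs the same as each; above it, batching trades extra queries for a bounded number of rows held in memory. A back-of-the-envelope sketch, assuming a hypothetical `scan_cost` helper that models a batched scan stopping once a batch comes back short (it is a cost model, not ActiveRecord's implementation):

```ruby
# Rough cost model comparing a plain #each with a batched scan over +rows+
# records. Assumes the batched scan stops once a batch comes back short,
# so it issues rows / batch_size + 1 SELECTs (integer division).
def scan_cost(rows, batch_size: 1000)
  {
    each:    { selects: 1, peak_rows_in_memory: rows },
    batched: { selects: rows / batch_size + 1,
               peak_rows_in_memory: [rows, batch_size].min }
  }
end

p scan_cost(800)    # identical costs for both strategies
p scan_cost(50_000) # batching: 51 SELECTs, never more than 1,000 rows held
```

For 800 rows both strategies issue one SELECT and hold all 800 rows, which is the "no difference" case; for 50,000 rows the each fallback holds every row at once.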

You can ignore the warning if you have fewer than 1,000 rows

And for larger tables? Is there no way to get a sorted CSV without suffering performance issues? That's a big limitation, isn't it?

Chiming in here: the description of #92 stated that

This can cause unexpected behaviour when the scope it's being run on has a limit or order clause. Neither of those work with #find_each.

While I think the statement is true for the order clause, it is not the case for limit.
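The asymmetry between order and limit follows from how keyset batching works: each batch is "the next N rows by primary key after the last id seen", so the scan imposes its own id order, while a limit only caps how many rows are taken in total. A self-contained simulation over in-memory rows, where the `Row` struct and `batched_scan` helper are hypothetical stand-ins rather than ActiveRecord code:

```ruby
# Hypothetical in-memory stand-in for a table row.
Row = Struct.new(:id, :created_at)

# Simulates keyset batching: every batch is the next +batch_size+ rows with
# an id greater than the last id seen, ordered by id. Whatever order the
# input had is therefore discarded. An optional +limit+ caps the total.
def batched_scan(rows, batch_size: 2, limit: nil)
  out = []
  last_id = nil
  loop do
    size = limit ? [batch_size, limit - out.size].min : batch_size
    break if size <= 0
    # Roughly: SELECT ... WHERE id > last_id ORDER BY id LIMIT size
    batch = rows.select { |r| last_id.nil? || r.id > last_id }
                .sort_by(&:id)
                .first(size)
    out.concat(batch)
    break if batch.size < size # a short batch means the table is exhausted
    last_id = batch.last.id
  end
  out
end

rows = [Row.new(1, 30), Row.new(2, 10), Row.new(3, 20), Row.new(4, 40)]
newest_first = rows.sort_by { |r| -r.created_at } # caller's ORDER BY created_at DESC

p newest_first.map(&:id)                 # => [4, 1, 3, 2]  (the order asked for)
p batched_scan(newest_first).map(&:id)   # => [1, 2, 3, 4]  (id order wins)
p batched_scan(rows, limit: 3).map(&:id) # => [1, 2, 3]     (limit honored)
```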

According to the Rails find_each documentation:

Limits are honored, ...

So I think it's still safe to use find_each when arel.ast.limit is set. If you agree, I can open a PR for this. Let me know what you think! @eitoball
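The proposal changes the decision in exactly one case: a relation with a limit but no order clause. A sketch comparing the two rules side by side, assuming hypothetical `current_rule` and `proposed_rule` helpers that take the same two flags (neither is Comma's actual code, and the proposal was not confirmed as merged in this thread):

```ruby
# Hypothetical sketches; neither method is Comma's actual code.

# Current behaviour, as described by the warning: any limit or order => :each.
def current_rule(has_limit:, has_order:)
  (has_limit || has_order) ? :each : :find_each
end

# Proposed: find_each honors limits, so only an order clause forces :each.
# has_limit is kept so both rules share a signature; it no longer matters.
def proposed_rule(has_limit:, has_order:)
  has_order ? :each : :find_each
end

[false, true].product([false, true]).each do |has_limit, has_order|
  flags = { has_limit: has_limit, has_order: has_order }
  puts format("limit=%-5s order=%-5s current=%-9s proposed=%s",
              has_limit, has_order, current_rule(**flags), proposed_rule(**flags))
end
```

Only the limit-without-order row differs, switching from each to find_each; sorted relations, including the one in this issue, would still fall back to each.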