Slow performance on job_info method

Question

avkhozov opened this issue 8 years ago

Minion version: 6.0
Perl version: any
Operating system: any

$minion->job($id) is slow with the Pg backend, even when the jobs have no dependencies.

PostgreSQL explain analyze output for the query (https://github.com/kraih/minion/blob/978612cd3fc33ec9f66c4caa8bdc9c308d27d200/lib/Minion/Backend/Pg.pm#L54):

# explain analyze select id, args, attempts, array(select id from minion_jobs where j.id = any(parents)) as children, extract(epoch from created) as created, extract(epoch from delayed) as delayed, extract(epoch from finished) as finished, parents, priority, queue, result, extract(epoch from retried) as retried, retries, extract(epoch from started) as started, state, task, worker from minion_jobs as j where id = 164360;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using minion_jobs_pkey on minion_jobs j  (cost=0.42..26552.96 rows=1 width=1039) (actual time=96.433..96.438 rows=1 loops=1)
   Index Cond: (id = 164360)
   SubPlan 1
     ->  Seq Scan on minion_jobs  (cost=0.00..26544.51 rows=8002 width=8) (actual time=96.359..96.359 rows=0 loops=1)
           Filter: (j.id = ANY (parents))
           Rows Removed by Filter: 166400
 Planning time: 0.162 ms
 Execution time: 96.493 ms
(8 rows)

As you can see, the subquery that builds the children array uses a sequential scan on the minion_jobs table.

What about a GIN index on the parents column, combined with the containment operator for PostgreSQL arrays (https://www.postgresql.org/docs/current/static/functions-array.html)?

create index on minion_jobs using gin (parents);

And the explain output for the rewritten query:

# explain analyze select id, args, attempts, array(select id from minion_jobs where array[164360]::bigint[] <@ parents) as children, extract(epoch from created) as created, extract(epoch from delayed) as delayed, extract(epoch from finished) as finished, parents, priority, queue, result, extract(epoch from retried) as retried, retries, extract(epoch from started) as started, state, task, worker from minion_jobs as j where id = 164360;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using minion_jobs_pkey on minion_jobs j  (cost=2933.08..2941.11 rows=1 width=1039) (actual time=0.055..0.056 rows=1 loops=1)
   Index Cond: (id = 164360)
   InitPlan 1 (returns $0)
     ->  Bitmap Heap Scan on minion_jobs  (cost=114.45..2932.66 rows=832 width=8) (actual time=0.013..0.013 rows=0 loops=1)
           Recheck Cond: ('{164360}'::bigint[] <@ parents)
           ->  Bitmap Index Scan on minion_jobs_parents_idx  (cost=0.00..114.24 rows=832 width=0) (actual time=0.011..0.011 rows=0 loops=1)
                 Index Cond: ('{164360}'::bigint[] <@ parents)
 Planning time: 0.258 ms
 Execution time: 0.111 ms
(9 rows)
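
A hedged reading of the difference between the two plans: PostgreSQL's default GIN operator class for arrays can serve the operators <@, @>, && and =, but not the "value = any(column)" form, so the index only becomes usable once the subquery is written with a containment operator. A minimal sketch, trimmed to the columns that matter here (the index name minion_jobs_parents_idx is the default name that appears in the plan above, and the literal id is simply the one from this report):

```sql
-- Original form: "= any(parents)" is not an operator the GIN index supports,
-- so the subquery cannot use an index on parents.
select id,
       array(select id from minion_jobs where j.id = any(parents)) as children
from minion_jobs as j
where id = 164360;

-- Rewritten form: array containment (<@) is supported by the GIN index.
create index if not exists minion_jobs_parents_idx
  on minion_jobs using gin (parents);

select id,
       array(select id from minion_jobs
             where array[164360]::bigint[] <@ parents) as children
from minion_jobs as j
where id = 164360;
```

Whether the planner actually picks the index still depends on its cost estimates, so the rewrite makes an index scan possible rather than guaranteed.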

Sebastian Riedel · Answer 1 · Sun Oct 23 2016 03:05:07 GMT+0800 (China Standard Time)

Sounds like a good idea.

Sebastian Riedel · Answer 2 · Sun Oct 23 2016 18:06:27 GMT+0800 (China Standard Time)

Afraid I can't replicate your results though.

 Index Scan using minion_jobs_pkey on minion_jobs j  (cost=0.42..7523.09 rows=1 width=111) (actual time=39.652..39.653 rows=1 loops=1)
   Index Cond: (id = 100000)
   SubPlan 1
     ->  Seq Scan on minion_jobs  (cost=0.00..7514.64 rows=9923 width=8) (actual time=39.619..39.619 rows=0 loops=1)
           Filter: (j.id = ANY (parents))
           Rows Removed by Filter: 200000
 Planning time: 0.134 ms
 Execution time: 39.690 ms
(8 rows)

Even after creating the index and running this query many times, PostgreSQL insists on using a seq scan.

Andrey Khozov · Answer 3 · Sun Oct 23 2016 19:28:08 GMT+0800 (China Standard Time)

Sadly. Most likely I had been experimenting with set enable_seqscan to off, and that produced this result. I don't know why the PostgreSQL query planner doesn't use this index by default.
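
One way to tell "the planner cannot use the index" apart from "the planner chooses not to" is to discourage sequential scans for a single transaction and compare the plans. A rough sketch, assuming the GIN index on parents from the original report exists; enable_seqscan is intended as a diagnostic switch here, not as a production setting:

```sql
-- Refresh planner statistics so row estimates reflect the current table.
analyze minion_jobs;

begin;
-- "set local" applies only until the end of this transaction.
set local enable_seqscan = off;

explain analyze
select id
from minion_jobs
where array[164360]::bigint[] <@ parents;

rollback;
```

If the bitmap index scan shows up only with enable_seqscan off, the planner can use the index but estimates the sequential scan to be cheaper.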

Sebastian Riedel · Answer 4 · Mon Oct 24 2016 03:52:18 GMT+0800 (China Standard Time)

I might have found a solution that works most of the time. Will not release it to CPAN for now though, so we can test it some more. a42fae8