cyjake / leoric

👑 JavaScript ORM for MySQL, PostgreSQL, and SQLite.

Home Page: https://leoric.js.org

premature subquery optimization causes join query to return insufficient data

cyjake opened this issue

Given data:

await Post.bulkCreate([
  { id: 2, title: 'Archbishop Lazarus' },
  { id: 3, title: 'Archangel Tyrael' },
]);

await Comment.bulkCreate([
  { articleId: 2, content: 'foo' },
  { articleId: 2, content: 'bar' },
  { articleId: 3, content: 'baz' },
]);

Given query:

await Post.include('comments').order('posts.id').where({ 
  'posts.title': { $like: 'Arch%' },
  'comments.content': 'baz',
});

The following SQL is generated:

 SELECT `posts`.*, `comments`.* FROM (SELECT * FROM `articles` WHERE `title` LIKE 'Arch%' AND `gmt_deleted` IS NULL LIMIT 1) AS `posts` LEFT JOIN `comments` AS `comments` ON `posts`.`id` = `comments`.`article_id` AND `comments`.`gmt_deleted` IS NULL WHERE `comments`.`content` = 'baz'

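The subquery SELECT * FROM `articles` WHERE `title` LIKE 'Arch%' AND `gmt_deleted` IS NULL LIMIT 1 rules out Post { id: 3, title: 'Archangel Tyrael' } and keeps the wrong one, Post { id: 2, title: 'Archbishop Lazarus' }. None of that post's comments has the content 'baz', so the outer WHERE filters out every joined row and the query returns nothing.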

The subquery was intended to hoist the limit/offset to the main table to filter data as much as possible. It surely is a premature optimization that needs to be removed.
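The difference between the two evaluation orders can be reproduced without a database. The following is a minimal sketch that uses plain arrays as stand-ins for the tables; it does not call leoric, and the helper names (leftJoin, titleMatches, contentMatches) are made up for illustration. Array order stands in for whatever row order the database happens to use for the subquery.

```javascript
// In-memory stand-ins for the rows from this issue (not leoric models).
const posts = [
  { id: 2, title: 'Archbishop Lazarus' },
  { id: 3, title: 'Archangel Tyrael' },
];
const comments = [
  { articleId: 2, content: 'foo' },
  { articleId: 2, content: 'bar' },
  { articleId: 3, content: 'baz' },
];

// Emulates: posts LEFT JOIN comments ON posts.id = comments.article_id
function leftJoin(postRows) {
  return postRows.flatMap(post => {
    const matched = comments.filter(c => c.articleId === post.id);
    return matched.length > 0
      ? matched.map(comment => ({ post, comment }))
      : [{ post, comment: null }];
  });
}

// Emulates: title LIKE 'Arch%'
const titleMatches = post => post.title.startsWith('Arch');
// Emulates: comments.content = 'baz'
const contentMatches = row => row.comment !== null && row.comment.content === 'baz';

// Hoisted: LIMIT 1 runs inside the subquery, before the join and the outer WHERE.
const hoisted = leftJoin(posts.filter(titleMatches).slice(0, 1)).filter(contentMatches);

// Not hoisted: join and filter first, then apply LIMIT 1 to the joined rows.
const notHoisted = leftJoin(posts.filter(titleMatches)).filter(contentMatches).slice(0, 1);

console.log(hoisted.length);        // 0
console.log(notHoisted[0].post.id); // 3
```

With the limit hoisted, only post 2 survives the subquery and the comment filter then discards it; with the limit applied last, post 3 is found.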

Turns out the original implementation is a little more complicated than that. Take the following two queries for example:

await Post.include('comments').order('comments.content asc').first;
// => Post { title, comments: [...Comment] }

There might be more than one comment, and all of them are expected to be retrieved. This is what Active Record in Ruby on Rails calls eager loading.

await Post.include('comments').where({
  'posts.title': { $like: 'Arch%' },
  'comments.content': 'baz',
}).first;
// => Post { title, comments: [] }

Here there is extra filtering on the joined table, which can make the returned posts inaccurate if the offset/limit is hoisted into the subquery, because the limit runs before the filter. Without the hoisting, though, posts.length is no longer accurate: the offset/limit now applies to the joined rows rather than to posts, so fewer posts than requested may come back even when more matching records remain in the table.
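The second half of that trade-off can also be shown with plain arrays. This is a rough sketch under the same assumptions as the data in this issue; leftJoin and collect are hypothetical helpers that only imitate a LEFT JOIN and the grouping of joined rows into posts with nested comments, not leoric's internals.

```javascript
// In-memory stand-ins for the rows from this issue (not leoric models).
const posts = [
  { id: 2, title: 'Archbishop Lazarus' },
  { id: 3, title: 'Archangel Tyrael' },
];
const comments = [
  { articleId: 2, content: 'foo' },
  { articleId: 2, content: 'bar' },
  { articleId: 3, content: 'baz' },
];

// Emulates: posts LEFT JOIN comments ON posts.id = comments.article_id
function leftJoin(postRows) {
  return postRows.flatMap(post => {
    const matched = comments.filter(c => c.articleId === post.id);
    return matched.length > 0
      ? matched.map(comment => ({ post, comment }))
      : [{ post, comment: null }];
  });
}

// Groups joined rows back into posts with a nested list of comment contents.
function collect(rows) {
  const byId = new Map();
  for (const { post, comment } of rows) {
    if (!byId.has(post.id)) byId.set(post.id, { ...post, comments: [] });
    if (comment !== null) byId.get(post.id).comments.push(comment.content);
  }
  return [...byId.values()];
}

// LIMIT 2 applied to the joined rows: both rows belong to post 2,
// so only one post comes back although two posts exist.
const limitedOnJoin = collect(leftJoin(posts).slice(0, 2));

// LIMIT 2 applied to posts before the join: both posts come back.
const limitedOnPosts = collect(leftJoin(posts.slice(0, 2)));

console.log(limitedOnJoin.length);  // 1
console.log(limitedOnPosts.length); // 2
```

A limit of 2 on the joined rows is consumed entirely by the two comments of post 2, which is why the number of posts no longer matches the requested limit.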