tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.

Home Page: https://turso.tech/libsql

Implementing `EXCEPT ALL` and `INTERSECT ALL` operations

NiklasBurggraaff opened this issue

Hi all!

TL;DR: I have a working implementation of the EXCEPT ALL and INTERSECT ALL operations in SQLite. Would there be interest in including these in libSQL?

I implemented these for my Honours Project (a 4th-year project at the end of my BSc), "Native support for bag operations in SQLite". SQLite is one of the most popular DBMSs that lack F304 (EXCEPT ALL table operator) and F305 (INTERSECT ALL table operator), which are (non-core) features defined in the ISO SQL standard. EXCEPT ALL, which is equivalent to multiset difference in multiset relational algebra, would increase the expressive power of queries.
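For reference, the bag semantics of these two operators can be stated in terms of multiplicities. If a row $t$ occurs $m_A(t)$ times in the left operand $A$ and $m_B(t)$ times in the right operand $B$, then:

$$
m_{A \,\text{EXCEPT ALL}\, B}(t) = \max\bigl(0,\; m_A(t) - m_B(t)\bigr),
\qquad
m_{A \,\text{INTERSECT ALL}\, B}(t) = \min\bigl(m_A(t),\; m_B(t)\bigr)
$$

For example, with $A = \{1, 1, 1, 2\}$ and $B = \{1, 2, 2\}$, `A EXCEPT ALL B` is $\{1, 1\}$ (since $3 - 1 = 2$ and $\max(0, 1 - 2) = 0$), and `A INTERSECT ALL B` is $\{1, 2\}$ (since $\min(3, 1) = 1$ and $\min(1, 2) = 1$).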

To summarise the implementation: it uses 2 temporary tables. The first is indexed by a row value from the operand tables and stores a rowid as its data; that rowid points to a row in the second table, which keeps track of the count for that specific row value. For EXCEPT ALL, this count is incremented for each row of the left operand and then decremented for each row of the right operand, which gives the number of times that row should appear in the output. For INTERSECT ALL, there is a separate counter for the left and the right operand, and the minimum of the two is the number of times that row should appear in the output.
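The counting scheme described above can be sketched in Python as follows. This is only an illustration of the idea, not the actual SQLite/VDBE code: a `dict` stands in for the indexed temporary table (row value to rowid) and a `list` stands in for the counter table addressed by that rowid. The function names `except_all` and `intersect_all` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]


def except_all(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    # "Index" table: row value -> rowid of its counter. In SQLite this would be
    # a temporary B-tree index; a dict is used here purely for brevity.
    index: Dict[Row, int] = {}
    # "Counter" table: rowid -> count for that row value.
    counts: List[int] = []

    for row in left:  # increment once per occurrence in the left operand
        if row not in index:
            index[row] = len(counts)
            counts.append(0)
        counts[index[row]] += 1

    for row in right:  # decrement once per occurrence in the right operand
        rowid = index.get(row)
        if rowid is not None:  # rows absent from the left can never be output
            counts[rowid] -= 1

    # Emit each row value as many times as its remaining (non-negative) count.
    return [row for row, rowid in index.items() for _ in range(max(0, counts[rowid]))]


def intersect_all(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    index: Dict[Row, int] = {}
    # Counter table with one counter per operand: [left_count, right_count].
    counts: List[List[int]] = []

    for row in left:
        if row not in index:
            index[row] = len(counts)
            counts.append([0, 0])
        counts[index[row]][0] += 1

    for row in right:
        rowid = index.get(row)
        if rowid is not None:
            counts[rowid][1] += 1

    # The output multiplicity is the minimum of the two counters.
    return [row for row, rowid in index.items() for _ in range(min(counts[rowid]))]


if __name__ == "__main__":
    a = [(1,), (1,), (1,), (2,)]
    b = [(1,), (2,), (2,)]
    print(except_all(a, b))     # [(1,), (1,)]
    print(intersect_all(a, b))  # [(1,), (2,)]
```

Because Python's `None == None`, rows containing NULLs are grouped together, which matches how SQL set operators treat NULLs as not distinct. With a B-tree in place of the `dict`, each lookup costs $O(\lg n)$, so the whole operation is $O(n \lg n)$.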

For my project I am also analysing the runtime behaviour of my implementation, in particular comparing it with some other DBMSs that already implement these operations: PostgreSQL, MySQL, and MariaDB. The (non-final) results show that in SQLite these operations take $O(n \lg n)$ time, while in the others they take $O(n)$. This is because each row requires a lookup in one of SQLite's B-trees, costing $O(\lg n)$, whereas the other DBMSs use hash tables with expected constant-time lookups (PostgreSQL can also implement these operations by sorting, which would be $O(n \lg n)$).
However, the runtime increase from EXCEPT to EXCEPT ALL, and from INTERSECT to INTERSECT ALL, is larger in SQLite than in the other DBMSs.
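The sort-based alternative mentioned for PostgreSQL can be pictured with the minimal sketch below (an illustration of the general technique, not PostgreSQL's code): both operands are sorted, which is the $O(n \lg n)$ step, and a single linear merge pass then cancels or pairs up matching occurrences. It assumes rows are mutually comparable Python tuples, so it does not handle NULLs (`None`), which Python cannot order against other values. The function names are hypothetical.

```python
from typing import Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def except_all_sorted(left: List[Row], right: List[Row]) -> List[Row]:
    a, b = sorted(left), sorted(right)  # O(n lg n)
    out: List[Row] = []
    i = j = 0
    while i < len(a):  # linear merge
        if j < len(b) and b[j] < a[i]:
            j += 1  # value only in the right operand: skip it
        elif j < len(b) and b[j] == a[i]:
            # One left occurrence is cancelled by one right occurrence.
            i += 1
            j += 1
        else:
            out.append(a[i])  # unmatched left occurrence survives
            i += 1
    return out


def intersect_all_sorted(left: List[Row], right: List[Row]) -> List[Row]:
    a, b = sorted(left), sorted(right)  # O(n lg n)
    out: List[Row] = []
    i = j = 0
    while i < len(a) and j < len(b):  # linear merge
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:
            out.append(a[i])  # one occurrence from each side pairs up
            i += 1
            j += 1
    return out
```

A hash-based strategy replaces the sort with a hash table keyed on the row value, giving expected $O(1)$ work per row and $O(n)$ overall.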

My implementation has not been fully tested yet, but it produces the same output as PostgreSQL, MySQL, and MariaDB on many (simple) randomly generated test cases.
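As a possible extra oracle for such randomized tests, the expected bag semantics can also be computed in unmodified SQLite by numbering duplicates with `row_number()` so that the k-th copy of a value becomes a distinct `(value, k)` pair, and then applying the plain set operators. The sketch below assumes SQLite 3.25 or later (window-function support) via Python's built-in `sqlite3` module; the tables `l`, `r`, the column `x`, and the helper `bag_op` are hypothetical.

```python
import sqlite3
from typing import List

# Number each duplicate within its group; plain EXCEPT/INTERSECT on the
# resulting (x, k) pairs then behave like EXCEPT ALL / INTERSECT ALL on x.
NUMBERED = "SELECT x, row_number() OVER (PARTITION BY x) AS k FROM {t}"


def bag_op(left: List[int], right: List[int], op: str) -> List[int]:
    assert op in ("EXCEPT", "INTERSECT")  # only allow the two set operators
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE l(x)")
        con.execute("CREATE TABLE r(x)")
        con.executemany("INSERT INTO l VALUES (?)", [(v,) for v in left])
        con.executemany("INSERT INTO r VALUES (?)", [(v,) for v in right])
        sql = (f"SELECT x FROM ({NUMBERED.format(t='l')} {op} "
               f"{NUMBERED.format(t='r')}) ORDER BY x")
        return [row[0] for row in con.execute(sql)]
    finally:
        con.close()


if __name__ == "__main__":
    print(bag_op([1, 1, 1, 2], [1, 2, 2], "EXCEPT"))     # [1, 1]
    print(bag_op([1, 1, 1, 2], [1, 2, 2], "INTERSECT"))  # [1, 2]
```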

It would be really exciting if this could be considered. I understand this deviates from the SQL features implemented by SQLite, but these operators are implemented in most other DBMSs.
As a specific example, the Drizzle documentation (relevant because Drizzle is used by Astro DB) lists SQLite as the only DBMS that does not support these operators.

Cheers :)