python ORM support for composite primary key tuple filtering - `WHERE (a,b) IN ( (1,2), (1, 3), (2, 4) )`

Question

johnny-smitherson opened this issue 6 months ago · comments

In CQL, when a table's clustering key spans multiple columns, it's possible to run batch queries like this:

CREATE TABLE example_table (
    a int, b int, c int, d int,
    PRIMARY KEY ((a, b), c, d)
);

x> SELECT * FROM example_table WHERE a=1 AND b=2 AND (c,d) IN ( (1,2), (1, 3), (2, 4) );

With a default cardinality limit of 100 tuples per query, this type of query can speed up batch processing by cutting the number of round trips by up to 100-fold compared to fetching one row per query.
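
To make the arithmetic concrete, here is a minimal sketch of splitting a list of `(c, d)` tuples into batches of at most 100, so that each batch can go into one `IN (...)` query. The helper name `chunked` and the sample keys are made up for illustration; only the limit of 100 comes from the text above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunked(keys: Iterable[Tuple[int, int]], size: int = 100) -> Iterator[List[Tuple[int, int]]]:
    """Yield successive lists of at most `size` clustering-key tuples."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 250 hypothetical (c, d) pairs to look up within one partition.
keys = [(c, c + 1) for c in range(250)]
batch_sizes = [len(batch) for batch in chunked(keys)]
print(batch_sizes)  # [100, 100, 50]: 3 queries instead of 250
```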

This is not possible with the cassandra.cqlengine.query ORM builder: there is no way to express .filter(XXX__in=((1,2),(3,4))) queries over a list of tuples of PK columns (or of indexed columns).

Is this a desirable feature to have? Or should I just keep this functionality in raw CQL?
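
For reference, the raw CQL route could look roughly like the sketch below. `build_tuple_in_query` is a hypothetical helper, not part of the driver; it only builds the query string with `%s` placeholders and a flat parameter list. Table and column names are interpolated directly into the string, so they are assumed to come from trusted code, never from user input.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_tuple_in_query(
    table: str,
    partition: Dict[str, Any],
    cluster_cols: Sequence[str],
    cluster_values: Sequence[Sequence[Any]],
) -> Tuple[str, List[Any]]:
    """Return (cql, params) for a multi-column IN query (hypothetical helper)."""
    if not cluster_values:
        raise ValueError("cluster_values must not be empty")
    width = len(cluster_cols)
    if any(len(row) != width for row in cluster_values):
        raise ValueError("every tuple must have one value per clustering column")

    # Equality restrictions on the partition key columns.
    equals = " AND ".join(f"{col} = %s" for col in partition)
    # One "(%s, %s)" group per tuple in the IN list.
    row = "(" + ", ".join(["%s"] * width) + ")"
    in_list = ", ".join([row] * len(cluster_values))
    cql = (
        f"SELECT * FROM {table} WHERE {equals} "
        f"AND ({', '.join(cluster_cols)}) IN ({in_list})"
    )
    params = list(partition.values()) + [v for row in cluster_values for v in row]
    return cql, params


cql, params = build_tuple_in_query(
    "example_table", {"a": 1, "b": 2}, ["c", "d"], [(1, 2), (1, 3), (2, 4)]
)
print(cql)
print(params)
# With an existing cassandra-driver Session (not created here), this pair
# would be passed on as: session.execute(cql, params)
```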

Implementation ideas:

Add special pk=(1,2,3,4) and ck__in=[(1,2), (3,4)] filter arguments that assume the given tuples are prefixes of the composite primary key, in the correct column order.
Or, add special syntax with explicit column names: .filter(a=1, b=2, c__d__in=((1,2), (4,5)))
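
As a rough illustration of the second idea, the sketch below shows how a keyword such as `c__d__in` could be split into column names and validated against the tuples. `parse_tuple_filter` is a made-up name and nothing like it exists in cqlengine today; this only demonstrates the parsing step, not query generation.

```python
from typing import Any, Iterable, List, Sequence, Tuple


def parse_tuple_filter(key: str, rows: Iterable[Sequence[Any]]) -> Tuple[List[str], List[tuple]]:
    """Split 'c__d__in' into (['c', 'd'], rows); hypothetical, not a cqlengine API."""
    *columns, operator = key.split("__")
    if operator != "in" or len(columns) < 2:
        raise ValueError(f"not a multi-column IN filter: {key!r}")
    normalized = [tuple(row) for row in rows]
    for row in normalized:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values per tuple, got {row!r}")
    return columns, normalized


columns, rows = parse_tuple_filter("c__d__in", ((1, 2), (4, 5)))
print(columns, rows)  # ['c', 'd'] [(1, 2), (4, 5)]
```

One design caveat: cqlengine already uses `__` to separate a column name from an operator (as in `a__in`), so a real implementation would need to make sure the multi-column form cannot be confused with existing single-column filters.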

SO discussion on WHERE with column tuples: https://stackoverflow.com/questions/62047786/cassandra-where-clause-as-a-tuple/62050254#62050254

Israel Fruchter · Answer 1 · Tue Jan 16 2024 17:15:22 GMT+0800 (China Standard Time)

Hi @johnny-smitherson,

I'm not sure I would put column names in the filter arguments, but we might think about extending the where clause:

.filter(a=1, b=2, _and = '(c,d) IN ( (1,2), (1, 3), (2, 4) )')
# or
.filter(a=1, b=2, _extra_where = 'AND (c,d) IN ( (1,2), (1, 3), (2, 4) )')

But in any case, since this fork doesn't have any Scylla-specific modifications in cqlengine, I would recommend
suggesting it to the Cassandra upstream of this driver, at:
https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues

If it is agreed on and accepted there, it will eventually land here as well.

Karol Baryła · Answer 2 · Wed Jun 19 2024 01:01:57 GMT+0800 (China Standard Time)

Is there an issue open upstream for this?

I don't think we are going to implement any new features in cqlengine in our fork.

Israel Fruchter · Answer 3 · Wed Jun 19 2024 01:10:48 GMT+0800 (China Standard Time)

A thought: maybe we should even label it unsupported/unmaintained as of a certain release?

Karol Baryła · Answer 4 · Wed Jun 19 2024 01:36:25 GMT+0800 (China Standard Time)

There are integration tests for it and we do run them in CI - so it's not completely forgotten.
I know there are clients that use it so declaring it unsupported without replacement might not be a good idea.