scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra

Home Page: https://python-driver.docs.scylladb.com

python ORM support for composite primary key tuple filtering - `WHERE (a,b) IN ( (1,2), (1, 3), (2, 4) )`

johnny-smitherson opened this issue · comments

In CQL, when the table Clustering Key is made of many columns, it's possible to make batch queries like this:

CREATE TABLE example_table (
    a int, b int, c int, d int,
    PRIMARY KEY ((a, b), c, d)
);

SELECT * FROM example_table WHERE a=1 AND b=2 AND (c,d) IN ( (1,2), (1, 3), (2, 4) );

With a default cardinality limit of 100 rows, this type of query speeds up batch processing by reducing the number of round trips by up to 100-fold.
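To make the round-trip arithmetic concrete, here is a minimal sketch that splits a list of clustering-key tuples into groups of at most 100, assuming the limit of 100 mentioned above. The helper name `chunked` and the sample keys are made up for illustration and are not part of the driver.

```python
from itertools import islice


def chunked(keys, size=100):
    """Yield successive lists of at most `size` clustering-key tuples."""
    iterator = iter(keys)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical workload: 250 (c, d) pairs to look up in one partition.
keys = [(c, d) for c in range(25) for d in range(10)]
batches = list(chunked(keys))

# One query per batch instead of one query per key.
print(len(keys), "keys ->", len(batches), "queries")
```

Each batch would then feed one `(c,d) IN (...)` query, so 250 lookups need 3 round trips rather than 250.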

This is not possible with the cassandra.cqlengine.query ORM builder, as there is no way to specify .filter(XXX__in=((1,2),(3,4))) queries on a list of tuples of the PK (or from indexes).

Is this a desirable feature to have? Or should I just keep this functionality in raw CQL?
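For reference, the raw CQL route could look roughly like the sketch below. `build_tuple_in_query` is a hypothetical helper, not a driver function; it only builds the statement text and a flat parameter list using the `%s` placeholders that the driver accepts for non-prepared statements. Table and column names are interpolated directly, so this assumes they come from trusted code rather than user input.

```python
def build_tuple_in_query(table, partition_filters, ck_columns, ck_tuples):
    """Return (cql, params) for a multi-column IN restriction.

    partition_filters: dict of column name -> value, rendered as equality tests.
    ck_columns: clustering columns, in primary-key order.
    ck_tuples: iterable of value tuples, each as wide as ck_columns.
    """
    ck_tuples = [tuple(row) for row in ck_tuples]
    if not ck_tuples:
        raise ValueError("need at least one clustering-key tuple")
    if any(len(row) != len(ck_columns) for row in ck_tuples):
        raise ValueError("every tuple must match the number of columns")

    eq_clauses = [f"{column} = %s" for column in partition_filters]
    # One "(%s, %s, ...)" group per tuple.
    row_marker = "(" + ", ".join(["%s"] * len(ck_columns)) + ")"
    in_clause = "({}) IN ({})".format(
        ", ".join(ck_columns), ", ".join([row_marker] * len(ck_tuples))
    )
    cql = f"SELECT * FROM {table} WHERE " + " AND ".join(eq_clauses + [in_clause])
    # Flatten the values in the same order as the placeholders.
    params = list(partition_filters.values()) + [v for row in ck_tuples for v in row]
    return cql, params


cql, params = build_tuple_in_query(
    "example_table", {"a": 1, "b": 2}, ("c", "d"), [(1, 2), (1, 3), (2, 4)]
)
print(cql)
print(params)
```

With an already connected session, the result would be passed along as `session.execute(cql, params)`.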

Implementation ideas:

  • add special pk=(1,2,3,4) and ck__in=[(1,2), (3,4)] filter functions that assume the tuples given are prefixes of the composite primary key in the correct order
  • or, add special syntax with explicit column names: .filter(a=1, b=2, c__d__in=((1,2), (4,5)))
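As a rough illustration of the second idea, the sketch below splits keyword arguments such as `c__d__in` into a column tuple and a list of value tuples. Nothing here is existing cqlengine API: `parse_filter_kwargs` is a hypothetical name, and the sketch only handles plain equality and the proposed multi-column `__in` form.

```python
def parse_filter_kwargs(**kwargs):
    """Split filter kwargs into equality filters and one multi-column IN.

    Returns (equals, tuple_in), where tuple_in is None or (columns, rows).
    """
    equals = {}
    tuple_in = None
    for key, value in kwargs.items():
        parts = key.split("__")
        if len(parts) == 1:
            # Plain "column=value" equality filter.
            equals[key] = value
        elif parts[-1] == "in" and len(parts) > 2:
            # "c__d__in": everything before the operator is a column name.
            columns = tuple(parts[:-1])
            rows = [tuple(row) for row in value]
            if any(len(row) != len(columns) for row in rows):
                raise ValueError(f"{key}: every tuple needs {len(columns)} values")
            tuple_in = (columns, rows)
        else:
            # Single-column operators are out of scope for this sketch.
            raise ValueError(f"unsupported filter in this sketch: {key}")
    return equals, tuple_in


equals, tuple_in = parse_filter_kwargs(a=1, b=2, c__d__in=((1, 2), (4, 5)))
print(equals, tuple_in)
```

One caveat with this syntax: a column whose own name contains a double underscore would be ambiguous, which may be an argument for the prefix-based `ck__in` variant instead.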

SO discussion on WHERE with column tuples: https://stackoverflow.com/questions/62047786/cassandra-where-clause-as-a-tuple/62050254#62050254

Hi @johnny-smitherson,

I don't know if I would use the column names in the filter arguments, but I might think about extending the where clause:

.filter(a=1, b=2, _and='(c,d) IN ( (1,2), (1, 3), (2, 4) )')
# or
.filter(a=1, b=2, _extra_where='AND (c,d) IN ( (1,2), (1, 3), (2, 4) )')

But anyhow, since this fork doesn't have any Scylla-specific modifications in cqlengine, I would recommend
suggesting it to the Cassandra upstream of this driver, at:
https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues

If it is agreed on and accepted there, it would eventually land here as well.

Is there an issue opened in upstream for this?

I don't think we are going to implement any new features in cqlengine in our fork.

A thought: maybe we should even mark it as unsupported/unmaintained as of a certain release?

There are integration tests for it and we do run them in CI - so it's not completely forgotten.
I know there are clients that use it so declaring it unsupported without replacement might not be a good idea.