crate / crate

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Home Page: https://cratedb.com/product


Query on partitioned table doesn't filter out partitions in a smart way

matriv opened this issue

CrateDB version

5.7.0

CrateDB setup information

No response

Problem description

A query on a partitioned table doesn't filter out partitions in a smart way if the WHERE clause uses the source column of the generated partition column instead of the partition column itself.

Steps to Reproduce

CREATE TABLE IF NOT EXISTS "doc"."weather_data_partitioned"(
   "timestamp" TIMESTAMP WITHOUT TIME ZONE,
   "location" TEXT,
   "temperature" DOUBLE PRECISION,
   "humidity" DOUBLE PRECISION,
   "wind_speed" DOUBLE PRECISION,
   "ts_month" TIMESTAMP GENERATED ALWAYS AS date_trunc('month', "timestamp")
) CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("ts_month");
COPY weather_data_partitioned
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_weather.csv.gz'
WITH (format='csv', compression='gzip', empty_string_as_null=true);

The following query doesn't filter out partitions (it hits all of them):

EXPLAIN ANALYZE SELECT
       ts_month,
       AVG("temperature") AS "avg_temperature",
       AVG("humidity") AS "avg_humidity",
       AVG("wind_speed") AS "avg_wind_speed"
    FROM "doc"."weather_data_partitioned"
    WHERE "timestamp" >= '2023-01-01' AND "timestamp" < '2023-03-01'
    GROUP BY "ts_month"
    ORDER BY "ts_month" ;

whereas:

EXPLAIN ANALYZE SELECT
       ts_month,
       AVG("temperature") AS "avg_temperature",
       AVG("humidity") AS "avg_humidity",
       AVG("wind_speed") AS "avg_wind_speed"
    FROM "doc"."weather_data_partitioned"
    WHERE "ts_month" >= '2023-01-01' AND "ts_month" < '2023-03-01'
    GROUP BY "ts_month"
    ORDER BY "ts_month" ;

filters the partitions correctly.

Actual Result

Partitions are not filtered out if the timestamp column is used in the WHERE clause.

Expected Result

CrateDB should automatically recognize that timestamp is the source column of the generated expression that builds the ts_month partition column, and use it to filter out the partitions.

For a simpler table:

cr> create table t_p(a int, b generated always as (a % 10)) partitioned by (b);
CREATE OK, 1 row affected (0.048 sec)
cr> insert into t_p(a) select * from generate_series(1, 20, 1);
INSERT OK, 20 rows affected (3.329 sec)
cr> refresh table t_p;
REFRESH OK, 10 rows affected (0.001 sec)
cr> explain analyze select * from t_p where a = 11;

the mechanism works correctly. This is done in GeneratedColumnExpander, so the query becomes ((a = 11) AND (b = (11 % 10))), which is used to filter out partitions.
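To make the expansion concrete, here is a minimal sketch of the two queries side by side. The second statement is a hand-written equivalent of the expanded predicate and is only an illustration, not output produced by CrateDB:

```sql
-- Original predicate: only the source column "a" is filtered.
SELECT * FROM t_p WHERE a = 11;

-- Hand-written equivalent of the expanded predicate.
-- 11 % 10 = 1, so only the partition with b = 1 can contain matching rows.
SELECT * FROM t_p WHERE a = 11 AND b = 1;
```

Running EXPLAIN ANALYZE on both statements should show the same partition being hit if the expansion applies.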

For the timestamp case, though, an implicit _cast function is introduced whose signature doesn't support Feature.COMPARISON_REPLACEMENT, so the GeneratedColumnExpander exits here: https://github.com/crate/crate/blob/master/server/src/main/java/io/crate/analyze/GeneratedColumnExpander.java#L185

Is this a regression? If not, it's not a bug, but a potential performance improvement.

Based on #14250, I tested on 5.3.2, 5.6.1 and the current nightly. It worked with a modified CREATE TABLE statement, because the original column definitions use TIMESTAMP WITHOUT TIME ZONE for timestamp but TIMESTAMP for ts_month (see below). Omitting the data type on the generated column leads to the expected effect: partitions are selected. The query touches three partitions (maybe because of the 1st of March bound, but no docs are fetched from that partition).

Modified CREATE TABLE statement (data type omitted on the generated column; using TIMESTAMP WITHOUT TIME ZONE in the column definition of ts_month still leads to a scan of all partitions):

CREATE TABLE IF NOT EXISTS "doc"."weather_data_partitioned"(
   "timestamp" TIMESTAMP WITHOUT TIME ZONE,
   "location" TEXT,
   "temperature" DOUBLE PRECISION,
   "humidity" DOUBLE PRECISION,
   "wind_speed" DOUBLE PRECISION,
   "ts_month"  GENERATED ALWAYS AS date_trunc('month', "timestamp")
) CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("ts_month");

EXPLAIN ANALYZE of the query filtered by WHERE "timestamp" >= '2023-01-01' AND "timestamp" < '2023-03-01':

[screenshot of the EXPLAIN ANALYZE output]

Question
Based on the outline of @matriv above, an implicit _cast kicks in for the timestamp / weather_data example. Maybe omitting the data type completely is the solution to this issue and we should mention it in the docs?

Thx for checking it further @ckurze!
@mfussenegger, yep it's not a bug, as I missed this issue with the datatype which adds the implicit cast, and I thought that the mechanism to filter out the irrelevant partitions was broken in general.

So the issue is the type definition in the table schema? o_O

Is there a difference between TIMESTAMP and TIMESTAMPTZ?

The issue (the implicit cast) seems to exist for either of the two data types (WITH or WITHOUT TIME ZONE) whenever the type, even if it's the same, is also declared on the generated column (which is the PARTITIONED BY column).

The workaround is to just omit the data type for the generated column that is used for partitioning, but the implicit cast should be fixed. I see it moved into the candidates for 5.8. Thank you :)
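For an existing table that can't easily be recreated without the data type, a query-level alternative might be to state the partition predicate explicitly next to the timestamp predicate. This is only a sketch: it relies on the observation above that filtering on ts_month prunes partitions, and it assumes the range bounds fall on month boundaries, as they do in the reproduction query.

```sql
SELECT
    "ts_month",
    AVG("temperature") AS "avg_temperature"
FROM "doc"."weather_data_partitioned"
WHERE "timestamp" >= '2023-01-01' AND "timestamp" < '2023-03-01'
  -- Redundant predicate on the partition column, added by hand so that
  -- partition pruning does not depend on the generated column expansion.
  AND "ts_month" >= '2023-01-01' AND "ts_month" < '2023-03-01'
GROUP BY "ts_month"
ORDER BY "ts_month";
```

If the upper bound is not the first day of a month, the hand-written ts_month predicate has to use <= against the start of that bound's month. Otherwise rows from the final, partially covered month would be excluded.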


Thanks for catching!

This has been implemented in #15920 and will be available in CrateDB 5.8.