Query on partitioned table doesn't filter out partitions in a smart way

matriv opened this issue 4 months ago · comments

Marios Trivyzas commented 4 months ago

CrateDB version

5.7.0

CrateDB setup information

No response

Problem description

A query on a partitioned table doesn't filter out partitions in a smart way if the WHERE clause uses the source column of the generated partition column instead of the partition column itself.

Steps to Reproduce

CREATE TABLE IF NOT EXISTS "doc"."weather_data_partitioned"(
   "timestamp" TIMESTAMP WITHOUT TIME ZONE,
   "location" TEXT,
   "temperature" DOUBLE PRECISION,
   "humidity" DOUBLE PRECISION,
   "wind_speed" DOUBLE PRECISION,
   "ts_month" TIMESTAMP GENERATED ALWAYS AS date_trunc('month', "timestamp")
) CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("ts_month");
COPY weather_data_partitioned
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_weather.csv.gz'
WITH (format='csv', compression='gzip', empty_string_as_null=true);

The following query doesn't filter out partitions (it hits all of them):

EXPLAIN ANALYZE SELECT
       ts_month,
       AVG("temperature") AS "avg_temperature",
       AVG("humidity") AS "avg_humidity",
       AVG("wind_speed") AS "avg_wind_speed"
    FROM "doc"."weather_data_partitioned"
    WHERE "timestamp" >= '2023-01-01' AND "timestamp" < '2023-03-01'
    GROUP BY "ts_month"
    ORDER BY "ts_month" ;

whereas:

EXPLAIN ANALYZE SELECT
       ts_month,
       AVG("temperature") AS "avg_temperature",
       AVG("humidity") AS "avg_humidity",
       AVG("wind_speed") AS "avg_wind_speed"
    FROM "doc"."weather_data_partitioned"
    WHERE "ts_month" >= '2023-01-01' AND "ts_month" < '2023-03-01'
    GROUP BY "ts_month"
    ORDER BY "ts_month" ;

filters the partitions correctly.

Actual Result

Partitions are not filtered out if the timestamp column is used in the WHERE clause.

Expected Result

CrateDB should automatically recognize that timestamp is the source column of the generated expression that defines the partition column ts_month, and use a filter on timestamp to filter out partitions.

Marios Trivyzas · Answer 1 · Wed Mar 06 2024 22:31:52 GMT+0800 (China Standard Time)

For a simpler table:

cr> create table t_p(a int, b generated always as (a % 10)) partitioned by (b);
CREATE OK, 1 row affected (0.048 sec)
cr> insert into t_p(a) select * from generate_series(1, 20, 1);
INSERT OK, 20 rows affected (3.329 sec)
cr> refresh table t_p;
REFRESH OK, 10 rows affected (0.001 sec)
cr> explain analyze select * from t_p where a = 11;

The mechanism works correctly here. It is implemented in GeneratedColumnExpander, so the WHERE clause becomes:
((a = 11) AND (b = (11 % 10))), which is used to filter out partitions.
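The rewrite described above can be illustrated with a small Python toy model. This is only a sketch of the idea, not CrateDB code: the function names `generated_b`, `expand_equality` and `prune` are hypothetical, and partitions are represented simply by their value of b.

```python
# Toy model of the rewrite for "WHERE a = 11" on table t_p.
# Not CrateDB code: all function names here are hypothetical.

def generated_b(a):
    """Mirror of the generated column expression: b = a % 10."""
    return a % 10

def expand_equality(a_value):
    """Rewrite (a = v) into (a = v) AND (b = v % 10)."""
    return {"a": a_value, "b": generated_b(a_value)}

def prune(partitions, predicate):
    """Keep only partitions whose partition value satisfies the predicate on b."""
    return [p for p in partitions if p == predicate["b"]]

# generate_series(1, 20, 1) inserts a = 1..20, which yields 10 partitions (b = 0..9).
partitions = sorted({generated_b(a) for a in range(1, 21)})

predicate = expand_equality(11)
matching = prune(partitions, predicate)
print(predicate, matching)
```

In this model only the partition with b = 1 has to be searched for a = 11; the other nine can be skipped without looking at their rows.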

In the timestamp case, though, an implicit _cast function is introduced whose signature doesn't support Feature.COMPARISON_REPLACEMENT, so the GeneratedColumnExpander exits here: https://github.com/crate/crate/blob/master/server/src/main/java/io/crate/analyze/GeneratedColumnExpander.java#L185

Mathias Fußenegger · Answer 2 · Thu Mar 07 2024 01:38:12 GMT+0800 (China Standard Time)

Is this a regression? If not, it's not a bug, but a potential performance improvement.

ckurze · Answer 3 · Thu Mar 07 2024 16:00:45 GMT+0800 (China Standard Time)

Based on #14250, I tested on 5.3.2, 5.6.1 and the current nightly. It worked with a modified CREATE TABLE statement: the original declares timestamp as TIMESTAMP WITHOUT TIME ZONE but ts_month as TIMESTAMP (see below). Omitting the data type on the generated column leads to the expected effect, that partitions are selected. The query touches three partitions (maybe because we put the 1st of March as the upper bound, but no docs are fetched from this partition).
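One plausible reading of the three touched partitions can be sketched in Python. It assumes, without confirmation from this thread, that a bound on timestamp is turned into an inclusive bound on ts_month: date_trunc is non-decreasing, so timestamp < X only guarantees ts_month <= date_trunc('month', X). The helper names `trunc_month` and `expand_range` and the list of monthly partitions are hypothetical.

```python
from datetime import datetime

def trunc_month(ts):
    """Python stand-in for date_trunc('month', ts)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def expand_range(lower, upper):
    """Derive inclusive bounds on ts_month from lower <= ts < upper.

    trunc_month is non-decreasing, so ts >= lower implies
    ts_month >= trunc_month(lower). For the upper bound, ts < upper only
    implies ts_month <= trunc_month(upper): a generic rewrite cannot keep
    the strict comparison, because upper may fall in the middle of a month.
    """
    return trunc_month(lower), trunc_month(upper)

# Hypothetical monthly partitions; the real data set may contain others.
partitions = [datetime(2023, month, 1) for month in range(1, 7)]

low, high = expand_range(datetime(2023, 1, 1), datetime(2023, 3, 1))
touched = [p for p in partitions if low <= p <= high]
print([p.strftime("%Y-%m") for p in touched])
```

Under that assumption the January, February and March partitions are all candidates, and the original predicate on timestamp then removes every row from the March partition.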

Modified CREATE TABLE statement (data type omitted on the generated column; using TIMESTAMP WITHOUT TIME ZONE in the column definition of ts_month still leads to a scan of all partitions):

CREATE TABLE IF NOT EXISTS "doc"."weather_data_partitioned"(
   "timestamp" TIMESTAMP WITHOUT TIME ZONE,
   "location" TEXT,
   "temperature" DOUBLE PRECISION,
   "humidity" DOUBLE PRECISION,
   "wind_speed" DOUBLE PRECISION,
   "ts_month"  GENERATED ALWAYS AS date_trunc('month', "timestamp")
) CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("ts_month");

EXPLAIN ANALYZE of the query filtered by WHERE "timestamp" >= '2023-01-01' AND "timestamp" < '2023-03-01':

Question
Based on the outline of @matriv above, an implicit _cast kicks in for the timestamp / weather_data example. Maybe omitting the data type completely is the solution to this issue and we should mention it in the docs?

Marios Trivyzas · Answer 4 · Thu Mar 07 2024 17:08:35 GMT+0800 (China Standard Time)

Thx for checking it further @ckurze!
@mfussenegger, yep, it's not a bug. I missed that the data type on the generated column adds the implicit cast, and I thought that the mechanism to filter out the irrelevant partitions was broken in general.

Georg Traar · Answer 5 · Fri Mar 08 2024 01:05:51 GMT+0800 (China Standard Time)

So the issue is the type definition in the table schema? o_O

Is there a difference between TIMESTAMP and TIMESTAMPTZ?

Marios Trivyzas · Answer 6 · Mon Mar 11 2024 17:33:11 GMT+0800 (China Standard Time)

The issue (the implicit cast) seems to exist for both data types (WITH or WITHOUT TIME ZONE)
whenever the type is also declared on the generated column (which is the PARTITIONED BY column), even if it is the same type as the source column.

ckurze · Answer 7 · Tue Mar 12 2024 15:03:02 GMT+0800 (China Standard Time)

The workaround is to just omit the data type for the generated column that is used for partitioning, but the implicit cast should be fixed. I see it moved into the candidates for 5.8. Thank you :)

Baur · Answer 8 · Fri Apr 26 2024 21:33:28 GMT+0800 (China Standard Time)

Thanks for catching!

This has been implemented in #15920 and will be available in CrateDB 5.8.