sraoss / pgsql-ivm

IVM (Incremental View Maintenance) development for PostgreSQL

numeric type AVG() returns incorrect result.

nuko-yokohama opened this issue

If AVG() over a numeric column is specified in the definition of a CREATE INCREMENTAL MATERIALIZED VIEW and an INSERT statement is then executed, AVG() returns an incorrect result.

The same problem occurs with the real and double precision types.
It does not occur with MAX(), MIN(), or SUM().

Execution example.

$ psql -U postgres testdb -e -f agg_numeric.sql
DROP TABLE IF EXISTS table_a CASCADE;
psql:agg_numeric.sql:1: NOTICE:  drop cascades to 3 other objects
DETAIL:  drop cascades to view table_a_v
drop cascades to materialized view table_a_mv
drop cascades to materialized view table_a_ivm
DROP TABLE
CREATE TABLE table_a (id int primary key, data numeric, dummy text);
CREATE TABLE
INSERT INTO table_a VALUES (generate_series(1, 100000), (random() * 100)::numeric, repeat(' ', 80));
INSERT 0 100000
CREATE VIEW table_a_v AS SELECT COUNT(id), MAX(data), MIN(data), SUM(data), AVG(data) FROM table_a;
CREATE VIEW
CREATE MATERIALIZED VIEW table_a_mv AS SELECT COUNT(id), MAX(data), MIN(data), SUM(data), AVG(data) FROM table_a;
SELECT 1
CREATE INCREMENTAL MATERIALIZED VIEW table_a_ivm AS SELECT COUNT(id), MAX(data), MIN(data), SUM(data), AVG(data) FROM table_a;
SELECT 1
                  List of relations
 Schema |    Name     |       Type        |  Owner
--------+-------------+-------------------+----------
 public | table_a     | table             | postgres
 public | table_a_ivm | materialized view | postgres
 public | table_a_mv  | materialized view | postgres
 public | table_a_v   | view              | postgres
(4 rows)

SELECT 'v' as "view type", * FROM table_a_v
UNION
SELECT 'mv' as "view type", * FROM table_a_mv
UNION
SELECT 'imv' as "view type", * FROM table_a_ivm
;
 view type | count  |       max        |         min          |            sum             |          avg
-----------+--------+------------------+----------------------+----------------------------+-----------------------
 v         | 100000 | 99.9999617116377 | 0.000353251331475235 | 4997894.744345788204685825 | 49.978947443457882047
 mv        | 100000 | 99.9999617116377 | 0.000353251331475235 | 4997894.744345788204685825 | 49.978947443457882047
 imv       | 100000 | 99.9999617116377 | 0.000353251331475235 | 4997894.744345788204685825 | 49.978947443457882047
(3 rows)

INSERT INTO table_a VALUES (generate_series(100001, 100010), (random() * 100)::numeric, repeat(' ', 80));
INSERT 0 10
REFRESH MATERIALIZED VIEW table_a_mv;
REFRESH MATERIALIZED VIEW
                  List of relations
 Schema |    Name     |       Type        |  Owner
--------+-------------+-------------------+----------
 public | table_a     | table             | postgres
 public | table_a_ivm | materialized view | postgres
 public | table_a_mv  | materialized view | postgres
 public | table_a_v   | view              | postgres
(4 rows)

SELECT 'v' as "view type", * FROM table_a_v
UNION
SELECT 'mv' as "view type", * FROM table_a_mv
UNION
SELECT 'imv' as "view type", * FROM table_a_ivm
;
 view type | count  |       max        |         min          |            sum             |          avg
-----------+--------+------------------+----------------------+----------------------------+-----------------------
 imv       | 100010 | 99.9999617116377 | 0.000353251331475235 | 4998448.819947709425954825 |   -624560507.26147385
 mv        | 100010 | 99.9999617116377 | 0.000353251331475235 | 4998448.819947709425954825 | 49.979490250452049055
 v         | 100010 | 99.9999617116377 | 0.000353251331475235 | 4998448.819947709425954825 | 49.979490250452049055
(3 rows)

$

Thank you for reporting the issue. I confirmed this and I am now investigating.

This issue is fixed and merged. Thank you!

@nuko-yokohama Could you please confirm this and close this issue if you find no problem?

I found that the bug is not fixed when I tested the current branch.
The failing case is, for example, an IVM that uses avg() on a 'real' or 'double precision' column.

test=# SELECT COUNT(id_real), MAX(id_real),MIN(id_real),SUM(id_real),AVG(id_real) FROM mv_base_datatype
test-# UNION ALL
test-# SELECT * FROM mv_ivm_real;
 count | max | min | sum |        avg
-------+-----+-----+-----+--------------------
    20 |   2 | 0.1 |  21 | 1.0500000011175872
    20 |   2 | 0.1 |  21 |               1.05
(2 rows)

test=#
test=# SELECT COUNT(id_double_precision), MAX(id_double_precision),MIN(id_double_precision),SUM(id_double_precision),AVG(id_double_precision) FROM mv_base_datatype
test-# UNION ALL
test-# SELECT * FROM mv_ivm_double_precision;
 count | max | min |        sum         |        avg
-------+-----+-----+--------------------+--------------------
    20 |   2 | 0.1 | 20.999999999999996 | 1.0499999999999998
    20 |   2 | 0.1 |                 21 |               1.05

Please see the details in the attached log file.

test_ivm_aggregate_function_with_some_datatye.txt

Hmm... this is due to the loss of significant digits (rounding error) in float8 arithmetic: the result depends on the order in which the values are added.

=# create table d (v double precision);
CREATE TABLE
=# insert into d select 0.1 * i from generate_series(1,20) i;
INSERT 0 20

=# WITH
-#  s1 AS (select sum(v) from (select * from d offset 0 limit 10) v),
-#  s2 AS (select sum(v) from (select * from d offset 10 limit 10) v)
-# select sum(v) from d
-#  union all
-# select (select sum from s1) + (select sum from s2);
        sum         
--------------------
 20.999999999999996
                 21
(2 rows)
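
The effect in the session above is not specific to PostgreSQL. As a minimal sketch in Python (whose float is the same IEEE 754 double as float8), the hypothetical helpers `sequential_sum` and `split_sum` below add the same values in one pass and as two partial sums. They are illustrative names only and do not reflect how PostgreSQL or pgsql-ivm is implemented.

```python
def sequential_sum(values):
    """Add doubles left to right with a single running accumulator."""
    # A plain loop is used on purpose: the built-in sum() may apply
    # compensated summation on newer Python versions and hide the effect.
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, cut):
    """Sum two chunks independently, then add the two partial sums."""
    return sequential_sum(values[:cut]) + sequential_sum(values[cut:])


# Same data as the session above: 0.1 * i for i = 1..20, stored as doubles.
values = [i / 10 for i in range(1, 21)]

one_pass = sequential_sum(values)
two_parts = split_sum(values, 10)
print(repr(one_pass), repr(two_parts))

# A smaller, well-known instance of the same non-associativity.
print(sequential_sum([0.1, 0.2, 0.3]))  # 0.6000000000000001
print(split_sum([0.1, 0.2, 0.3], 1))    # 0.6
```

An incrementally maintained sum is in effect the "partial sums" variant, so it can legitimately differ in the last digits from a sum recomputed over the whole table.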

For safety, it might be better not to support avg(real) and avg(double precision), although numeric has no such problem. A related problem occurs with moving aggregates, as noted in the following documentation. Similarly, IVM for aggregates on floating-point types would be unsafe, especially when tuples are deleted as well as inserted.

https://www.postgresql.org/docs/12/xaggr.html#XAGGR-MOVING-AGGREGATES
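
To illustrate why deletion is the risky case, here is a minimal Python sketch. It assumes the view keeps a running float sum and a count and derives the average from them, which is a common way to maintain AVG incrementally; the thread does not show pgsql-ivm's internals, and the class `RunningAvg` is hypothetical. The values follow the kind of example given in the moving-aggregates documentation linked above.

```python
class RunningAvg:
    """Hypothetical incremental AVG state: a float sum and an integer count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def insert(self, v):
        # Forward transition: fold the new value into the state.
        self.total += v
        self.count += 1

    def delete(self, v):
        # Inverse transition: subtract a value that was added earlier.
        self.total -= v
        self.count -= 1

    def avg(self):
        return self.total / self.count if self.count else None


state = RunningAvg()
state.insert(1.0)
state.insert(1.0e20)  # 1.0 is absorbed: 1.0 + 1e20 == 1e20 as a double
state.delete(1.0e20)  # the subtraction leaves 0.0, not 1.0

print(state.count, state.avg())  # 1 0.0
```

Recomputing the average from the single remaining row would give 1.0, so the incrementally maintained state and a full refresh disagree. With numeric, addition and subtraction are exact, which is why the same scheme is safe there.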

As for avg(real), the discrepancy is likely due to the cast from float4 to float8, because the transition state used by the avg(float4) implementation is float8.

=# create table r (v real);
CREATE TABLE
=# insert into r select 0.1 * i from generate_series(1,20) i;
INSERT 0 20
=# select sum(v::double precision)/20 from r;
      ?column?      
--------------------
 1.0500000011175872
(1 row)
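
The float4-to-float8 widening can also be reproduced outside the database. The sketch below is an illustration in Python, not PostgreSQL code: the hypothetical helper `to_float4` rounds a double to single precision with the standard `struct` module, and the rounded values are then averaged in double precision, mirroring a float8 accumulator over float4 inputs.

```python
import struct


def to_float4(x):
    """Round a double to the nearest float4 and return it widened to a double."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 0.1 * i for i = 1..20, stored with float4 precision.
real_values = [to_float4(i / 10) for i in range(1, 21)]

# Accumulate in double precision, as a float8 transition state would.
total = 0.0
for v in real_values:
    total += v
avg_real = total / len(real_values)

print(repr(to_float4(0.1)))  # 0.10000000149011612
print(repr(avg_real))
```

Each float4 value carries a rounding error of roughly 1e-9 to 1e-8, which becomes visible once the value is widened, so the average lands slightly above 1.05, in line with the 1.0500000011175872 reported in the thread.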

I confirmed that the numeric type problem was fixed.
Thank you for the response.

@nuko-yokohama Thank you for your confirmation.

I'll leave this issue open for discussion on float4/float8 support.

There doesn't seem to be any further discussion, so I'll close this for now.