`COUNT(DISTINCT)` aggregator is very slow.
Cristian Berneanu commented
```
cristian=# \timing
Timing is on.
cristian=# CREATE TABLE test_large AS (
cristian(# SELECT
cristian(# i AS id, left(md5(random()::text), 4) AS t
cristian(# FROM generate_series(1, 100000) series(i)
cristian(# );
SELECT 100000
Time: 245.086 ms
cristian=# SECURITY LABEL FOR pg_diffix ON TABLE test_large IS 'sensitive';
SECURITY LABEL
Time: 9.695 ms
cristian=# SECURITY LABEL FOR pg_diffix ON COLUMN test_large.id IS 'aid';
SECURITY LABEL
Time: 5.123 ms
cristian=# SELECT count(DISTINCT t) FROM test_large;
 count
-------
 51171
(1 row)

Time: 49.442 ms
cristian=# SET pg_diffix.session_access_level = 'publish_trusted';
SET
Time: 0.284 ms
cristian=# SELECT count(DISTINCT t) FROM test_large;
 count
-------
 51168
(1 row)

Time: 112256.866 ms (01:52.257)
```
Edon Gashi commented
Also ignores cancel requests.
Cristian Berneanu commented
> Also ignores cancel requests.
Nice find, but better to address it in a separate issue.
Edon Gashi commented
Looks like 99% of the slowdown happens in the final aggregation step. The reference implementation, AFAIK, is quite fast there; we should mirror its behavior.