pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://www.pingcap.com/tidb-serverless/

Home Page: https://pingcap.com

TopN statistics are collected unnecessarily for non-skewed values on small tables

terry1purcell opened this issue · comments

Enhancement

Currently, ANALYZE collects by default the top 500 values for each interesting column (an indexed column or a predicate column). For tables that are sampled, pruning logic removes TopN entries that are not skewed. For tables that are NOT sampled (smaller tables), the pruning logic is not invoked.

The example below shows values collected with a count of 1, which are not skewed.

tidb> show stats_topn;
+---------+------------+----------------+-------------+----------+------------+-------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Value      | Count |
+---------+------------+----------------+-------------+----------+------------+-------+
| test    | t2         |                | a           |        0 | 73         |    16 |
| test    | t2         |                | a           |        0 | 74         |    16 |
| test    | t2         |                | a           |        0 | 75         |    16 |
| test    | t2         |                | a           |        0 | 76         |    16 |
......
| test    | t2         |                | a           |        0 | 101        |     1 |
| test    | t2         |                | a           |        0 | 102        |     1 |
| test    | t2         |                | a           |        0 | 103        |     1 |
| test    | t2         |                | a           |        0 | 104        |     1 |
......
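
To make the requested behavior concrete, here is a minimal sketch of what pruning non-skewed TopN entries could look like. It is not TiDB's actual pruning logic: the function name `prune_topn`, the `min_count` parameter, and the rule "keep a value only if its count is above the average of the collected counts" are assumptions chosen for illustration. The input uses only the eight rows visible in the output above; the elided rows are left out.

```python
from typing import Dict


def prune_topn(topn: Dict[str, int], min_count: int = 2) -> Dict[str, int]:
    """Drop TopN entries that do not look skewed.

    Hypothetical rule (an assumption, not TiDB's implementation):
    keep a value only if its count is at least `min_count` and
    strictly greater than the average count of the collected entries.
    """
    if not topn:
        return {}
    average = sum(topn.values()) / len(topn)
    return {
        value: count
        for value, count in topn.items()
        if count >= min_count and count > average
    }


# Only the rows visible in the SHOW STATS_TOPN output above.
visible_rows = {
    "73": 16, "74": 16, "75": 16, "76": 16,
    "101": 1, "102": 1, "103": 1, "104": 1,
}

pruned = prune_topn(visible_rows)
print(pruned)
```

Under this rule the values with a count of 1 are dropped and the values with a count of 16 are kept. A column whose values all have the same count yields an empty TopN, since no value stands out from the average.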
commented

Do we already support the TopN pruning logic for small tables (sampled tables)?