TopN statistics are collected unnecessarily for non-skewed values on small tables
terry1purcell opened this issue
Enhancement
By default, ANALYZE currently collects the top 500 values for each interesting column (an indexed column or a predicate column). For tables that are sampled, pruning logic removes TopN entries that are not skewed. For tables that are NOT sampled (smaller tables), the pruning logic is not invoked.
The example below shows TopN values collected with a count of 1, which are not skewed.
tidb> show stats_topn;
+---------+------------+----------------+-------------+----------+------------+-------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Value      | Count |
+---------+------------+----------------+-------------+----------+------------+-------+
| test    | t2         |                | a           |        0 | 73         |    16 |
| test    | t2         |                | a           |        0 | 74         |    16 |
| test    | t2         |                | a           |        0 | 75         |    16 |
| test    | t2         |                | a           |        0 | 76         |    16 |
......
| test    | t2         |                | a           |        0 | 101        |     1 |
| test    | t2         |                | a           |        0 | 102        |     1 |
| test    | t2         |                | a           |        0 | 103        |     1 |
| test    | t2         |                | a           |        0 | 104        |     1 |
......
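To make the requested behavior concrete, here is a minimal Python sketch of the kind of pruning that could also run for non-sampled tables. It is not TiDB's implementation. The names `TopNEntry` and `prune_topn` are hypothetical, and the skew rule (keep a value only if its count exceeds both 1 and the average number of rows per distinct value) is an assumption chosen for illustration, since the issue does not state the exact rule used for sampled tables.

```python
from typing import List, NamedTuple


class TopNEntry(NamedTuple):
    """One row of TopN statistics: a column value and its row count (hypothetical type)."""
    value: int
    count: int


def prune_topn(entries: List[TopNEntry], total_rows: int, ndv: int) -> List[TopNEntry]:
    """Drop TopN entries that are not skewed.

    Assumed rule (illustrative only): an entry is skewed when its count is
    greater than both 1 and the average rows per distinct value
    (total_rows / ndv). Everything else is left to the histogram.
    """
    if ndv <= 0:
        return []
    avg_rows_per_value = total_rows / ndv
    threshold = max(1.0, avg_rows_per_value)
    return [e for e in entries if e.count > threshold]


if __name__ == "__main__":
    # Toy data modelled on the visible rows above; the elided rows are not reproduced.
    entries = [TopNEntry(v, 16) for v in (73, 74, 75, 76)]
    entries += [TopNEntry(v, 1) for v in (101, 102, 103, 104)]
    total_rows = sum(e.count for e in entries)
    for kept in prune_topn(entries, total_rows, ndv=len(entries)):
        print(kept.value, kept.count)
```

Under this assumed rule, the values with a count of 16 stay in TopN and the values with a count of 1 are dropped. A column in which every value appears exactly once would end up with no TopN entries at all.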
Do we already support the TopN pruning logic for small tables (sampled tables)?