pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/

Home Page:https://pingcap.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The estRows is inaccurate when there is data skew.

yedushusheng opened this issue · comments

Bug Report

1. Minimal reproduce step (Required)

  1. create table like this:
    CREATE TABLE tb (
    id int(11) NOT NULL,
    f1 int(11) DEFAULT NULL,
    f2 int(11) DEFAULT NULL,
    f3 int(11) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY idx_f1 (f1) ,
    KEY idx_f2 (f2)
    ) partition by hash(id)
    (partition p0,
    partition p1,
    partition p2);
  2. insert data:
    step 1: insert one million records by the following script:
#!/bin/bash    
HOST="127.0.0.1"    
PORT="xxxx"    
USER="xxxx"    
DB="test"    
TABLE="tb"    
    
TOTAL_ROWS=1000000  
     
COUNTER=0    
  
BATCH_SIZE=10000    
  
generate_data() {    
    local sql="INSERT INTO $TABLE (id, f1, f2, f3) VALUES "    
    for i in $(seq 1 $BATCH_SIZE); do    
        local id=$((COUNTER + i))    
        if ((id > TOTAL_ROWS)); then   
            break  
        fi  
        local f1=$((RANDOM % 1000))    
        local f2=$((RANDOM % 1000))    
        local f3=$((RANDOM % 1000))    
      
        if [[ $i -ne 1 ]]; then    
            sql+=","    
        fi    
        sql+="($id, $f1, $f2, $f3)"    
    done
}    
    
while ((COUNTER < TOTAL_ROWS)); do    
    DATA=$(generate_data)    
    
    COUNTER=$((COUNTER + BATCH_SIZE))    
 
    sleep 1
done 

step 2: Insert the following data to construct a data skew
insert into tb (id,f1,f2,f3) values(10000001,10000,102,10000),(20000001,20000,103,20000);
3. Execute analyze table
step 1: analyze table tb;
step 2: analyze table tb update histogram on id,f1,f2;
4. Execute Query
image
The estRows above is incorrect.

However, what's puzzling is that if we insert 1000 records[Still using the previous script, modifyvariable TOTAL_ROWS to 1000], you will get the correct estRows:
image

2. What did you expect to see?

image

3. What did you see instead

image

4. What is your TiDB version?

pd instance:v8.0.0
tikv instance:v8.0.0
tidb instance:v8.0.0
tiflash instance:v8.0.0

Doesn't affect the query result, degrade this from Major to Moderate.