taosdata / TDengine

TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, Industrial IoT and DevOps.

Home Page: https://tdengine.com

creating additional subtables causes dramatic performance deterioration

chessplay opened this issue

Performance testing with taosBenchmark: write performance drops sharply as the number of child tables grows
Our business scenario has 10 million devices, so we simulated 10 million child tables with taosBenchmark, using 64 threads and inserting 1 row per child table. Once the number of child tables reached the millions, the write speed dropped from a few hundred rows per second at the start to a few dozen rows per second. What could be the cause?
[two screenshots of taosBenchmark output showing the declining write rate]

  • OS: CentOS 7.9
  • Memory: 64 GB, CPU: 16 cores, current disk space: 512 GB
  • TDengine version: 3.3.0
taosBenchmark configuration file (excerpt as posted; the enclosing top-level wrapper was truncated):

{
    "childtable_prefix": "d",
    "auto_create_table": "yes",
    "batch_create_tbl_num": 32,
    "data_source": "rand",
    "insert_mode": "taosc",
    "non_stop_mode": "no",
    "line_protocol": "line",
    "insert_rows": 1,
    "childtable_limit": 0,
    "childtable_offset": 0,
    "interlace_rows": 0,
    "insert_interval": 0,
    "partial_col_num": 0,
    "timestamp_step": 10,
    "start_timestamp": "2020-10-01 00:00:00.000",
    "sample_format": "csv",
    "sample_file": "./sample.csv",
    "use_sample_ts": "no",
    "tags_file": "",
    "columns": [
        {"type": "FLOAT", "name": "current", "count": 1, "max": 12, "min": 8},
        {"type": "INT", "name": "voltage", "max": 225, "min": 215},
        {"type": "FLOAT", "name": "phase", "max": 1, "min": 0}
    ],
    "tags": [
        {"type": "TINYINT", "name": "groupid", "max": 10, "min": 1},
        {"type": "BINARY", "name": "location", "len": 16,
         "values": ["San Francisco", "Los Angles", "San Diego",
                    "San Jose", "Palo Alto", "Campbell", "Mountain View",
                    "Sunnyvale", "Santa Clara", "Cupertino"]}
    ]
}

Your vgroup count is probably too low. Is it at the default of 2? For 10 million tables you can use 100 vgroups.
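
For reference, vgroups is normally fixed when the database is created, so changing it means recreating the database. A minimal sketch, assuming the benchmark database is named demo (the name is illustrative):

CREATE DATABASE IF NOT EXISTS demo VGROUPS 100;
SHOW DATABASES;

SHOW DATABASES in TDengine 3.x should list a vgroups column that confirms the value. When driving this from taosBenchmark, the dbinfo section of the config accepts database options as well, so the same setting can be placed there.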

Also, a scenario with only one row per table is quite extreme; we recommend simulating your real workload.

In the real scenario each device sends only one row at a time, with 10 million devices sending simultaneously, reporting once every 5 minutes. Rows from the same device cannot be accumulated into batches; each row must be written as soon as it arrives. How should the taosBenchmark parameters be set to simulate this?

Please increase vgroups first. The simulated data has a timestamp-interval parameter; you can search the documentation for it.
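
The parameter being referred to is timestamp_step, already present in the config above. A sketch of the relevant fragment, assuming the database keeps the default millisecond precision (so a 5-minute reporting interval is 300000 ms) and using interlace_rows to approximate one-row-at-a-time arrival per device:

{
    "timestamp_step": 300000,
    "interlace_rows": 1
}

With interlace_rows set to 1, taosBenchmark writes one row to a child table and then moves on to the next table, which is closer to millions of devices each sending a single row than the default per-table progressive mode.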

I raised vgroups to 100 as you suggested and it did help: per-thread write speed reached 1,000-2,000 rows/s with 16 threads, which works out to roughly 20,000-30,000 rows/s in total. But this still falls short of the QPS our business needs. What other parameters can be tuned? I can see the server's CPU and memory are still far from saturated.
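
Beyond vgroups, two top-level taosBenchmark options are worth experimenting with when the server is not yet saturated; a sketch (the values are starting points, not recommendations):

{
    "thread_count": 32,
    "num_of_records_per_req": 1000
}

thread_count controls how many insertion threads taosBenchmark runs, and num_of_records_per_req caps how many rows are packed into one request when batching is allowed. With a hard one-row-per-write requirement, per-request batching cannot apply, so more concurrent threads is the main lever.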

What does your table schema look like?

Roughly like this:
CREATE STABLE IF NOT EXISTS demo.eg_application_flow (
gather_time TIMESTAMP,
app_group_name NCHAR(100),
app_name NCHAR(100),
up_rate BIGINT,
down_rate BIGINT,
up_down_rate BIGINT,
flow_up BIGINT,
flow_down BIGINT,
flow_up_down BIGINT,
drop_up BIGINT,
drop_down BIGINT,
active_sec INT
) TAGS (
utc_code INT,
building_id INT
);

NCHAR is for storing Chinese (multi-byte) text and uses more memory. If you don't need that, use VARCHAR wherever possible, or reduce the maximum length.
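
As an illustration of that advice, here is the schema above with the two NCHAR columns switched to VARCHAR (a sketch; keep NCHAR only if the names genuinely contain multi-byte text, since NCHAR stores 4 bytes per character while VARCHAR stores 1 byte per ASCII character):

CREATE STABLE IF NOT EXISTS demo.eg_application_flow (
    gather_time TIMESTAMP,
    app_group_name VARCHAR(100),
    app_name VARCHAR(100),
    up_rate BIGINT,
    down_rate BIGINT,
    up_down_rate BIGINT,
    flow_up BIGINT,
    flow_down BIGINT,
    flow_up_down BIGINT,
    drop_up BIGINT,
    drop_down BIGINT,
    active_sec INT
) TAGS (
    utc_code INT,
    building_id INT
);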

Can data compression be applied to this business table? Can compression be used to improve write performance?

The data is already compressed by default.
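
To make "compressed by default" concrete: compression is controlled by the database-level COMP option, which defaults to 2 (two-stage compression); 0 disables it and 1 enables only the first stage. A sketch (COMP is a standard CREATE DATABASE option; the demo name is illustrative):

CREATE DATABASE IF NOT EXISTS demo COMP 2;

Since data is compressed as it is committed to storage, compression is mainly a disk-space and read-I/O saving rather than a lever for higher write throughput.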