max_shards_per_node not behaving as documented
hlcianfagna opened this issue · comments
CrateDB version
5.6.3
CrateDB setup information
3 nodes cluster deployed with https://github.com/crate/crate-terraform
Problem description
According to https://cratedb.com/docs/crate/reference/en/5.6/config/cluster.html#shard-limits
Any operations that would result in the creation of additional shard copies that would exceed this limit are rejected.
However in a multi-node cluster (could not repro this on single-node environments) it seems under certain conditions the limit is not strictly enforced.
Steps to Reproduce
cr> select node['name'] ,closed,count(*) from sys.shards group by 1,2 limit 100;
+------------------+--------+----------+
| node['name'] | closed | count(*) |
+------------------+--------+----------+
| Hochschwung | FALSE | 26 |
| Schafberg | FALSE | 27 |
| Monte La Palazza | FALSE | 27 |
+------------------+--------+----------+
SELECT 3 rows in set (0.053 sec)
cr> set GLOBAL PERSISTENT "cluster.max_shards_per_node"=30;
SET OK, 1 row affected (0.056 sec)
cr> create table hernan.shardstest (a int) clustered into 8 shards;
CREATE OK, 1 row affected (0.149 sec)
cr> create table hernan.shardstest2 (a int) clustered into 5 shards;
SQLParseException[Validation Failed: 1: this action would add [5] total shards, but this cluster currently has [96]/[90] maximum shards open;]
cr> select node['name'] ,closed,count(*) from sys.shards group by 1,2 limit 100;
+------------------+--------+----------+
| node['name'] | closed | count(*) |
+------------------+--------+----------+
| Hochschwung | FALSE | 32 |
| Schafberg | FALSE | 32 |
| Monte La Palazza | FALSE | 32 |
+------------------+--------+----------+
Actual Result
The nodes end up with 32 shards
Expected Result
The limit of 30 shards is enforced, or the documentation is updated explaining how the limit works/when it may not be enforced.
I think it is caused by number_of_replicas
which is by default set to 0-1
. A workaround could be to not use a range value.
- Regarding original issue - I think it's more documentation issue.
See elastic/elasticsearch#51839
cluster.max_shards_per_node controls how many shards are allowed to exist in the cluster as a whole, and is checked at shard creation time, but does not pay attention to how many shards any individual node has
and 7.10 backport elastic/elasticsearch@e4054e4
NOTE: This setting does not limit shards for individual nodes.
I will port those docs + mention in docs that max_shards_per_node
doesn't take to account closed shards.
On auto-expanding replicas (which we have enabled by default, it's 0-1).
I found elastic/elasticsearch#2869 and elastic/elasticsearch@eb3d184.
Links above are talking about total_shards_per_node
- I will check whether it holds true for max_shards_per_node
and update docs if needed.
UPD: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules.html
Note that the auto-expanded number of replicas only takes allocation filtering rules into account, but ignores other allocation rules such as total shards per node,
Probably by design, but I will check why we cannot do it like @jeeminso proposed in #15805
- Regarding partitioned tables - it's a legitimate bug but visible only for
INSERT INTO ... VALUES
(many values).
insert-from subquery is not that badly exposed - see details in a fix.
Hi @hlcianfagna could you please post your initital create table statment(s) - how you got those 26/27 shards?
Also, couldn't exactly reproduce locally: do you get 32 shards per node after running
create table hernan.shardstest2 (a int) clustered into 5 shards;
What I'm saying is that slightly overshooting limit might be expected behavior but reporting error and still incrementing number of shards would be a bug.
I suspect that actually 32 shards were already there after
create table hernan.shardstest (a int) clustered into 8 shards;
how you got those 26/27 shards?
They were already there on a cluster that I do not have at hand anymore, however I just reproduced this again successfully using this on an empty cluster:
create table hernan.legacytables (a int) clustered into 40 shards;
I confirmed the 32 shards per node are there right after the command with clustered into 8 shards
I confirmed the 32 shards per node are there right after the command with clustered into 8 shards
ok, then I will follow my original plan - improve docs (basically expected behaviour for auto-expanding replicas).
As said, throwing an error this action would add ... total shards
and actually adding shards would be a bug -> but it's not the case.