/_cluster/state scales poorly
robottaway opened this issue · comments
Hi, loving the plugin! We started using it about a month ago and it has given my team a real-time view of what our nodes are doing. I'm hoping I can help fix an issue we have when using it on more sizable clusters.
Our larger clusters serve many customers and have many indexes (~100). Calling "_cluster/state" is a guaranteed way to kill a browser: the response is currently over 11 megabytes of data. Chrome kills the page, and other browsers even crash.
Might it be possible to use the filters on cluster state to reduce the size while still maintaining functionality?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-state.html
If I switch on the filter that excludes metadata, the size drops to 665 KB. That is much faster and would possibly scale to clusters much larger than ours. Not sure that's doable.
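To make the comparison above reproducible, here is a minimal Python sketch that builds a filtered _cluster/state URL and measures the response size. It assumes the 0.90-era filter parameter names (filter_nodes, filter_routing_table, filter_metadata, filter_blocks); the helper names cluster_state_url and response_size are hypothetical and are not part of Bigdesk or Elasticsearch.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: section names accepted as filter_<section>=true by the
# 0.90-era cluster state API. Check the docs for your version.
KNOWN_FILTERS = ("nodes", "routing_table", "metadata", "blocks")


def cluster_state_url(base, exclude=()):
    """Build a _cluster/state URL that filters out the given sections."""
    for section in exclude:
        if section not in KNOWN_FILTERS:
            raise ValueError(f"unknown section: {section}")
    query = urlencode([(f"filter_{s}", "true") for s in exclude])
    url = base.rstrip("/") + "/_cluster/state"
    return url + ("?" + query if query else "")


def response_size(url):
    """Return the response body size in bytes (needs a reachable node)."""
    with urlopen(url) as resp:
        return len(resp.read())
```

On a live node, comparing response_size(cluster_state_url(base)) with response_size(cluster_state_url(base, ["metadata"])) should show how much of the payload is metadata.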
I don't want to preemptively fork and code. If you think this is something that warrants attention let me know. I will be extremely happy to add my coding efforts.
Hi, I have not tested Bigdesk on a sizeable cluster myself, so I am glad you are reporting issues. Such feedback is very useful. 11 MB per single _cluster/state response is really a lot for web browsers. If we are not using all of the info, then let's try to cut the size down by filtering. Agreed. Let me see what we can do about this.
It would also help me if you can share some of the following:
- version of Elasticsearch that you need to be supported
- number of nodes and indices per node (just my curiosity)
- typical Bigdesk refresh rate and history window size
Hi Lukas, I'm heading out to lunch with co-workers, but when I get back I will gather this information for you... and thank you for such a fast response!
Versions range between 0.90.2 - 0.90.5
The largest cluster has ~120 indexes (one per client on our platform) with sizes ranging from < Mb to ~100Mb (only one near that size).
The refresh rate and history size are always left at the defaults, 2 seconds and 5 minutes I believe.
I should note we have a fair amount of custom settings and mappings per index. I think this might add a fair amount to the size of the cluster state resource.
The above is the resource usage of a node. We only have one Bigdesk open and no other traffic. The instance is an EC2 m2 4xl instance type. It seems we're using up quite a bit of resources just trying to keep Bigdesk going! I imagine the previously mentioned problem is causing this CPU usage.
I spent last week upgrading our clusters. We are on 0.90.5 across the board now. I can help test and code.
It could be. I think we can really try to reduce the amount of data Bigdesk pulls now.
Fixed by #41
Will be part of v2.2.2 release. I will release it in one day.
I have just released version 2.2.2
Can you test it please?
On it, thanks!
Do I need to have this version of Bigdesk on every node? I only installed it on a sneak node, which is out of discovery on the edge of the cluster, where only I use it for Bigdesk viewing. Maybe that is the trouble?
OK I think I accidentally installed the same 2.2.0 version. Let me try this again.
ad #40 (comment) - We are getting into abstract art territory with clusters this big. Soon we can open a gallery!
ad #40 (comment) - You only need to have this plugin on the single node that you connect your browser to. In fact, you do not need to install Bigdesk on any of the nodes at all. You can just download (or git clone) Bigdesk to your FS, open index.html, and point it to the URL of one of the cluster node endpoints. You don't even need to download Bigdesk: just run it from the web and select the correct version of Bigdesk from http://bigdesk.org/v/.
For instance you can open the following URL in your browser:
http://bigdesk.org/v/master/?endpoint=http://localhost:9200&connect=true&refresh=5000#cluster
Assuming that:
- you want to use the Bigdesk master version. In your case you can replace it with .../v/2.2.2/...
- the ES node is running on http://localhost:9200 (change that to the correct ES endpoint base)
- you want to use a 5 sec refresh interval instead of the default 2 sec
- you want Bigdesk to auto connect (so you do not need to click the connect button explicitly)
- you want to switch directly to the cluster tab
There are even more URL params that you can use.
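For anyone scripting this, a small Python sketch of assembling such a URL follows. It uses only the parameters mentioned above (endpoint, connect, refresh, and the tab after #); the function name bigdesk_url and its defaults are made up for illustration.

```python
from urllib.parse import urlencode


def bigdesk_url(endpoint, version="master", refresh_ms=5000,
                connect=True, tab="cluster"):
    """Assemble a hosted-Bigdesk URL from the parameters described above."""
    params = [("endpoint", endpoint)]
    if connect:
        # Only the connect=true form appears in this thread, so the
        # parameter is simply left out when auto connect is not wanted.
        params.append(("connect", "true"))
    params.append(("refresh", refresh_ms))
    # safe=":/" keeps the endpoint readable instead of percent-encoding it.
    query = urlencode(params, safe=":/")
    return f"http://bigdesk.org/v/{version}/?{query}#{tab}"


print(bigdesk_url("http://localhost:9200"))
print(bigdesk_url("http://localhost:9200", version="2.2.2"))
```

The first call reproduces the example URL above; the second swaps in the 2.2.2 version path.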
We can try to find more opportunities for cutting the data. I am also wondering whether it would make sense to open a ticket in Elasticsearch asking for an option to compress the REST endpoint output. Transferring 1.6 MB over HTTP from the ES REST endpoint is still a lot. Given that this is just simple text data, compression could help a lot IMO.
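As a rough illustration of why compression could help with repetitive JSON, here is a Python sketch that gzips a synthetic payload and reports the size ratio. The payload is a made-up stand-in for a cluster state with many similar per-index entries, not real data from this cluster, so the ratio only indicates the general effect.

```python
import gzip
import json


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic stand-in: 120 indices with near-identical settings and mappings.
# All names and values here are invented for illustration.
fake_state = {
    "metadata": {
        "indices": {
            f"client_{i}": {
                "settings": {"index.number_of_shards": "5",
                             "index.number_of_replicas": "1"},
                "mappings": {"doc": {"properties": {
                    "title": {"type": "string"},
                    "created": {"type": "date"},
                }}},
            }
            for i in range(120)
        }
    }
}

raw = json.dumps(fake_state).encode("utf-8")
ratio = compression_ratio(raw)
print(f"raw: {len(raw)} bytes, gzip ratio: {ratio:.2%}")
```

Real cluster state output will compress differently, so running the same function on a saved _cluster/state response would give a more meaningful number.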
Those diagrams get pretty cool looking. We have 2 large clusters, one with 150+ indices and another with 200+. ES has been really stable for us.
Would it help if I got you a copy of that 1.6 MB file? I can probably do that early tomorrow when I get to work.