sni / Thruk

Thruk is a multibackend monitoring webinterface for Naemon, Nagios, Icinga and Shinken using the Livestatus API.

Home Page: http://www.thruk.org

Multiple concurrent CGI calls locks server up

MattCrum1 opened this issue

Describe the bug
Hi Sven / all - we've run into an interesting issue with a central Thruk instance with multiple remote Thruk HTTP backends.

We are using the central instance to pull PNP4Nagios graphs from each of the HTTP backends using proxy.cgi.

This works fine in normal UI operation, but when I try to grab multiple graphs (e.g. graphs for all hosts from a specific backend) to assemble a report, the server completely locks up: all inbound and outbound TCP connections are blocked.

I'm using this URL format:

https://monitoring.test/thruk/cgi-bin/proxy.cgi/12345/pnp4nagios/image?host=test&srv=test&view=3&graph_width=700&graph_height=140&start=12345&end=23456&source=graphsource

Hundreds of CGI processes are spawned and end up blocking server I/O - it takes a long time to recover. I am rate limiting the calls to the proxy URL to 2.5 requests per second, but it still locks up.
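For illustration, here is a minimal sketch of how such URLs could be built and paced at 2.5 requests per second. It is not the actual report script: the names `build_graph_url` and `paced` are hypothetical, and the parameter values are simply the placeholders from the URL above.

```python
import time
from urllib.parse import urlencode

# Base of the proxy URL shown above; the backend id is appended per request.
BASE = "https://monitoring.test/thruk/cgi-bin/proxy.cgi"


def build_graph_url(backend_id, host, service, start, end, source="graphsource"):
    """Build a proxy.cgi URL for one PNP4Nagios graph image (hypothetical helper)."""
    query = urlencode({
        "host": host,
        "srv": service,
        "view": 3,
        "graph_width": 700,
        "graph_height": 140,
        "start": start,
        "end": end,
        "source": source,
    })
    return f"{BASE}/{backend_id}/pnp4nagios/image?{query}"


def paced(items, per_second=2.5, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than per_second, spacing out when each one starts."""
    interval = 1.0 / per_second
    next_at = clock()
    for item in items:
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)
        # Schedule the next start one interval after this one.
        next_at = max(next_at, clock()) + interval
        yield item


# Usage (assumed host list):
# for host in paced(["test"]):
#     url = build_graph_url("12345", host, "test", 12345, 23456)
```

Pacing like this only spaces out when requests start. If the requests are issued asynchronously and each one takes longer than the 0.4 s interval, several can still be in flight at the same time.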

Thruk is running in Docker and the server is running on VMware. It has only 2 vCPUs and 8 GB RAM. We can increase both, but I'm not sure it will help.

I'm wondering if you have any suggestions for improving performance or preventing the server from locking up? Perhaps enabling multi-threading for the CGI, or enabling the backend HTTP_Proxy?

Thruk Version
Thruk 3.08.3

To Reproduce
Query the proxy URL ~50 times in quick succession to retrieve graphs from a remote pnp4nagios instance.
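A rough sketch of that reproduction step, assuming plain Python with only the standard library. The helper names `fetch_status` and `reproduce` are hypothetical, and authentication (which a real Thruk instance will need) is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout=30):
    """Fetch one URL and return its HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def reproduce(url, count=50, workers=50, fetch=fetch_status):
    """Request the same URL `count` times using up to `workers` threads.

    Returns one entry per request: the fetch result, or the exception raised.
    """
    def attempt(_):
        try:
            return fetch(url)
        except Exception as exc:  # keep going so every request is recorded
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(attempt, range(count)))


# Usage (placeholder URL from this report, shortened to the required parameters):
# results = reproduce("https://monitoring.test/thruk/cgi-bin/proxy.cgi/12345/pnp4nagios/image?host=test&srv=test")
# print(results)
```

Lowering `workers` caps how many requests are in flight at once, which may help show whether the lock-up depends on concurrency rather than on the total number of requests.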

Expected behavior
Graphs are returned without locking up the central monitoring server.