Uptime Kuma is very slow when deployed on NFS

Question

sudoexec opened this issue 6 months ago

browse commented 6 months ago

📑 I have found these related issues/pull requests

No related issues.

🛡️ Security Policy

I have read and agree to this project's Security Policy.

Description

Deployed in k8s.
I have 54 monitors, all organized into groups.
The homepage always returns a blank page without any monitors (if I wait a long time, the monitors eventually appear), and the settings page is very slow.
In the devtools, the websocket is sometimes pending and sometimes responds very slowly (getTags request).

👟 Reproduction steps

At first there were only a few monitors and it worked fine.
Now that I have 54, it doesn't work well.
In my situation, the problem appears as more and more monitors are added.

👀 Expected behavior

When opening the homepage, show all monitors immediately.

😓 Actual Behavior

The websocket communication is very slow and sometimes fails.

🐻 Uptime-Kuma Version

1.23.11

💻 Operating System and Arch

k8s

🌐 Browser

Chromium 123.0.6312.105 / Firefox 124.0.2

🖥️ Deployment Environment

Runtime: k8s
Database: sqlite
Filesystem used to store the database on: nfs
number of monitors: 54

📝 Relevant log output

Monitor #23 'Group': Failing: Child inaccessible | Interval: 60 seconds | Type: group | Down Count: 0 | Resend Interval: 0



Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async Monitor.calcUptime (/app/server/model/monitor.js:1255:22)
    at async Monitor.sendUptime (/app/server/model/monitor.js:1321:24)
    at async Monitor.sendStats (/app/server/model/monitor.js:1189:13) {
  sql: '\n' +
    '            SELECT\n' +
    '               -- SUM all duration, also trim off the beat out of time window\n' +
    '                SUM(\n' +
    '                    CASE\n' +
    '                        WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration\n' +
    '                        THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400\n' +
    '                        ELSE duration\n' +
    '                    END\n' +
    '                ) AS total_duration,\n' +
    '\n' +
    '               -- SUM all uptime duration, also trim off the beat out of time window\n' +
    '                SUM(\n' +
    '                    CASE\n' +
    '                        WHEN (status = 1 OR status = 3)\n' +
    '                        THEN\n' +
    '                            CASE\n' +
    '                                WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration\n' +
    '                                    THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400\n' +
    '                                ELSE duration\n' +
    '                            END\n' +
    '                        END\n' +
    '                ) AS uptime_duration\n' +
    '            FROM heartbeat\n' +
    '            WHERE time > ?\n' +
    '            AND monitor_id = ?\n' +
    '        ',
  bindings: [
    '2024-03-08 09:49:42',
    '2024-03-08 09:49:42',
    '2024-03-08 09:49:42',
    '2024-03-08 09:49:42',
    '2024-03-08 09:49:42',
    48
  ]
}
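For readers unfamiliar with the query in the trace: for one monitor, it sums the duration of every heartbeat newer than the window start (trimming a beat so it does not reach back past the window start), and separately sums the same durations for beats with status 1 or 3. Uptime is the ratio of the two. The minimal Python sketch below runs that SQL against an in-memory SQLite table. The four-column `heartbeat` schema, the sample rows, and the `uptime_ratio` helper are simplified assumptions for illustration, not Uptime Kuma's actual schema or code. It also shows where the cost comes from: every heartbeat row in the window for that monitor has to be read on each call.

```python
import sqlite3
from typing import Optional

# The query from the log, with its SQL comments and extra whitespace removed.
# Bindings: the window start four times (trimming CASEs), the window start
# once more for WHERE, then the monitor id.
UPTIME_SQL = """
SELECT
    SUM(CASE
        WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration
        THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400
        ELSE duration
    END) AS total_duration,
    SUM(CASE
        WHEN (status = 1 OR status = 3)
        THEN CASE
            WHEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400 < duration
            THEN (JULIANDAY(`time`) - JULIANDAY(?)) * 86400
            ELSE duration
        END
    END) AS uptime_duration
FROM heartbeat
WHERE time > ?
AND monitor_id = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory DB with a simplified, assumed heartbeat schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE heartbeat (monitor_id INTEGER, status INTEGER, time TEXT, duration INTEGER)"
    )
    rows = [
        (48, 1, "2024-03-08 09:48:42", 60),  # before the window: filtered out
        (48, 1, "2024-03-08 09:50:12", 60),  # 30 s after window start: trimmed to 30
        (48, 0, "2024-03-08 09:51:12", 60),  # down beat: counts toward total only
        (48, 1, "2024-03-08 09:52:12", 60),
        (7, 0, "2024-03-08 09:50:12", 60),   # another monitor: filtered out
    ]
    conn.executemany("INSERT INTO heartbeat VALUES (?, ?, ?, ?)", rows)
    return conn


def uptime_ratio(conn: sqlite3.Connection, monitor_id: int, window_start: str) -> Optional[float]:
    """Hypothetical helper: run the V1-style query and return up / total."""
    bindings = (window_start,) * 5 + (monitor_id,)
    total, up = conn.execute(UPTIME_SQL, bindings).fetchone()
    if not total:
        return None  # no heartbeats for this monitor in the window
    return (up or 0) / total


conn = build_demo_db()
print(uptime_ratio(conn, 48, "2024-03-08 09:49:42"))
```

With these sample rows the first beat is trimmed from 60 s to 30 s, so the total is 150 s, the up time is 90 s, and the printed ratio is about 0.6.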

Frank Elsinga · Answer 1 · Sun Apr 07 2024 23:00:32 GMT+0800 (China Standard Time)

Filesystem used to store the database on: nfs

Please refer to the wiki for why using NFS for a database is not a good idea, both to prevent database corruption and for performance.
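One way to see the performance side of this on a given cluster is to time small SQLite commits on each storage backend, since with SQLite's default settings every commit waits for the data to be synced to the file. The sketch below is a rough probe under stated assumptions: it needs Python inside a pod that mounts the volume, `avg_commit_ms` is a hypothetical helper, and the NFS mount path is a placeholder. It measures latency only; it does not reproduce the file-locking problems that can corrupt SQLite on NFS.

```python
import os
import sqlite3
import tempfile
import time


def avg_commit_ms(directory: str, commits: int = 50) -> float:
    """Average milliseconds per small INSERT + COMMIT on a throwaway SQLite
    file created inside `directory` (e.g. a local disk vs. an NFS mount)."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "probe.db"))
        try:
            conn.execute("CREATE TABLE t (v INTEGER)")
            conn.commit()
            start = time.perf_counter()
            for i in range(commits):
                conn.execute("INSERT INTO t (v) VALUES (?)", (i,))
                conn.commit()  # one transaction per insert, like many small writes
            elapsed = time.perf_counter() - start
        finally:
            conn.close()  # close before the temporary directory is removed
    return elapsed / commits * 1000.0


if __name__ == "__main__":
    print(f"local: {avg_commit_ms(tempfile.gettempdir()):.2f} ms/commit")
    # Hypothetical NFS mount point; replace with the volume Uptime Kuma uses.
    # print(f"nfs:   {avg_commit_ms('/mnt/nfs'):.2f} ms/commit")
```

Running it once against local storage and once against the NFS mount gives a like-for-like comparison of per-commit latency.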

This issue might also be related to the performance problems of V1 (having to read the entire table), which are resolved in the upcoming V2.0 release. Please see #4500.
=> Have you checked your retention? What is the size of the database?
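To illustrate the difference between reading heartbeat rows on every request and reading pre-aggregated data, here is a generic sketch of bucketed counters in Python. It is not the V2.0 implementation; `UptimeBuckets`, its hourly default bucket size, and its methods are hypothetical. Each heartbeat is folded into its bucket once, so an uptime query touches one entry per bucket: with hourly buckets, a 30-day window is 720 entries instead of 43,200 heartbeats for a single monitor with a 60-second interval. The trade-off is resolution: partial buckets at the window edge are counted whole, unlike the trimming done by the V1 query in the log.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class UptimeBuckets:
    """Hypothetical pre-aggregated uptime store: one pair of counters
    (up seconds, total seconds) per fixed-size time bucket."""

    def __init__(self, bucket_seconds: int = 3600) -> None:
        self.bucket_seconds = bucket_seconds
        # bucket start (epoch seconds) -> [up_seconds, total_seconds]
        self.buckets: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0])

    def record(self, timestamp: int, is_up: bool, duration: float) -> None:
        """Fold one heartbeat into its bucket; constant work per heartbeat."""
        start = timestamp - timestamp % self.bucket_seconds
        counters = self.buckets[start]
        counters[1] += duration
        if is_up:
            counters[0] += duration

    def uptime(self, since: int) -> Optional[float]:
        """Up / total over buckets starting at or after `since`. Cost depends on
        the number of buckets, not on how many heartbeats were recorded."""
        up = total = 0.0
        for start, (bucket_up, bucket_total) in self.buckets.items():
            if start >= since:
                up += bucket_up
                total += bucket_total
        return up / total if total else None


# One hour of 60-second beats for one monitor, every fourth beat down.
stats = UptimeBuckets()
for i in range(60):
    stats.record(i * 60, is_up=(i % 4 != 0), duration=60)
print(stats.uptime(since=0))  # 0.75: 45 of 60 beats were up
```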

browse · Answer 2 · Mon Apr 08 2024 01:13:07 GMT+0800 (China Standard Time)

The database size is 228 MB and there are 1,531,440 records in the heartbeat table.
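Retention (how many days of heartbeats are kept) is what bounds the size of this table. A back-of-the-envelope check, assuming every monitor beats once every 60 seconds (the interval shown in the log for group monitor #23; the other monitors' intervals are not stated): the table settles at roughly monitors × beats per day × retention days. Both helpers below are hypothetical, and 180 days is only an example retention value.

```python
SECONDS_PER_DAY = 86_400


def estimated_heartbeat_rows(monitors: int, interval_s: int, retention_days: int) -> int:
    """Approximate steady-state heartbeat row count when every monitor beats
    every `interval_s` seconds and rows older than `retention_days` are pruned."""
    return monitors * (SECONDS_PER_DAY // interval_s) * retention_days


def days_of_history(rows: int, monitors: int, interval_s: int) -> float:
    """Inverse estimate: how many days of history `rows` heartbeats cover."""
    return rows / (monitors * SECONDS_PER_DAY / interval_s)


print(estimated_heartbeat_rows(54, 60, 180))         # 13996800
print(round(days_of_history(1_531_440, 54, 60), 1))  # 19.7
```

Under the same 60-second assumption, 1,531,440 rows across 54 monitors corresponds to roughly 19.7 days of history, while a 180-day retention would settle near 14 million rows.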

Frank Elsinga · Answer 3 · Mon Apr 08 2024 02:51:11 GMT+0800 (China Standard Time)

Okay, so this is just NFS being unsuitable for running a database.
Please migrate to local storage instead as suggested in the installation guide.

browse · Answer 4 · Mon Apr 08 2024 09:34:00 GMT+0800 (China Standard Time)

Thanks for your help. I migrated the storage to a hostPath volume, and it works now.
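For a migration like this, one option is to copy the database with SQLite's online backup API rather than a plain file copy, then check the copy before pointing Uptime Kuma at the new volume. A minimal Python sketch, assuming Uptime Kuma is stopped (for example, the deployment scaled to zero) while the copy runs and that both volumes are mounted in the same pod. `copy_sqlite_db` and both paths are hypothetical, and `kuma.db` is assumed to be the database file name in the data directory.

```python
import sqlite3
from pathlib import Path


def copy_sqlite_db(src: str, dst: str) -> str:
    """Copy an SQLite database via the online backup API and return the
    result of PRAGMA integrity_check on the copy ("ok" when healthy)."""
    if not Path(src).is_file():
        # sqlite3.connect would otherwise silently create an empty database
        raise FileNotFoundError(src)
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dst)
    try:
        source.backup(target)  # consistent page-by-page copy of the whole DB
        (result,) = target.execute("PRAGMA integrity_check").fetchone()
    finally:
        target.close()
        source.close()
    return result


if __name__ == "__main__":
    # Hypothetical paths: the old NFS-backed volume and the new hostPath volume.
    print(copy_sqlite_db("/mnt/nfs-data/kuma.db", "/mnt/local-data/kuma.db"))
```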