louislam / uptime-kuma

A fancy self-hosted monitoring tool

Home Page: https://uptime.kuma.pet

Sqlite errors

fabry006 opened this issue

⚠️ Please verify that this question has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

📝 Describe your problem

I am having trouble with SQLite. Sometimes I see errors like this one:

Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `ping`, `status`, `time`) values (0, 147870, true, 23, '200 - OK', 31, 1, '2024-04-25 13:35:05.866') - SQLITE_IOERR: disk I/O error] {
  errno: 10,
  code: 'SQLITE_IOERR'
}
    at consoleCall (<anonymous>)
    at Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:1028:25)
2024-04-25T15:35:05+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues
2024-04-25T15:35:05+02:00 [MONITOR] INFO: Try to restart the monitor
2024-04-25T15:35:05+02:00 [] INFO: Cannot write to error.log

I've deployed the software as a container with Podman in a virtual machine based on RHEL8, with a volume mounted to a local directory.
The only way to resolve those errors is to restart the container.
I have roughly 30 to 40 monitors. They are organized with Groups, and the maximum nesting depth is 4.

Can you please help me stabilize the environment?

📝 Error Message(s) or Log

Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `ping`, `status`, `time`) values (0, 147870, true, 23, '200 - OK', 31, 1, '2024-04-25 13:35:05.866') - SQLITE_IOERR: disk I/O error] {
  errno: 10,
  code: 'SQLITE_IOERR'
}
    at consoleCall (<anonymous>)
    at Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:1028:25)
2024-04-25T15:35:05+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues
2024-04-25T15:35:05+02:00 [MONITOR] INFO: Try to restart the monitor
2024-04-25T15:35:05+02:00 [] INFO: Cannot write to error.log

🐻 Uptime-Kuma Version

1.23.11-alpine

💻 Operating System and Arch

RHEL8 with podman

🌐 Browser

Edge

🖥️ Deployment Environment

  • Runtime: podman version 4.6.1
  • Database:
  • Filesystem used to store the database on: /opt type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
  • number of monitors: 40

As an explanation of what this error means, from the SQLite docs:

The SQLITE_IOERR result code says that the operation could not finish because the operating system reported an I/O error.
A full disk drive will normally give an SQLITE_FULL error rather than an SQLITE_IOERR error.
There are many different extended result codes for I/O errors that identify the specific I/O operation that failed.
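To illustrate how the extended result codes relate to the `errno: 10` in the trace, here is a minimal Python sketch. It assumes the documented SQLite convention that the low 8 bits of an extended code carry the primary code; the `describe` helper and the small lookup table are illustrative only and cover just a few of the extended I/O codes.

```python
# Primary result code for I/O errors, as it appears in the trace (errno: 10).
SQLITE_IOERR = 10

# A few extended I/O codes: the primary code OR-ed with a detail value
# shifted left by 8 bits. This table is deliberately incomplete.
IOERR_EXTENDED = {
    SQLITE_IOERR | (1 << 8): "SQLITE_IOERR_READ",
    SQLITE_IOERR | (2 << 8): "SQLITE_IOERR_SHORT_READ",
    SQLITE_IOERR | (3 << 8): "SQLITE_IOERR_WRITE",
    SQLITE_IOERR | (4 << 8): "SQLITE_IOERR_FSYNC",
}


def describe(code: int) -> str:
    """Return a readable name for a primary or extended SQLITE_IOERR code."""
    primary = code & 0xFF  # the low 8 bits carry the primary result code
    if primary != SQLITE_IOERR:
        return f"not an I/O error (primary code {primary})"
    if code == SQLITE_IOERR:
        return "SQLITE_IOERR (no extended detail)"
    return IOERR_EXTENDED.get(code, f"SQLITE_IOERR (extended code {code})")
```

The log above only reports the primary code, so it does not say which I/O operation failed. If the extended code were available, `describe(778)` would identify a failed write, for example.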

A person on Reddit suggested the following. Could you try it and report the results?

  • If it shows up immediately when connecting to the database, it might be an issue with [...] file permissions, the file path or path style (remember different slashes for windows/linux etc, and check that node isn't altering them), or some obscure virtual machine bug related to those.
    Before you dig too deep into those, read up on your specific sqlite package on npm and make sure everything is set up correctly.
    You may want to install/update sqlite3 or an equivalent through your package manager separately!
  • If it shows up after other successful queries, it could be an indication of data or media corruption [...]
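One way to look into the corruption case is SQLite's built-in `PRAGMA integrity_check`. The following is a minimal Python sketch, not something Uptime Kuma provides: the `integrity_report` helper is hypothetical, and it is assumed to be run against a copy of `kuma.db` taken while the container is stopped, so the live database is not touched.

```python
import sqlite3


def integrity_report(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and return its result rows as strings."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    # A healthy database yields a single row containing "ok";
    # otherwise each row describes one problem that was found.
    return [row[0] for row in rows]
```

A result other than `["ok"]` would point toward data corruption. A clean result does not rule out a failing disk, since the check only reads the pages as they are at that moment.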

Please also see #4110 if you have the same problem

Thank you @CommanderStorm. I read your links and I'm not sure that they apply to my case.
Yesterday I restarted the container and, again, this morning it is stuck with the same error. Just for testing, I started the container as root in order to rule out any kind of permission problem, but this didn't solve the problem.
I tried to create a dir inside the same directory where the db is stored and there is no issue.
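Creating a directory only touches filesystem metadata, so a slightly stronger test is to write some data, force it to disk, and read it back. Below is a minimal Python sketch of such a probe. The `probe_write` helper is hypothetical, and it would need to be pointed at the directory that holds `kuma.db`.

```python
import os
import tempfile


def probe_write(directory: str, size: int = 1024 * 1024) -> bool:
    """Write `size` random bytes to a temp file in `directory`, fsync, read back."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory, prefix="io-probe-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to storage
        with open(path, "rb") as handle:
            return handle.read() == payload
    finally:
        os.remove(path)  # always clean up the probe file
```

An `OSError` raised by the write or the `fsync` would be the same class of failure SQLite reports as SQLITE_IOERR. Since the errors here are intermittent, a single successful run proves little, and the read-back may be served from the page cache rather than from the disk.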

This is the actual content of the folder

drwxrwxrwx. 2 root root         6 Apr 23 13:59 docker-tls
-rwxrwxrwx. 1 root root      7490 Apr 23 23:46 error.log
-rwxrwxrwx. 1 root root 444760064 Apr 26 08:15 kuma.db
-rwxrwxrwx. 1 root root     98304 Apr 26 08:15 kuma.db-shm
-rwxrwxrwx. 1 root root  33685152 Apr 26 08:19 kuma.db-wal
drwxrwxrwx. 2 root root         6 Apr 23 13:59 screenshots
drwxrwxrwx. 2 root root        28 Apr 23 13:59 truststore
drwxrwxrwx. 2 root root       108 Apr 23 13:59 upload

The VM is deployed in the corporate private cloud (based on VMware).
Previously, I mounted the volume on a mounted filesystem that is meant to be used for the app data, but I was not sure that it is a local disk, so I changed the folder to /opt/uptime/volumes. I still see these errors.

Can you try going into Settings -> Monitor History -> press "Shrink Database"?

@chakflying So far the monitoring history is set to 0 (I need at least 1 year of data, hoping that in the future the Kuma UI will have a feature to select the time history for the whole year).
So what is the effect if I shrink the db? I think no data will be deleted. Am I wrong?

Yes, no data would be deleted, it only runs a command to compact and organize the database file.
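To show what compacting means in practice, here is a minimal Python sketch. It assumes the command in question is SQLite's `VACUUM`, which this thread does not confirm; the `compact` helper is hypothetical and is meant to be tried on a copy of the database file.

```python
import os
import sqlite3


def compact(db_path: str) -> tuple[int, int]:
    """VACUUM the database at db_path; return (size_before, size_after) in bytes."""
    before = os.path.getsize(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # VACUUM rebuilds the file and drops free pages; rows are left untouched.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

The file only gets smaller if it contains free pages, for example after rows were deleted earlier. With retention set to keep everything, the size may barely change.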

In general, most people who have reported database errors have had to reduce the retention time to eliminate the errors. I think we currently don't have the expertise to optimize our current usage of SQLite further, so there is no solution for now.
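Before changing the retention time, it may help to estimate how many rows a given window would drop. The following read-only Python sketch assumes the `heartbeat` table and the `time` text format shown in the trace above (`YYYY-MM-DD HH:MM:SS.mmm`); the `rows_older_than` helper is hypothetical and should be run against a copy of the database.

```python
import sqlite3
from datetime import datetime, timedelta


def rows_older_than(conn: sqlite3.Connection, days: int, now: datetime) -> tuple[int, int]:
    """Return (rows older than `days`, total rows) for the heartbeat table."""
    # Timestamps in this format sort correctly as plain text.
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S.000")
    old = conn.execute(
        "SELECT COUNT(*) FROM heartbeat WHERE time < ?", (cutoff,)
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM heartbeat").fetchone()[0]
    return old, total
```

Calling it with `days=365` on a copy would show how much of the roughly 445 MB file is older than the one year of history that is needed.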

If running "Shrink Database" doesn't solve your issue, I think you can wait for external database support in 2.0, and in the meantime consider reducing the retention time after making whatever backups are necessary.

@chakflying I will try and let you know

(We track the steps necessary before a v2.0 release in #4500.)

@chakflying I don't know if this is related, as none of the other reports of this error boiled down to that. At the moment, I suspect a problem similar to the one in #4110: a broken disk.