Sqlite errors

fabry006 opened this issue 5 months ago

fabry006 commented 5 months ago

⚠️ Please verify that this question has NOT been raised before.

I checked and didn't find similar issue

🛡️ Security Policy

I agree to have read this project Security Policy

📝 Describe your problem

I am having trouble with SQLite: sometimes I see timeout errors like this one:

Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `ping`, `status`, `time`) values (0, 147870, true, 23, '200 - OK', 31, 1, '2024-04-25 13:35:05.866') - SQLITE_IOERR: disk I/O error] {
  errno: 10,
  code: 'SQLITE_IOERR'
}
    at consoleCall (<anonymous>)
    at Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:1028:25)
2024-04-25T15:35:05+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues
2024-04-25T15:35:05+02:00 [MONITOR] INFO: Try to restart the monitor
2024-04-25T15:35:05+02:00 [] INFO: Cannot write to error.log

I've deployed the software as a Docker container with podman in a virtual machine based on RHEL8, with a volume mounted to a local directory.
The only way to resolve these errors is to restart the container.
I have ~30-40 monitors, and the maximum nesting depth (they are organized into Groups) is 4.

Can you please help me stabilize the environment?

🐻 Uptime-Kuma Version

1.23.11-alpine

💻 Operating System and Arch

RHEL8 with podman

🌐 Browser

Edge

🖥️ Deployment Environment

Runtime: podman version 4.6.1
Database:
Filesystem used to store the database on: /opt type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
number of monitors: 40

Frank Elsinga · Answer 1 · Thu Apr 25 2024 22:39:52 GMT+0800 (China Standard Time)

As an explanation of what this error means, from the docs:

The SQLITE_IOERR result code says that the operation could not finish because the operating system reported an I/O error.
A full disk drive will normally give an SQLITE_FULL error rather than an SQLITE_IOERR error.
There are many different extended result codes for I/O errors that identify the specific I/O operation that failed.
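
To make the last sentence of that quote concrete: SQLite's extended result codes carry the primary code in their low 8 bits, so every SQLITE_IOERR_* variant maps back to the `errno: 10` shown in the trace. A minimal sketch in Python; the helper name `primary_code` is made up for illustration, and the two extended constants are the values listed in the SQLite documentation. The trace in this issue only reports the primary code, so which specific I/O operation failed is not known here.

```python
# Primary result code; matches "errno: 10" in the trace above.
SQLITE_IOERR = 10

# Two of the extended I/O error codes, as defined in the SQLite docs.
SQLITE_IOERR_READ = SQLITE_IOERR | (1 << 8)   # 266
SQLITE_IOERR_FSYNC = SQLITE_IOERR | (4 << 8)  # 1034


def primary_code(extended_code: int) -> int:
    """Return the primary result code hidden in an extended result code."""
    return extended_code & 0xFF


if __name__ == "__main__":
    for code in (SQLITE_IOERR_READ, SQLITE_IOERR_FSYNC):
        print(code, "->", primary_code(code))
```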

A person on Reddit suggested the following. Could you try it and report the results?

If it shows up immediately when connecting to the database, it might be an issue with [...] file permissions, the file path or path style (remember different slashes for windows/linux etc, and check that node isn't altering them), or some obscure virtual machine bug related to those.
Before you dig too deep into those, read up on your specific sqlite package on npm and make sure everything is set up correctly.
You may want to install/update sqlite3 or an equivalent through your package manager separately!

If it shows up after other successful queries, it could be an indication of data or media corruption [...]
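
One way to test the corruption hypothesis is SQLite's built-in `PRAGMA integrity_check`. A minimal sketch using Python's standard `sqlite3` module; the function name `integrity_report` is hypothetical, and it is probably safest to run it against a copy of kuma.db (together with its -wal and -shm files, if present) taken while the container is stopped. On a database of several hundred MB the check may take a while.

```python
import sqlite3


def integrity_report(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and return the rows it reports.

    A healthy database yields a single row containing "ok"; otherwise
    each row describes one problem that SQLite found.
    """
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    # Usage: python check_db.py /path/to/copy/of/kuma.db
    for line in integrity_report(sys.argv[1]):
        print(line)
```

A clean result does not rule out a flaky disk, since the I/O error may only occur intermittently, but a failing result would point strongly at corruption.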

Frank Elsinga · Answer 2 · Thu Apr 25 2024 22:44:47 GMT+0800 (China Standard Time)

Please also see #4110 if you have the same problem

fabry006 · Answer 3 · Fri Apr 26 2024 14:26:12 GMT+0800 (China Standard Time)

Thank you @CommanderStorm. I read your links and I am not sure that they apply to my case.
Yesterday I restarted the container and, again, this morning it is stuck with the same error. Just for testing, I started the container as root in order to avoid any kind of permission issue, but this didn't solve the problem.
I tried to create a directory inside the same directory where the db is stored, and there was no issue.

This is the actual content of the folder

drwxrwxrwx. 2 root root         6 Apr 23 13:59 docker-tls
-rwxrwxrwx. 1 root root      7490 Apr 23 23:46 error.log
-rwxrwxrwx. 1 root root 444760064 Apr 26 08:15 kuma.db
-rwxrwxrwx. 1 root root     98304 Apr 26 08:15 kuma.db-shm
-rwxrwxrwx. 1 root root  33685152 Apr 26 08:19 kuma.db-wal
drwxrwxrwx. 2 root root         6 Apr 23 13:59 screenshots
drwxrwxrwx. 2 root root        28 Apr 23 13:59 truststore
drwxrwxrwx. 2 root root       108 Apr 23 13:59 upload

The VM is deployed in the corporate private cloud (based on VMware).
Previously I mounted the volume on a filesystem that is supposed to be used for the app data, but I was not sure that it was a local disk, so I changed the folder to /opt/uptime/volumes. I still see these errors.

Nelson Chan · Answer 4 · Fri Apr 26 2024 15:06:27 GMT+0800 (China Standard Time)

Can you try going into Settings -> Monitor History -> press "Shrink Database"?

fabry006 · Answer 5 · Fri Apr 26 2024 15:23:01 GMT+0800 (China Standard Time)

@chakflying So far the monitoring history is set to 0 (I need at least 1 year of data, hoping that in the future the Kuma UI will have a feature to select the time history for the whole year).
So what is the effect if I shrink the db? I think no data will be deleted. Am I wrong?

Nelson Chan · Answer 6 · Fri Apr 26 2024 15:57:08 GMT+0800 (China Standard Time)

Yes, no data would be deleted, it only runs a command to compact and organize the database file.
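
For illustration, assuming "Shrink Database" boils down to SQLite's `VACUUM` command (an assumption here, not something this thread confirms), the effect can be reproduced on a throwaway database. The sketch below uses Python's standard `sqlite3` module; the table layout and the function name `vacuum_demo` are invented for the demo. It deletes half of the rows first, because `VACUUM` can only reclaim space that is already free inside the file.

```python
import os
import sqlite3
import tempfile


def vacuum_demo() -> tuple[int, int, int, int]:
    """Return (rows_before, rows_after, bytes_before, bytes_after) around VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE heartbeat (id INTEGER PRIMARY KEY, msg TEXT)")
        conn.executemany(
            "INSERT INTO heartbeat (msg) VALUES (?)",
            [("x" * 200,) for _ in range(5000)],
        )
        # Deleting rows leaves unused space inside the file; it does not shrink it.
        conn.execute("DELETE FROM heartbeat WHERE id % 2 = 0")
        conn.commit()

        rows_before = conn.execute("SELECT COUNT(*) FROM heartbeat").fetchone()[0]
        bytes_before = os.path.getsize(path)

        # VACUUM rebuilds the file compactly; it must run outside a transaction,
        # which is the case here because of the commit() above.
        conn.execute("VACUUM")

        rows_after = conn.execute("SELECT COUNT(*) FROM heartbeat").fetchone()[0]
        bytes_after = os.path.getsize(path)
        conn.close()
        return rows_before, rows_after, bytes_before, bytes_after


if __name__ == "__main__":
    print(vacuum_demo())
```

The row count stays the same while the file gets smaller. If nothing was ever deleted (retention set to keep everything), there may be little free space to reclaim.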

In general, most people who have reported database errors have had to reduce the retention time to eliminate the errors. I think we currently don't have the expertise to optimize our current usage of SQLite further, so there is no solution for now.

If running "Shrink Database" doesn't solve your issue, I think you can wait for external database support in 2.0, and in the meantime consider reducing the retention time after making whatever backups are necessary.
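
To show what a shorter retention time amounts to at the database level, here is a rough sketch that drops rows older than a cutoff. It is not Uptime Kuma's own cleanup code; the retention setting in the UI is the supported route. The `heartbeat` table and `time` column come from the error log in this issue; the function name `prune_heartbeats`, its parameters, and the assumption that timestamps are stored as 'YYYY-MM-DD HH:MM:SS.mmm' strings (as in the log) are illustrative. Try it only on a backup copy.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_heartbeats(conn: sqlite3.Connection, keep_days: int, now: datetime) -> int:
    """Delete heartbeat rows older than keep_days and return how many were removed.

    Assumes `time` holds strings like '2024-04-25 13:35:05.866', which sort
    correctly as plain text. `now` should use the same time zone as the
    stored values (the log suggests UTC, but that is a guess).
    """
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    cur = conn.execute("DELETE FROM heartbeat WHERE time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Deleting rows alone does not shrink the file; the freed space is reused by new rows or returned to the filesystem by a later "Shrink Database".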

fabry006 · Answer 7 · Fri Apr 26 2024 16:01:36 GMT+0800 (China Standard Time)

@chakflying I will try and let you know

Frank Elsinga · Answer 8 · Fri Apr 26 2024 19:27:24 GMT+0800 (China Standard Time)

(We track the steps necessary before a v2.0 release in #4500.)

@chakflying I don't know if this is related, as none of the other cases like this boiled down to database size. At the moment, I suspect a problem similar to #4110 => a broken disk

github-actions · Answer 9 · Tue Jun 25 2024 20:00:18 GMT+0800 (China Standard Time)

We are clearing up our old help-issues and your issue has been open for 60 days with no activity.
If no comment is made and the stale label is not removed, this issue will be closed in 7 days.

fabry006 · Answer 10 · Tue Jun 25 2024 22:21:33 GMT+0800 (China Standard Time)

Reducing the database size worked, so I assume that it was just too big for SQLite.