Port not closed, process doesn't exit on OpenSuse Tumbleweed with Node v20
toptensoftware opened this issue · comments
SerialPort Version
v9 and v11 (at least)
Node Version
v20.3.1
Electron Version
n/a
Platform
OpenSUSE Tumbleweed: Linux localhost.localdomain 6.4.3-1-default #1 SMP PREEMPT_DYNAMIC Tue Jul 11 06:23:11 UTC 2023 (5ab030f) x86_64 x86_64 x86_64 GNU/Linux
Architecture
x64
Hardware or chipset of serialport
FT232RL and cp210x
What steps will reproduce the bug?
See sample program below using Serial Port v11.
First problem:
- Run program as shown on OpenSusu Tumbleweed with node v20 and the process never exits
- Removing either the data handler section or the drain section allows the process to exit
- Problem doesn't happen on Ubuntu with Node v20 or OpenSusu Tumbleweed with Node v18
- Problem happens with Serial Port v9.0.4 and v11.0.0
Second problem: (probably related), attempting to reopen the serial port after closing it fails with "[Error: Error Resource temporarily unavailable Cannot lock port]".
let { SerialPort } = require('serialport');
function open(device, baud)
{
let serialPortOptions = {
dataBits: 8,
stopBits: 1,
parity: 'none',
baudRate: baud,
path: device,
};
return new Promise((resolve, reject) => {
let port = new SerialPort(serialPortOptions, function(err) {
if (err)
reject(err);
else
resolve(port);
});
});
}
function drain(port)
{
// Drain port
return new Promise((resolve, reject) => {
port.drain((function(err) {
if (err)
reject(err);
else
resolve();
}));
});
}
function close(port)
{
return new Promise((resolve, reject) => {
port.close(function(err) {
if (err)
reject(err);
else
resolve();
});
})
}
function onDataHandler(data)
{
console.log("On Data");
}
(async function() {
// Open port
console.log("Opening...");
let port = await open("/dev/ttyUSB0", 115200);
console.log("OK");
// Data handler
port.on('data', onDataHandler);
port.off('data', onDataHandler);
// Drain
console.log("Draining");
await drain(port);
console.log("OK");
// Close
console.log("Closing");
await close(port);
console.log("OK");
// PROBLEM 1: Process never exits.
// Remove "data handler" or "drain" calls above, problem doesn't happen
// PROBLEM 2: (possibly same issue) attempting to re-open the port fails with
// "[Error: Error Resource temporarily unavailable Cannot lock port]"
// Open again...
/*
console.log("Opening (again)...");
port = await open("/dev/ttyUSB0", 115200);
console.log("OK");
*/
})();
What happens?
- Process didn't exit even though all event handlers removed, all completion callbacks invoked and all promises resolved.
- Port seems to be left open and can't be re-opened.
What should have happened?
- Process should exit
- Port should be able to be re-opened.
Additional information
As noted, doesn't happen on Ubuntu and doesn't happen pre-node v20.
When you mention that this issue doesn't appear on Ubuntu, would I be correct to infer that you are using the LTS release (22.04)?
If so, I wonder if this is down to some recently introduced incompatibility between one of the distro packages, and one of the node js changes or dependencies?
Would it be possible to run some extra tests to narrow down the specific change which causes the incompatibility? for example to see if the issue also shows up on the bleeding edge versions of ubuntu?
Looking at the recent changes on both the opensuse tumbleweed and node v20, I wonder if it could be due to changes to systemd (249 vs 252) and/or libuv (1.44 vs 1.46) - however this is pure speculation at this point, and really needs investigation.
Good point re: Ubuntu version. I think I tested on Ubuntu 20.04, so not sure about 22.04. I'll setup a VM and let you know.
Same issue happens on Ubuntu 22.04, node v20
Doesn't happen on Ubuntu 22.04, node v16 (didn't test v18, I presume it's fine)
System info below for reference.
I hope this helps.
brad@lin:~$ uname -a
Linux lin 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
brad@lin:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
I think I can replicate this issue with node js 20.3.0; but not in 20.2.0
I fear this going to be due to node JS updating to libuv 1.45, and in particular the changes to introduce io_uring support
I'll try to investigate further as time permits
I can confirm that for me the issue only appears when the libuv dependency within Nodejs is upgraded from 1.44 to 1.45. This is based on using serialport 11 and ubuntu 22.04.2 LTS ( 5.19.0-46-generic ). Using the same versions of node and serialport don't show any regressions on windows, so I'm mainly focusing on the unix specific changes as the likely root cause.
I'll continue to look into this when I can, but hopefully this info will also help anyone else who is able to dig into the libuv changes and identify a fix.
I'm afraid the best workaround I can suggest at this time is to revert to node 20.2.0 or the LTS release (v18) until this issue is resolved
Thanks so much for looking into this. Hopefully this can get resolved quickly.
I'm investigating this. Can you post the output using DEBUG=*
when using 20.2.0? Thanks!
See below for logs from 20.2 (works) and 20.4 (doesn't work). Use nvm to switch editions, running on Ubuntu 22.04.1 LTS
Node v20.2:
brad@lin:~/flashy/flashy$ DEBUG=* node flashy /dev/ttyUSB0 --status
serialport/bindings loading LinuxBinding +0ms
Opening /dev/ttyUSB0 at 115,200 baud... serialport/stream opening path: /dev/ttyUSB0 +0ms
serialport/binding-abstract open +0ms
ok
Waiting for device... serialport/stream _read queueing _read for after open +1ms
serialport/bindings/poller Creating poller +0ms
serialport/stream opened path: /dev/ttyUSB0 +1ms
serialport/stream _write 12 bytes of data +0ms
serialport/binding-abstract write 12 bytes +3ms
serialport/stream _read reading { start: 0, toRead: 65536 } +1ms
serialport/binding-abstract read +1ms
serialport/bindings/unixWrite Starting write 12 bytes offset 0 bytesToWrite 12 +0ms
serialport/bindings/unixRead Starting read +0ms
serialport/bindings/unixWrite write returned: wrote 12 bytes +0ms
serialport/bindings/unixWrite Finished writing 12 bytes +0ms
serialport/stream binding.write write finished +2ms
serialport/stream drain +0ms
serialport/binding-abstract drain +1ms
serialport/bindings/unixRead read error [Error: EAGAIN: resource temporarily unavailable, read] {
errno: -11,
code: 'EAGAIN',
syscall: 'read'
} +1ms
serialport/bindings/unixRead waiting for readable because of code: EAGAIN +0ms
serialport/bindings/poller Polling for "readable" +3ms
serialport/stream binding.drain finished +8ms
serialport/bindings/poller received "readable" +11ms
serialport/bindings/unixRead Starting read +11ms
serialport/bindings/unixRead Finished read 52 bytes +0ms
serialport/stream binding.read finished { bytesRead: 52 } +3ms
ok
Found device:
- Raspberry Pi 3 Model B+
- Serial: 00000000-bc1b1209
- CPU Clock: 600MHz (range: 600-1400MHz)
- Bootloader: rpi2-aarch32 v2.0.15, max packet size: 4096
Closing serial port... serialport/stream drain +1ms
serialport/binding-abstract drain +12ms
serialport/stream _read reading { start: 52, toRead: 65484 } +1ms
serialport/binding-abstract read +1ms
serialport/bindings/unixRead Starting read +2ms
serialport/stream binding.drain finished +0ms
serialport/stream #close +0ms
serialport/binding-abstract close +0ms
serialport/bindings/poller Stopping poller +2ms
serialport/bindings/poller Destroying poller +0ms
serialport/bindings/unixRead read error [Error: EAGAIN: resource temporarily unavailable, read] {
errno: -11,
code: 'EAGAIN',
syscall: 'read'
} +0ms
serialport/stream binding.read error Error: Port is not open
at unixRead (/home/brad/flashy/flashy/node_modules/@serialport/bindings/lib/unix-read.js:33:21) {
canceled: true
} +0ms
serialport/stream _read queueing _read for after open +0ms
serialport/stream binding.close finished +2ms
ok
brad@lin:~/flashy/flashy$
Node v20.4:
brad@lin:~/flashy/flashy$ DEBUG=* node flashy /dev/ttyUSB0 --status
serialport/bindings loading LinuxBinding +0ms
Opening /dev/ttyUSB0 at 115,200 baud... serialport/stream opening path: /dev/ttyUSB0 +0ms
serialport/binding-abstract open +0ms
ok
Waiting for device... serialport/stream _read queueing _read for after open +2ms
serialport/bindings/poller Creating poller +0ms
serialport/stream opened path: /dev/ttyUSB0 +1ms
serialport/stream _write 12 bytes of data +0ms
serialport/binding-abstract write 12 bytes +3ms
serialport/stream _read reading { start: 0, toRead: 65536 } +0ms
serialport/binding-abstract read +1ms
serialport/bindings/unixWrite Starting write 12 bytes offset 0 bytesToWrite 12 +0ms
serialport/bindings/unixRead Starting read +0ms
serialport/bindings/unixWrite write returned: wrote 12 bytes +1ms
serialport/bindings/unixWrite Finished writing 12 bytes +0ms
serialport/stream binding.write write finished +2ms
serialport/stream drain +0ms
serialport/binding-abstract drain +1ms
serialport/stream binding.drain finished +5ms
serialport/bindings/unixRead Finished read 52 bytes +8ms
serialport/stream binding.read finished { bytesRead: 52 } +3ms
ok
Found device:
- Raspberry Pi 3 Model B+
- Serial: 00000000-bc1b1209
- CPU Clock: 600MHz (range: 600-1400MHz)
- Bootloader: rpi2-aarch32 v2.0.15, max packet size: 4096
Closing serial port... serialport/stream drain +1ms
serialport/binding-abstract drain +9ms
serialport/stream _read reading { start: 52, toRead: 65484 } +0ms
serialport/binding-abstract read +0ms
serialport/bindings/unixRead Starting read +2ms
serialport/stream binding.drain finished +1ms
serialport/stream #close +0ms
serialport/binding-abstract close +1ms
serialport/bindings/poller Stopping poller +12ms
serialport/bindings/poller Destroying poller +0ms
serialport/stream binding.close finished +0ms
ok
(Note: didn't return to command prompt with 20.4)
Thanks for the logs.
I've been looking into this and I'm a bit confused. By running the example in 20.2
with strace
I can see that the library opens the device in non-blocking mode
openat(AT_FDCWD, "/dev/pts/7", O_RDWR|O_NOCTTY|O_NONBLOCK|O_SYNC|O_CLOEXEC) = 20
By looking at the code, it seems uv_poll_t
is used to poll for events in that specific fd.
But there's an initial read()
on the device, which I guess it's triggered by the data
listener, without having received the EPOLLIN
event which of course returns EAGAIN
as it's non-blocking:
read(20, 0x58f20c0, 65536) = -1 EAGAIN (Resource temporarily unavailable)
Why is this read()
called directly instead of waiting for the POLLIN
event from uv_poll_t
handle? I'm saying this because that, apparently, unnecessary extra-read is causing the issue: the uv_fs_t
read request which wraps the read()
syscall in v20.2.0
completes with EAGAIN
whereas in v20.4.0
doesn't because the underlying io_uring implementation is waiting for some data to be available before returning.
Why is this read() called directly instead of waiting for the POLLIN event from uv_poll_t handle?
I guess this is a question for the serialport devs? @GazHank is this something you can answer/look into?
Looping in @reconbot I think your knowledge of the poller and the @serialport/stream might needed :-)
The initial read is getting queued due to @serialport/stream in system independent logic... but I'm on Windows at the moment and this example doesn't result in the read getting queued up...
node-serialport/packages/stream/lib/index.ts
Lines 278 to 285 in b2405f7
Once I can get back to a Linux box I think I'll try to run some check on older / LTS kernel versions to see if we get the same behaviour
Hi.
I'm facing the same issue here when running the following sample script using nodejs v20.4.0 on Ubuntu 23.10:
const {SerialPort} = require('serialport');
const f = async () => {
async function connect(sp) {
await new Promise((resolve, reject) => {
sp.open((error) => {
if (error) {
const err = new Error(`Error opening port: (${error.message})`);
return reject(err);
}
return resolve();
});
});
}
let sp;
sp = new SerialPort({ path:'/dev/ttyUSB1', baudRate: 115200, autoOpen: false});
sp.on('data', ()=>{}); // 1st data receiver
await connect(sp);
sp.close();
sp.removeAllListeners();
sp = new SerialPort({ path:'/dev/ttyUSB1', baudRate: 115200, autoOpen: false});
sp.on('data',()=>{}); // 2nd data receiver
await connect(sp);
sp.close();
sp.removeAllListeners();
}
f();
Note
When removing both linessp.on(...)
for 1st and 2nd data receiver, there is no issue anymore.
When removing only the 1st data receiver, execution either succeeds or hangs forever.
When keeping both lines like in the above example, or when removing only the 2nd data receiver, execution always fails withResource temporarily unavailable Cannot lock port
The output when executing the above script with nodejs v20.4.0:
$ node test.js
test.js:9
const err = new Error(`Error opening port: (${error.message})`);
^
Error: Error opening port: (Error Resource temporarily unavailable Cannot lock port)
at SerialPort.<anonymous> (test.js:9:33)
at SerialPort._error (node_modules/@serialport/stream/dist/index.js:82:22)
at node_modules/@serialport/stream/dist/index.js:118:18
Node.js v20.4.0
Running the very same script with nodejs v20.2.0 does not lead to any error.
It looks like this problem is related to libuv, and maybe specifically to when libuv started to use the Linux kernel's facility io_uring
instead of epoll
for async operations. This happened since this commit.
ℹ️ Indeed, one workaround is to switch back to the old behaviour in libuv using UV_USE_IO_URING. The following now works consistently for me in all cases (no hang, no error):
$ UV_USE_IO_URING=false node test.js
In the nodejs v20.2.0 I'm using, the embedded libuv is:
$ npm version | grep 'uv:'
uv: '1.44.2',
And in nodejs v20.4.0, it is:
$ npm version | grep 'uv:'
uv: '1.46.0',
I encounter the same issue on NixOS but earlier than version 20 :
v18.17.1
works as expectedv18.19.1
has the same issue as described by @lionelains (same code)
The libuv
is indeed different,
On v18.17.1
:
$ npm version | grep 'uv:'
uv: '1.44.2',
On v18.19.1
:
$ npm version | grep 'uv:'
uv: '1.48.0',
The workaround of setting UV_USE_IO_URING
to false does solves the issue on v18.19.1
.
@toptensoftware , @GazHank could you confirm that this workaround does solves your issue too ?
Thanks