cminyard / ser2net

Serial to network interface, allows TCP/UDP to serial port connections

Need a flag to force ser2net to retry to open serial devices

montjoie opened this issue

Hello

I have a lot of boards with embedded serial controllers that disappear when the board is powered off or reset.
On such events, ser2net closes the telnet connection (even with the LOCAL flag).

There should be a way to retry opening the device (like an ALWAYSRETRY flag),
perhaps with a timeout.

So the code path could be like:
on device disappear:
    if not flag:
        close network connection
    else:
        try reopen until timeout
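
The code path proposed above could be sketched in Python roughly as follows. This is only an illustration of the idea, not ser2net code: the function names, the `always_retry` flag, and the `close_network` callback are all hypothetical, and it assumes a POSIX host where the serial device is an ordinary device node.

```python
import os
import time


def reopen_until_timeout(path, timeout_s, retry_interval_s=1.0, opener=os.open):
    """Try to reopen `path` until it succeeds or `timeout_s` elapses.

    Returns an open file descriptor, or None if the deadline passed.
    `opener` is injectable so the loop can be exercised without hardware.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return opener(path, os.O_RDWR | os.O_NOCTTY)
        except OSError:
            # The device node is still missing (or not ready yet).
            if time.monotonic() >= deadline:
                return None
            time.sleep(retry_interval_s)


def on_device_disappear(path, always_retry, timeout_s, close_network):
    """Hypothetical handler mirroring the proposed code path."""
    if not always_retry:
        close_network()
        return None
    fd = reopen_until_timeout(path, timeout_s)
    if fd is None:
        # Gave up waiting: fall back to dropping the network connection.
        close_network()
    return fd
```

Using a monotonic clock for the deadline keeps the timeout stable even if the wall clock is adjusted while the board is powered off.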

Regards

The /dev/ttyX node itself disappears when board power is cut, and ser2net exits.

I have several devices where this also happens (original BeagleBone "white", SiFive unleashed, TI AM335x ICEv2, and a handful of others I can't remember). The root cause is that an onboard FTDI (or similar) is not powered over USB but by the main board power, so when the board is powered off, the FTDI loses power too, causing the /dev/ttyUSBx to disappear on the host.

I agree that in general, you might want to know when this happens and propagate the information. But when it's expected to happen, it's not really news. We just need a way to say "we know that /dev/ttyX will disappear sometimes, but trust me, it will come back, please don't drop the network connection until $TIMEOUT".

Oh, and I can confirm that LOCAL does not solve the problem. I'm using ser2net v4.3.3 (Debian bullseye).

Hello,
I am using RasPi with Buster and ser2net version 4.3.3. gensio.ac is 2.3.0

I just checked what happens if I pull an FTDI while connected. I saw the ser2net daemon disappear once, on the first pull.
On subsequent attempts, the service stays up and I can reconnect to the listen port after the FTDI is plugged back in. I have not gotten the daemon to disappear again since restarting ser2net, so I am not sure why ser2net dropped that first time. Every pull of the FTDI since then just closes the TCP socket, and I can reconnect after re-inserting the FTDI.

That first pull happened after my test Pi had been up for days.

I'm going to take a different angle on this. I'm going to create a gensio that will do this with the gensio below it. If the gensio below it fails, it will close it and then attempt to re-open it periodically. This is simpler than doing it in ser2net and it's more useful.

This is going to force me to fix a general problem that I've been needing to fix for a while, but I've figured out how to do that in a clean way.

So it's not going to be as hard as I thought.

Does ser2net exit? Or does the network connection close?

The network connection closes; ser2net stays running and reports "dev read error for device on port con01: Remote end closed connection" when the /dev/ttyX node disappears.

Glad to hear you have an idea for a solution. I'm happy to beta test for you if needed.

OK, I have a test solution available. You will have to get the latest gensio git tree from GitHub and use that.

There is a new filter gensio, keepopen, which will do what you want. In your case, the connector will be something like:

keepopen,serialdev,/dev/ttyACM0,115200n81

It retries every 1 second. You can change that with the retry-time option, which is in milliseconds, like:

keepopen(retry-time=2000),serialdev,/dev/ttyACM0,115200n81

Any data to the serialdev is flow-controlled if it is not there and will be sent to it later. If you would prefer to throw that data away, there is a discard-badwrites option to do that.
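
As a rough illustration of the two write policies just described, here is a toy Python model. It is not gensio code: the class and attribute names are made up, and it assumes that "flow-controlled" means the write is simply not accepted while the device is absent, so the sender holds the data and retries once the device is back.

```python
class KeepOpenModel:
    """Toy model of the two write policies (hypothetical, not the gensio API)."""

    def __init__(self, discard_badwrites=False):
        self.discard_badwrites = discard_badwrites
        self.child_open = False      # True while the serial device is present
        self.delivered = bytearray() # bytes that actually reached the device

    def write(self, data):
        """Return how many bytes were accepted from the caller."""
        if self.child_open:
            self.delivered += data
            return len(data)
        if self.discard_badwrites:
            return len(data)  # accepted, then thrown away
        return 0              # flow-controlled: caller keeps the data


# Default policy: nothing is accepted until the device returns.
m = KeepOpenModel()
pending = b"reboot\n"
accepted = m.write(pending)   # device absent, so 0 bytes are taken
m.child_open = True           # device reappears
m.write(pending[accepted:])   # the held data is delivered now
```

With `discard_badwrites=True`, the first `write` would report all bytes as accepted and nothing would ever reach `delivered` for that call.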

OK, I built gensio from source and gave it a spin.
The first thing I noticed is that ser2net doesn't seem to like the new option:

$ cat test.yaml
%YAML 1.1
---
define: &confver 1.0
define: &banner KJH local testing

connection: &con01
    accepter: telnet(rfc2217,mode=server),12346
##    connector: serialdev(nouucplock=true),/dev/am335x-icev2,115200n81,local
    connector: keepopen,serialdev(nouucplock=true),/dev/am335x-icev2,115200n81
    options:
        max-connections: 10
        banner: *banner
$ ./ser2net -c test.yaml -d -l
ser2net[2952515]: test.yaml:12(column 0): device configuration keepopen,serialdev(nouucplock=true),/dev/am335x-icev2,115200n81 invalid: Invalid data to parameter
test.yaml:12(column 0): device configuration keepopen,serialdev(nouucplock=true),/dev/am335x-icev2,115200n81 invalid: Invalid data to parameter

and then if I telnet to that port, I get

$ telnet localhost 12346
Trying ::1...
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

Even more fun: if I go back to my formerly working config, ser2net segfaults as soon as I type the first character in a telnet session:

$ cat test.yaml
%YAML 1.1
---
define: &confver 1.0
define: &banner KJH local testing

connection: &con01
    accepter: telnet(rfc2217,mode=server),12346
    connector: serialdev(nouucplock=true),/dev/am335x-icev2,115200n81,local
##    connector: keepopen,serialdev(nouucplock=true),/dev/am335x-icev2,115200n81
    options:
        max-connections: 10
        banner: *banner
$ ./ser2net -c test.yaml -d -l
Segmentation fault

OK, it's working great now!

I had uninstalled my distro package of libgensio-dev and then built and installed gensio and ser2net from source. However, I didn't notice that I still had a libgensio0 package installed. The ser2net configure step was picking up my locally built and installed files from /usr/local/lib, but at runtime it was finding /usr/lib/x86_64-linux-gnu/libgensio.so.0 from the distro package.
Ensuring that I'm using everything from locally built versions, it's working! Will do some more testing & experiments tomorrow.
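
One way to catch this kind of build-time versus runtime mismatch is to look at which shared library the binary actually resolves. The sketch below parses `ldd`-style output and lists matching libraries that resolve outside a given prefix. The function name is hypothetical, and it assumes the local build was installed under /usr/local/, as in the report above.

```python
def libs_outside_prefix(ldd_output, name_fragment, prefix="/usr/local/"):
    """Return (soname, path) pairs whose soname contains `name_fragment`
    and whose resolved path does not start with `prefix`."""
    hits = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vdso or the dynamic loader line
        soname, _, rest = line.partition("=>")
        soname = soname.strip()
        rest = rest.strip()
        if not rest:
            continue
        # ldd prints "not found" when a library cannot be resolved at all.
        path = "not found" if rest.startswith("not found") else rest.split()[0]
        if name_fragment in soname and not path.startswith(prefix):
            hits.append((soname, path))
    return hits


# Typical use on the freshly built binary (requires ldd on the host):
#   import subprocess
#   out = subprocess.run(["ldd", "./ser2net"], capture_output=True, text=True).stdout
#   print(libs_outside_prefix(out, "libgensio"))
```

An empty result means every matching library resolves under the prefix; anything else points at a leftover copy, such as one from a distro package.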

I tried to use this new feature, but I failed to compile gensio: cminyard/gensio#40

The keepopen feature works fine if I connect with telnet, but for some reason, when I connect with microcom, ser2net segfaults on the first connection.

microcom allows you to connect over telnet with microcom -t <host>:<port>

Still failing as soon as I connect with microcom, but this time with a (hopefully useful) message:

$ ./ser2net -c ./test.yaml -d -l
ser2net: tpp.c:82: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.
Aborted
Reading symbols from ./ser2net...
(gdb) handle SIGUSR1 nostop
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        Yes     Yes             User defined signal 1
(gdb) run -c ./test.yaml -d -l
Starting program: /home/khilman/work/kernel/ci/ser2net/ser2net -c ./test.yaml -d -l
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGUSR1, User defined signal 1.

Program received signal SIGUSR1, User defined signal 1.
ser2net: tpp.c:82: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7d74537 in __GI_abort () at abort.c:79
#2  0x00007ffff7d7440f in __assert_fail_base (fmt=0x7ffff7edd128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x7ffff7f2c110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", file=0x7ffff7f2c104 "tpp.c", line=82, function=<optimized out>)
    at assert.c:92
#3  0x00007ffff7d83662 in __GI___assert_fail (assertion=assertion@entry=0x7ffff7f2c110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", 
    file=file@entry=0x7ffff7f2c104 "tpp.c", line=line@entry=82, function=function@entry=0x7ffff7f2c1c0 <__PRETTY_FUNCTION__.0> "__pthread_tpp_change_priority") at assert.c:101
#4  0x00007ffff7f28e59 in __pthread_tpp_change_priority (previous_prio=previous_prio@entry=-1, new_prio=new_prio@entry=0) at tpp.c:82
#5  0x00007ffff7f1f290 in __pthread_mutex_lock_full (mutex=0x55555557b4f8) at ../nptl/pthread_mutex_lock.c:545
#6  0x000055555555ca43 in sergensio_val_set (sio=<optimized out>, err=<optimized out>, val=115200, cb_data=0x0) at dataxfer.c:739
#7  0x00007ffff7f6900c in serconf_process (sdata=0x55555558fa20) at sergensio_serialdev.c:253
#8  sterm_deferred_op (runner=<optimized out>, cbdata=0x55555558fa20) at sergensio_serialdev.c:221
#9  0x00007ffff7f7df23 in process_runners (sel=sel@entry=0x5555555792c0) at selector.c:1090
#10 0x00007ffff7f7fb7d in sel_select_intr_sigmask (sel=0x5555555792c0, send_sig=send_sig@entry=0x555555562510 <wake_thread_send_sig>, thread_id=thread_id@entry=140737488347800, 
    cb_data=cb_data@entry=0x0, timeout=timeout@entry=0x0, sigmask=sigmask@entry=0x0) at selector.c:1409
#11 0x00007ffff7f802fa in sel_select (sel=<optimized out>, send_sig=send_sig@entry=0x555555562510 <wake_thread_send_sig>, thread_id=thread_id@entry=140737488347800, 
    cb_data=cb_data@entry=0x0, timeout=timeout@entry=0x0) at selector.c:1445
#12 0x0000555555562567 in op_loop (dummy=dummy@entry=0x0) at ser2net.c:463
#13 0x000055555555a25f in main (argc=<optimized out>, argv=<optimized out>) at ser2net.c:1032

Sorry, you saw my original backtrace before I deleted it and updated it with one using nostop.

Also, here's a version with both gensio and ser2net rebuilt without pthreads (i.e. with --without-pthreads passed to configure):

Reading symbols from ./ser2net...
(gdb) handle SIGUSR1 nostop
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        Yes     Yes             User defined signal 1
(gdb) run -c ./test.yaml -d -l
Starting program: /home/khilman/work/kernel/ci/ser2net/ser2net -c ./test.yaml -d -l

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7f7ede5 in i_sel_wake_first (sel=0x5555555782c0) at selector.c:319
#2  sel_run (runner=0x55555558dae0, func=0x7ffff7f7b100 <gensio_runner_handler>, cb_data=0x55555558d1a0) at selector.c:1068
#3  0x00007ffff7f4d770 in basen_sched_deferred_op (ndata=0x55555558d040) at gensio_base.c:1060
#4  basen_sched_deferred_op (ndata=0x55555558d040) at gensio_base.c:1055
#5  base_gensio_server_start (io=io@entry=0x55555558d1d0) at gensio_base.c:1931
#6  0x00007ffff7f5591d in netna_readhandler (iod=<optimized out>, cbdata=0x555555590260) at gensio_net.c:701
#7  0x00007ffff7f7de13 in handle_selector_call (sel=sel@entry=0x5555555782c0, fdc=0x55555558c850, handler=0x7ffff7f7b010 <iod_read_handler>, enabled=<optimized out>, fdset=0x0)
    at selector.c:1128
#8  0x00007ffff7f7fc52 in handle_selector_call (handler=<optimized out>, enabled=<optimized out>, fdset=0x0, fdc=<optimized out>, sel=0x5555555782c0) at selector.c:1114
#9  process_fds_epoll (isigmask=0x0, tstimeout=0x7fffffffdfc0, sel=0x5555555782c0) at selector.c:1281
#10 sel_select_intr_sigmask (sel=sel@entry=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0, timeout=timeout@entry=0x0, 
    sigmask=sigmask@entry=0x0) at selector.c:1376
#11 0x00007ffff7f7ffba in sel_select (sel=sel@entry=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0, timeout=timeout@entry=0x0)
    at selector.c:1445
#12 0x00007ffff7f8001c in sel_select_loop (sel=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0) at selector.c:1466
#13 0x000055555555a0a0 in op_loop (dummy=0x0) at ser2net.c:582
#14 main (argc=<optimized out>, argv=<optimized out>) at ser2net.c:1032

Debian bullseye on an Intel NUC (2x Core i3 CPUs).
And FWIW, the thread library mentioned by gdb (/lib/x86_64-linux-gnu/libthread_db.so.1) is provided by the Debian libc6 package, version 2.31-13+deb11u3.

Also, just to be sure you're testing the same config, here's the one that fails. Note that the commented-out line (without keepopen) works just fine with microcom; only the config with keepopen triggers the SIGSEGV.

%YAML 1.1
---
define: &confver 1.0
define: &banner KJH local testing

connection: &con01
    accepter: telnet(rfc2217,mode=server),12346
##    connector: serialdev(nouucplock=true),/dev/am335x-icev2,115200n81,local
    connector: keepopen(retry-time=2000,discard-badwrites),serialdev(nouucplock=true),/dev/am335x-icev2,115200n81
    options:
        max-connections: 10
        banner: *banner

I updated to use your "without threads" fix, and here's a backtrace:

Reading symbols from ./ser2net...
(gdb) run -c ./test.yaml -d -l
Starting program: /home/khilman/work/kernel/ci/ser2net/ser2net -c ./test.yaml -d -l

Program received signal SIGSEGV, Segmentation fault.
sergensio_val_set (sio=<optimized out>, err=<optimized out>, val=115200, cb_data=0x0) at dataxfer.c:743
743             if (!netcon->net)
(gdb) bt
#0  sergensio_val_set (sio=<optimized out>, err=<optimized out>, val=115200, cb_data=0x0) at dataxfer.c:743
#1  0x00007ffff7f68f8c in serconf_process (sdata=0x55555558e8a0) at sergensio_serialdev.c:253
#2  sterm_deferred_op (runner=<optimized out>, cbdata=0x55555558e8a0) at sergensio_serialdev.c:221
#3  0x00007ffff7f7dbe3 in process_runners (sel=sel@entry=0x5555555782c0) at selector.c:1090
#4  0x00007ffff7f7f85d in sel_select_intr_sigmask (sel=sel@entry=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0, 
    timeout=timeout@entry=0x0, sigmask=sigmask@entry=0x0) at selector.c:1409
#5  0x00007ffff7f7ffda in sel_select (sel=sel@entry=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0, timeout=timeout@entry=0x0)
    at selector.c:1445
#6  0x00007ffff7f8003c in sel_select_loop (sel=0x5555555782c0, send_sig=send_sig@entry=0x0, thread_id=thread_id@entry=0, cb_data=cb_data@entry=0x0) at selector.c:1466
#7  0x000055555555a0a0 in op_loop (dummy=0x0) at ser2net.c:582
#8  main (argc=<optimized out>, argv=<optimized out>) at ser2net.c:1032
(gdb) 

It works great! Thank you for the quick fix.

It works for me also (tested with an armada-388-clearfog-pro, which disconnects its serial port on reset).
Thanks