Segfault in ape_disconnect() roughly once a day on production server
Hi,
I have a chat-like app using APE. Roughly once a day, the APE server crashes.
I use Debian 6 and APE 1.1.1 from .deb file.
Sometimes I see "glibc crash detected [...] double free or corruption" in the logs, sometimes not.
Yesterday I tried the following:
- "git checkout" the current head of the master branch
- build it with "make" using -g (instead of -O2)
- copy the main config file and set "daemon = no"
- run it inside screen with "gdb ./aped", then:
set args --cfg /etc/ape/ape_debug.conf
handle SIGPIPE nostop
run
It started normally, and the server worked just as it does in daemon mode.
After around 8 hours, Nagios reported that port 6969 had stopped responding.
Four new lines appeared in the screen session:
Error: Cannot write to socket 230; Connection timed out
Program received signal SIGSEGV, Segmentation fault.
0x00000000004186ae in ape_disconnect (co=0x86bf40, g_ape=0x62aa90) at src/servers.c:59
59 if (co->fd == sub->client->fd) {
(gdb)
I am now running the server under the debugger with a small change in the code: before the problematic line, I added an "if (sub->client != NULL)" check.
I also added an else branch with an ape_log() and a printf() call.
I will post an update in a few days.
Regards,
Ludovic
OK. Two more crashes last night. It seems to happen when a client loses its internet connection while the server is replying to it.
I took two core dumps with GDB and printed some values. Console output of the first crash last night:
[JS] Loading script /var/ape/framework/mootools.js...
[JS] Loading script /var/ape/framework/Http.js...
[JS] Loading script /var/ape/framework/userslist.js...
[JS] Loading script /var/ape/utils/utils.js...
[JS] Loading script /var/ape/commands/proxy.js...
[JS] Loading script /var/ape/examples/move.js...
[JS] Loading script /var/ape/utils/checkTool.js...
[JS] Loading script /var/ape/module_mtarget/chat_module.js...
Program received signal SIGPIPE, Broken pipe.
Error: Cannot write to socket 68; Broken pipe
Program received signal SIGSEGV, Segmentation fault.
0x00000000004186bf in ape_disconnect (co=0x11ab850, g_ape=0x62aa90) at src/servers.c:63
63 if (co->fd == sub->client->fd) {
Here is what gdb prints:
(gdb) print sub
$1 = (subuser *) 0x11fd130
(gdb) print sub->client
$2 = (ape_socket *) 0x35003800330033
(gdb) print sub->client->fd
Cannot access memory at address 0x350038003300e3
(gdb) bt full
#0 0x00000000004186bf in ape_disconnect (co=0x11ab850, g_ape=0x62aa90) at src/servers.c:63
sub = 0x11fd130
#1 0x00000000004086ae in sockroutine (g_ape=0x62aa90) at src/sock.c:445
readb = 0
bitev = 3
active_fd = 247
timeout_to_hang = 49
sl = {co = 0x7fffffffe110, tfd = 0x7fffffffdcc8}
new_fd = -1
nfds = 21
sin_size = 16
i = 10
tfd = 336
t_start = {tv_sec = 1369170540, tv_usec = 929675}
t_end = {tv_sec = 1369170540, tv_usec = 929675}
ticks = 0
uticks = 3964
lticks = 659
their_addr = {sin_family = 2, sin_port = 49090, sin_addr = {s_addr = 3776933214}, sin_zero = "\000\000\000\000\000\000\000"}
#2 0x0000000000406f53 in main (argc=3, argv=0x7fffffffe118) at src/entry.c:306
srv = 0x62a990
random = 6
im_r00t = 1
pidfd = 0
serverfd = 7
getrandom = 2470454
pidfile = 0x0
confs_path = 0x62a010 "/etc/ape/"
fdev = {basemem = 0x62abe4, add = 0x417ec4 <event_epoll_add>, remove = 0x417f50 <event_epoll_remove>, poll = 0x417f62 <event_epoll_poll>, get_current_fd = 0x417fb1 <event_epoll_get_fd>,
growup = 0x417fe0 <event_epoll_growup>, revent = 0x418028 <event_epoll_revent>, reload = 0x41809a <event_epoll_reload>, events = 0x14df1f0, epoll_fd = 6, handler = EVENT_EPOLL}
cfgfile = "/etc/ape/ape_debug.conf", '\000' <repeats 489 times>
g_ape = 0x62aa90
i = 0
The second crash shows the same backtrace, but with 0xffffffff as the address in sub->client (uninitialized memory?). There was no SIGPIPE this time.
[JS] Loading script /var/ape/module_mtarget/chat_module.js...
Program received signal SIGSEGV, Segmentation fault.
0x00000000004186bf in ape_disconnect (co=0x70a6b0, g_ape=0x62aa90) at src/servers.c:63
63 if (co->fd == sub->client->fd) {
(gdb) generate-core-file
Saved corefile core.17516
(gdb) print sub
$3 = (subuser *) 0xa47050
(gdb) print sub->client
$4 = (ape_socket *) 0xffffffff
(gdb) backtrace
#0 0x00000000004186bf in ape_disconnect (co=0x70a6b0, g_ape=0x62aa90) at src/servers.c:63
#1 0x00000000004086ae in sockroutine (g_ape=0x62aa90) at src/sock.c:445
#2 0x0000000000406f53 in main (argc=3, argv=0x7fffffffe118) at src/entry.c:306
Could it be your chat module? I once had a module that was causing a segmentation fault, but I can't remember exactly which statement was causing it.
This issue is over a year old. Closing. Please open a new one if the problem still occurs.
OK. I don't use APE anymore, so I can't say whether the issue is still there. Thanks, everyone. Ludovic