apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.

Home Page: https://trafficserver.apache.org/

Segmentation fault with `attach_server_session_to_client` and Puma backend

kenballus opened this issue

🪲 Description

ATS, when configured with `attach_server_session_to_client`, segfaults when forwarding a request to a Puma backend.

Steps to reproduce

  1. Start a fresh Debian system.
docker run --workdir /repro -it debian:bookworm
  2. Install ATS dependencies.
apt -y update && apt -y upgrade && apt -y install make autoconf automake libtool pkg-config gcc g++ zlib1g-dev libssl-dev libpcre3-dev libcap-dev libhwloc-dev libncurses5-dev libcurl4-openssl-dev flex libunwind-dev git
  3. Build and install ATS master. (The current commit at time of writing is e6182d9ac9c3f611cb33b3ef6dc98327df41c3d6.)
git clone "https://github.com/apache/trafficserver" && cd trafficserver && autoreconf -if && ./configure --enable-debug && make -j$(nproc) && make install
  4. Install Puma dependencies.
apt -y install ruby-dev && gem install sinatra --version 3.0.6
  5. Install Puma.
gem install puma --version 6.3.0
  6. Copy the files in the "Files" section (below) into the filesystem.
  7. Start the Puma server.
ruby /repro/server.rb &
  8. Start ATS.
traffic_server &
  9. Install netcat.
apt -y install netcat-traditional
  10. Use netcat to send ATS an HTTP request with a short client-side timeout.
printf 'GET / HTTP/1.1\r\n\r\n' | nc -q 1 localhost 80
  11. Observe that ATS segfaults and crashes with the following output:
[Jul 31 16:31:00.644] traffic_crashlo NOTE: crashlog started, target=35696, debug=false syslog=true, uid=65534 euid=0
[Jul 31 16:31:00.645] traffic_crashlo NOTE: logging to 0x564faa1de830
[Jul 31 16:31:00.645] traffic_crashlo NOTE: readlink failed with Permission denied
[Jul 31 16:31:00.645] traffic_crashlo ERROR: wrote crash log to /usr/local/var/log/trafficserver/crash-2023-07-31-163100.log
traffic_server: received signal 11 (Segmentation fault)
traffic_server - STACK TRACE:
traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xc5)[0x564195bcfecb]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0)[0x7fda33fd6fd0]
traffic_server(_ZN18Http1ClientSession11do_io_closeEi+0xce)[0x564195c7a1a4]
traffic_server(_ZN18Http1ClientSession16state_keep_aliveEiPv+0x3b4)[0x564195c7b424]
traffic_server(_ZN12Continuation11handleEventEiPv+0xe7)[0x564195bd96f3]
traffic_server(+0x97da01)[0x564196051a01]
traffic_server(+0x97e631)[0x564196052631]
traffic_server(+0x97ed0c)[0x564196052d0c]
traffic_server(_ZN18UnixNetVConnection11net_read_ioEP10NetHandlerP7EThread+0x2b)[0x5641960556b9]
traffic_server(_ZN10NetHandler18process_ready_listEv+0x9b)[0x564196088793]
traffic_server(_ZN10NetHandler15waitForActivityEl+0x16d)[0x564196088b47]
traffic_server(_ZN7EThread15execute_regularEv+0x499)[0x5641960c8a95]
traffic_server(_ZN7EThread7executeEv+0x10b)[0x5641960c8c3f]
traffic_server(+0x9f341c)[0x5641960c741c]
/lib/x86_64-linux-gnu/libc.so.6(+0x89044)[0x7fda34024044]
/lib/x86_64-linux-gnu/libc.so.6(__clone+0x40)[0x7fda340a3860]
[2]+  Segmentation fault      (core dumped) traffic_server
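
For anyone who would rather script the netcat step, here is a rough Python sketch of the same request. It is only an approximation of `nc -q 1`: it sends the identical bytes, reads for about one second, then closes the socket. The function names, the 5-second connect timeout, and the polling interval are assumptions made for this sketch and are not part of the original reproduction.

```python
import socket
import time


def build_request(path: str = "/") -> bytes:
    """Return the same bare HTTP/1.1 request that the printf step emits."""
    return f"GET {path} HTTP/1.1\r\n\r\n".encode("ascii")


def send_and_hang_up(host: str = "localhost", port: int = 80,
                     linger: float = 1.0) -> bytes:
    """Send the request, read for `linger` seconds, then close the socket.

    Returns whatever bytes arrived before the client hung up.
    """
    received = b""
    # The 5 s connect timeout is an arbitrary choice for this sketch.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request())
        deadline = time.monotonic() + linger
        sock.settimeout(0.1)  # poll in short slices until the deadline
        while time.monotonic() < deadline:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                continue
            if not chunk:  # peer closed the connection first
                break
            received += chunk
    return received


if __name__ == "__main__":
    # Assumes ATS is listening on localhost:80 as configured in "Files".
    print(send_and_hang_up().decode("latin-1"))
```

Whether this triggers the crash as reliably as netcat has not been verified here; the netcat command remains the reference reproduction.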

Files

/usr/local/etc/trafficserver/records.yaml

ts:
  http:
    server_ports: 80 80:ipv6
    attach_server_session_to_client: 1

/usr/local/etc/trafficserver/remap.config

map / http://127.0.0.1:8000

/repro/server.rb

require 'sinatra/base'
require 'rack/handler/puma'

class App < Sinatra::Base
  get '*' do
    ""
  end
end

Rack::Handler::Puma.run(App.new, Port: 8000)

Crash log

Process:            [TS_MAIN] [35696]
Version:            Traffic Server 10.0.0
System Version:     Linux x86_64 #1 SMP PREEMPT_DYNAMIC Sat, 15 Jul 2023 19:25:49 +0000 6.4.3-arch1-2
Date:               Mon, 31 Jul 2023 16:31:00 +0000

No target signal information

No target CPU registers

Process Status:
Name: [TS_MAIN]
Umask:  0022
State:  S (sleeping)
Tgid: 35696
Ngid: 0
Pid:  35696
PPid: 1
TracerPid:  0
Uid:  65534 65534 65534 65534
Gid:  65534 65534 65534 65534
FDSize: 256
Groups: 65534
NStgid: 35696
NSpid:  35696
NSpgid: 35696
NSsid:  1
Kthread:  0
VmPeak:  3154692 kB
VmSize:  3092708 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    112092 kB
VmRSS:    112092 kB
RssAnon:     90844 kB
RssFile:     21248 kB
RssShmem:        0 kB
VmData:   150592 kB
VmStk:       132 kB
VmExe:      5540 kB
VmLib:     10508 kB
VmPTE:       560 kB
VmSwap:        0 kB
HugetlbPages:        0 kB
CoreDumping:  0
THP_enabled:  1
untag_mask: 0xffffffffffffffff
Threads:  48
SigQ: 0/579165
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001001
SigCgt: 0000000100004efe
CapInh: 0000000000000000
CapPrm: 000000000000040a
CapEff: 0000000000000400
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:  2
Seccomp_filters:  1
Speculation_Store_Bypass: thread vulnerable
SpeculationIndirectBranch:  conditional enabled
Cpus_allowed: ffffffff
Cpus_allowed_list:  0-31
Mems_allowed: 00000003
Mems_allowed_list:  0-1
voluntary_ctxt_switches:  41
nonvoluntary_ctxt_switches: 33

Process Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1073741816           1073741816           files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       579165               579165               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Versions

ATS: master branch, commit e6182d9ac9c3f611cb33b3ef6dc98327df41c3d6
OS: Debian Bookworm container running on top of Arch Linux with a 6.4.3 kernel
All other versions are either Debian defaults or specified in the reproduction steps.

EDIT: Add beetle emoji

I have attempted to reproduce this bug with a few other backend servers, but I have not been successful. I am very curious about what exactly Puma is doing with the connection to elicit this response from ATS.

Thank you for the very detailed reproduction steps. I was able to get this to crash with the steps you provided. I'll try to figure out what is going on and provide another update.

This crash is due to a use-after-free that occurs when a client transaction with an attached server session closes after the server session has been released. The server session is in the PoolableSession::KA_RESERVED state, but the server session teardown code doesn't account for that state, so the session destroys itself even though the client still holds a pointer to it. I will create a PR that fixes the crash, but I haven't yet figured out the appropriate way to get the server session returned to the global pool. I suspect this is an old bug introduced by #7849, but I don't know the history or the code well enough to be certain. After I create the PR, I'll ask for further help to fully resolve this.
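
To make the ordering problem easier to see, here is a toy Python model of the lifecycle described above. It is not ATS code and does not claim to mirror the eventual fix. Only `KA_RESERVED` comes from the analysis; `KA_IDLE`, the class and method names, and the `destroyed` flag (standing in for freed memory) are hypothetical.

```python
from enum import Enum, auto


class KeepAlive(Enum):
    # KA_RESERVED is the state named in the analysis; KA_IDLE is a
    # hypothetical placeholder for "not reserved by any client".
    KA_IDLE = auto()
    KA_RESERVED = auto()


class ServerSession:
    def __init__(self) -> None:
        self.state = KeepAlive.KA_IDLE
        self.destroyed = False  # stands in for freed memory

    def release(self, honor_reservation: bool) -> None:
        """Tear the session down, optionally skipping reserved sessions."""
        if honor_reservation and self.state is KeepAlive.KA_RESERVED:
            return  # a client still points at this session; leave it alive
        self.destroyed = True


class ClientSession:
    def __init__(self) -> None:
        self.attached = None

    def attach(self, server: ServerSession) -> None:
        server.state = KeepAlive.KA_RESERVED
        self.attached = server

    def close(self) -> bool:
        """Detach the server session; return False if it was already destroyed."""
        server, self.attached = self.attached, None
        if server is None:
            return True
        # In C++, a destroyed session here would be a dangling pointer.
        return not server.destroyed


def simulate(honor_reservation: bool) -> bool:
    """Replay the ordering from the report: release first, then client close."""
    client, server = ClientSession(), ServerSession()
    client.attach(server)
    server.release(honor_reservation)
    return client.close()


if __name__ == "__main__":
    print("unguarded teardown safe:", simulate(honor_reservation=False))
    print("guarded teardown safe:  ", simulate(honor_reservation=True))
```

The model only covers avoiding the dangling reference. It says nothing about how a reserved session should get back to the global pool, which is the question left open above.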

Chris noted that this may be the same error I'm seeing in #10396.

I was running with ASAN enabled, so the use-after-free triggered reliably, although I don't think I was seeing the actual crash much.

I reran the reproduction steps after checking out PR #10399, and it no longer crashes.

@kenballus Now that this is merged, could you please verify this fix?

I can no longer reproduce the issue. Seems like it's fixed. Thanks everyone!