ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.

Home Page:https://ossrs.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Crash occurred after reloading the config.

xinqu opened this issue · comments

  1. Because some code has been modified, the line numbers in the core file no longer match the latest line numbers. The specific code is:
    srs_app_rtmp_conn.cpp:
    for (int i = 0; i < (int)on_connect->args.size(); i++) {
  2. Corresponding to the latest code, line 1315
    It's not just this one crash point, but they are all related to the args parameter in the authentication module's config. Every time there is a crash, the args vector does not exist anymore.
  3. Steps to reproduce:
    a. Authentication connection continuously times out (timeout)
    b. Attempt to open the video, fail (due to authentication)
    c. Disable authentication configuration (set http_hook to off)
    d. Program crashes
    It is inferred that after the configuration refresh, the config uses a new pointer, and the old pointer points to memory that has been released. At that time, the authentication module was using the pointer for a long time, eventually accessing non-existent data, causing a crash.

It is relatively easy to reproduce when the authentication connection definitely times out. However, it is unclear whether there is a possibility of a crash if the configuration is refreshed when authentication is successful and there is a large-scale connection access.

  1. Core dump:
(gdb)
#0  0x00002ae73a0cd91e in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#1  0x000000000047cd3f in SrsRtmpConn::http_hooks_on_connect (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:1353
#2  0x000000000047d813 in SrsRtmpConn::check_vhost (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:592
#3  0x00000000004816fb in SrsRtmpConn::do_cycle (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:158
#4  0x000000000047b173 in SrsConnection::cycle (this=0x817ee20) at src/app/srs_app_conn.cpp:64
#5  0x00000000004b766b in SrsThread::thread_cycle (this=0x817eea0) at src/app/srs_app_thread.cpp:187
#6  0x00000000004b74c8 in SrsThread::thread_fun (arg=0x817eea0) at src/app/srs_app_thread.cpp:225
#7  0x00000000005493b7 in _st_thread_main () at sched.c:327
#8  0x0000000000549b08 in st_thread_create (start=0, arg=0x0, joinable=0, stk_size=988114480) at sched.c:591
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

TRANS_BY_GPT3

args is copied.
Let me see if I can reproduce what you said.

TRANS_BY_GPT3

I put the api-server to sleep for 60 seconds every time it receives a connection before responding:

class RESTClients(object):
    def POST(self):
        time.sleep(60)

Start the API:

python ~/srs/research/api-server/server.py 8085

Server configuration file:

vhost __defaultVhost__ {
    http_hooks {
        enabled         on;
        on_connect      http://127.0.0.1:8085/api/v1/clients;
    }

Play video: http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true

Server log displays a 30-second timeout.

[2015-09-14 03:48:21.012][trace][24380][112] RTMP client ip=127.0.0.1
[2015-09-14 03:48:21.015][trace][24380][112] complex handshake success


[2015-09-14 03:48:51.017][error][24380][112][101] parse http post response failed. ret=1011(STREAM ioctl timeout)
[2015-09-14 03:48:51.017][error][24380][112][9] http post on_connect uri failed. client_id=112, url=http://127.0.0.1:8085/api/v1/clients, request={"action":"on_connect","client_id":112,"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://127.0.0.1:1935/live","pageUrl":"http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935"}, response=, code=32745, ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][error][24380][112][9] hook client on_connect failed. url=http://127.0.0.1:8085/api/v1/clients, ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][error][24380][112][9] check vhost failed. ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][warn][24380][112][9] client disconnect peer. ret=1004

Then disable http_hooks and reload:

vhost __defaultVhost__ {
    http_hooks {
        enabled         off;
        on_connect      http://127.0.0.1:8085/api/v1/clients;
    }
killall -1 srs

I found that the server did not encounter any problems.

[2015-09-14 03:51:01.858][trace][24380][100] reload config success.


[2015-09-14 03:51:14.870][trace][24380][114] RTMP client ip=127.0.0.1
[2015-09-14 03:51:14.872][trace][24380][114] complex handshake success
[2015-09-14 03:51:14.872][trace][24380][114] connect app, tcUrl=rtmp://127.0.0.1:1935/live, pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935, swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.26, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, args=null
[2015-09-14 03:51:14.872][trace][24380][114] out chunk size to 30000
[2015-09-14 03:51:14.883][trace][24380][114] ignored. set buffer length to 200
[2015-09-14 03:51:14.905][trace][24380][114] client identified, type=Play, stream_name=livestream, duration=-1.00
[2015-09-14 03:51:14.905][trace][24380][114] source url=/live/livestream, ip=127.0.0.1, cache=1, is_edge=0, source_id=-1[-1]
[2015-09-14 03:51:14.905][trace][24380][114] dispatch cached gop success. count=0, duration=-1
[2015-09-14 03:51:14.905][trace][24380][114] create consumer, queue_size=1.00, jitter=1
[2015-09-14 03:51:14.917][trace][24380][114] mw changed sleep 350=>350, max_msgs=128, esbuf=218750, sbuf 146988=>109375, realtime=0
[2015-09-14 03:51:14.917][trace][24380][114] start play smi=0.00, mw_sleep=350, mw_enabled=1, realtime=0, tcp_nodelay=0
[2015-09-14 03:51:14.917][trace][24380][115] ignored. set buffer length to 200
[2015-09-14 03:51:14.917][trace][24380][114] -> PLA time=0, msgs=0, okbps=34,0,0, ikbps=28,0,0, mw=350

TRANS_BY_GPT3

View the code, according to your inference, it is possible:



int SrsRtmpConn::http_hooks_on_connect()
{
    int ret = ERROR_SUCCESS;


#ifdef SRS_AUTO_HTTP_CALLBACK
    if (_srs_config->get_vhost_http_hooks_enabled(req->vhost)) {
        // HTTP: on_connect 
        SrsConfDirective* on_connect = _srs_config->get_vhost_on_connect(req->vhost);
        if (!on_connect) {
            srs_info("ignore the empty http callback: on_connect");
            return ret;
        }


        for (int i = 0; i < (int)on_connect->args.size(); i++) {
            std::string url = on_connect->args.at(i);

The condition is that if there are multiple hooks and after reloading, it executes the next on_connect args, there will be a problem.

Let me reproduce it again, configure multiple hooks, and the server only time.sleep(20) without timing out. Then reload within 20 seconds to release the config:

    http_hooks {
        enabled         on;
        on_connect      http://127.0.0.1:8085/api/v1/clients http://127.0.0.1:8085/api/v1/clients;
    }

View the logs:

[2015-09-14 03:59:14.034][trace][24530][107] RTMP client ip=127.0.0.1
[2015-09-14 03:59:14.034][trace][24530][107] srand initialized the random.
[2015-09-14 03:59:14.037][trace][24530][107] complex handshake success


[2015-09-14 03:59:26.236][trace][24530][100] reload config success.


[2015-09-14 04:01:09.240][trace][24577][107] http hook on_connect success. client_id=107, url=http://127.0.0.1:8085/api/v1/clients, request={"action":"on_connect","client_id":107,"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://127.0.0.1:1935/live...vhost...__defaultVhost__","pageUrl":"http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true"}, response=0, ret=0
[2015-09-14 04:01:09.240][trace][24577][107] connect app, tcUrl=rtmp://127.0.0.1:1935/live...vhost...__defaultVhost__, pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true, swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.26, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, args=null
[2015-09-14 04:01:09.240][trace][24577][107] out chunk size to 30000

It can be seen that the released config is indeed being used.

TRANS_BY_GPT3

Therefore, whenever config is used and there is a context switch, the required data should be directly copied. For example:



    // the http hooks will cause context switch,
    // so we must copy all hooks for the on_connect may freed.
    // @see https://github.com/simple-rtmp-server/srs/issues/475
    vector<string> hooks;


    if (true) {
        SrsConfDirective* on_connect = _srs_config->get_vhost_on_connect(req->vhost);


        if (!on_connect) {
            srs_info("ignore the empty http callback: on_connect");
            return ret;
        }


        hooks = on_connect->args;
    }


    for (int i = 0; i < (int)hooks.size(); i++) {
        std::string url = hooks.at(i);
        if ((ret = SrsHttpHooks::on_connect(url, req)) != ERROR_SUCCESS) {
            srs_error("hook client on_connect failed. url=%s, ret=%d", url.c_str(), ret);
            return ret;
        }
    }

TRANS_BY_GPT3

I was thinking that I could call copy when getting this config, make a copy and then return a pointer, and then free it after using it in the function.
Now I think the copy you mentioned is better, I will try to make the changes, thank you.

TRANS_BY_GPT3

This impact is extensive, both the DVR and HLS parts, all callbacks have this issue.
Thank you for reporting this bug~

TRANS_BY_GPT3

I have already fixed it, you can refer to my submission.

'
Make sure to maintain the markdown structure.

TRANS_BY_GPT3