ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.

Home page: https://ossrs.io

HLS: Rewrite HLS (especially audio-only) for AAC.

winlinvip opened this issue.

The timestamps of the AAC audio in HLS need to be recalculated, so this part of HLS needs to be rewritten, especially the audio-only case. Streams that mix audio and video currently play without problems, probably because the video serves as a timing reference. Audio-only streams, however, produce a crackling noise, caused by the gaps between aggregated audio packets. Apple's reply was that audio-only streams should use AAC instead of TS, and that the timestamps need to be recalculated.

For details on the AAC and MP3 audio standards, refer to SRS3.

The cause of the popping sound has been identified. It comes from the sampling rate: at some sampling rates the duration of an AAC frame is not a whole number of milliseconds, so the millisecond timestamps carry rounding errors. Safari is stricter about timing, hence the popping sound.

To verify this, first consider a sampling rate of 8000 Hz. An AAC frame contains 1024 samples, so one AAC frame lasts:

1024/8000.0 = 0.128s = 128ms

At a sampling rate of 16000 Hz, each AAC frame lasts 1024/16000.0 = 0.064s = 64ms.
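The frame-duration arithmetic above can be sketched in C++ as follows. This is an illustration, not SRS code: it assumes AAC-LC with 1024 samples per frame, and the names `aac_frame_duration_ms` and `frame_duration_is_whole_ms` are hypothetical. It checks whether one frame lasts a whole number of milliseconds, which holds for 8000 Hz and 16000 Hz but not for 44100 Hz.

```cpp
#include <cassert>

// Assumption: AAC-LC, where every frame carries 1024 PCM samples.
constexpr int kAacSamplesPerFrame = 1024;

// Duration of one AAC frame in milliseconds for a given sample rate.
double aac_frame_duration_ms(int sample_rate) {
    assert(sample_rate > 0);
    return kAacSamplesPerFrame * 1000.0 / sample_rate;
}

// True when one frame lasts a whole number of milliseconds, i.e. a
// millisecond (FLV/RTMP) timestamp can represent every frame boundary exactly.
bool frame_duration_is_whole_ms(int sample_rate) {
    assert(sample_rate > 0);
    return (kAacSamplesPerFrame * 1000) % sample_rate == 0;
}
```

For example, `frame_duration_is_whole_ms(16000)` is true (64 ms per frame), while `frame_duration_is_whole_ms(44100)` is false (about 23.22 ms per frame).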

The SRS configuration is as follows:

listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;
http_api {
    enabled         on;
    listen          1985;
}
http_server {
    enabled         on;
    listen          8080;
}
vhost __defaultVhost__ {
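    hls {
        enabled         on;
        # Audio-only HLS: no video track.
        hls_vcodec      vn;
        # Recompute audio DTS from the AAC sample count instead of using the RTMP timestamps directly.
        hls_dts_directly off;
    }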
}

Use FFmpeg to transcode the audio to 16 kHz:

ffmpeg -re -i doc/source.200kbps.768x320.flv \
-vn -acodec libfdk_aac -ar 16000 -ac 2 -b:a 48k \
-f flv -y rtmp://127.0.0.1/live/livestream

Open http://localhost:8080/live/livestream.html in Safari: there is no audio distortion.

Next, transcode the audio to 44100 Hz instead:

ffmpeg -re -i doc/source.200kbps.768x320.flv \
-vn -acodec libfdk_aac -ar 44100 -ac 2 -b:a 48k \
-f flv -y rtmp://127.0.0.1/live/livestream

You can hear a "popping" or "crackling" noise roughly every 4 seconds. It occurs in every segment, but you have to listen carefully to notice it.

What is the reason? At 44100 Hz, each AAC frame lasts:

1024/44100.0 = 0.02321995s = 23.21995ms

When the timestamps are rounded to whole milliseconds (23 ms), each frame is off by about 0.2 ms. Safari is more sensitive to this, so the problem is more likely to show up there.
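To see how the rounding shows up, the sketch below assumes (our assumption, not something stated in the issue) that each FLV timestamp is the exact frame time rounded to the nearest millisecond. Under that assumption, the gap between consecutive 44100 Hz frames alternates between 23 and 24 ms instead of staying at 23.22 ms, which is plausibly the jitter Safari reacts to. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: each FLV/RTMP timestamp is the exact frame start time rounded
// to the nearest millisecond (AAC-LC, 1024 samples per frame).
int64_t rounded_ms_timestamp(int64_t frame_index, int64_t sample_rate) {
    assert(frame_index >= 0 && sample_rate > 0);
    // Exact time in ms multiplied by sample_rate, kept as an integer.
    const int64_t exact_ms_times_rate = frame_index * 1024 * 1000;
    // Round half up: (a + b/2) / b for non-negative integers.
    return (exact_ms_times_rate + sample_rate / 2) / sample_rate;
}

// Gaps in ms between consecutive rounded timestamps for frames 1..frames.
std::vector<int64_t> rounded_ms_deltas(int64_t frames, int64_t sample_rate) {
    std::vector<int64_t> deltas;
    for (int64_t n = 1; n <= frames; ++n) {
        deltas.push_back(rounded_ms_timestamp(n, sample_rate) -
                         rounded_ms_timestamp(n - 1, sample_rate));
    }
    return deltas;
}
```

At 16000 Hz every gap is exactly 64 ms, which fits the clean playback observed above.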

How can this be solved? NGINX combines multiple AAC frames into one TS packet and then computes the accumulated time. If instead the time of each frame is computed directly in the 90 kHz TS clock:

90000*1024/44100.0 = 2089.795918367347

This reduces the error to 1/90 of a millisecond (one 90 kHz tick) instead of a whole millisecond.
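A minimal sketch of the 90 kHz idea, with hypothetical helper names. It contrasts two ways of producing DTS values: adding a per-frame step that was truncated once to whole ticks (2089 at 44100 Hz), which loses about 0.8 tick per frame and drifts, versus deriving the DTS from the total number of samples so far, which is what the `90000 * aac_samples / ...` expression in the debugger session quoted below appears to do. The second keeps the error below one tick.

```cpp
#include <cassert>
#include <cstdint>

// MPEG-TS timestamps (PTS/DTS) tick at 90 kHz.
constexpr int64_t kTsClockHz = 90000;
// Assumption: AAC-LC, 1024 samples per frame.
constexpr int64_t kAacSamplesPerFrame = 1024;

// Drifting variant (hypothetical, for contrast): truncate one frame's duration
// to whole ticks once (2089 at 44100 Hz), then add it per frame. The dropped
// fraction (about 0.8 tick per frame) accumulates.
int64_t dts_by_adding_truncated_step(int64_t frames, int64_t sample_rate) {
    assert(frames >= 0 && sample_rate > 0);
    const int64_t step = kTsClockHz * kAacSamplesPerFrame / sample_rate;
    return frames * step;
}

// Non-drifting variant: derive the DTS from the total sample count so far.
// Integer division truncates, so the error is always below one tick (1/90 ms).
int64_t dts_from_total_samples(int64_t total_samples, int64_t sample_rate) {
    assert(total_samples >= 0 && sample_rate > 0);
    return kTsClockHz * total_samples / sample_rate;
}
```

After 1000 frames at 44100 Hz, the first variant yields 2089000 ticks and the second 2089795, a drift of 795 ticks (about 8.8 ms) that the sample-based variant avoids.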

For example, the values for one audio packet are:

(lldb) p audio->timestamp
(int64_t) $8 = 23
(lldb) p audio->timestamp*90
(long long) $9 = 2070

whereas the DTS recalculated from the number of samples is:

int64_t dts = 90000 * aac_samples / srs_flv_srates[format->acodec->sound_rate];
(lldb) p dts
(int64_t) $6 = 2089

About 200 ms into the stream:

(lldb) p audio->timestamp
(int64_t) $14 = 209
(lldb) p audio->timestamp*90
(long long) $15 = 18810
(lldb) p dts
(int64_t) $16 = 18808
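The debugger values can be reproduced with the sketch below. It assumes the two packets are the 1st and 9th AAC frames (1024 and 9216 samples), which is what the sample-based DTS values 2089 and 18808 imply, and that the FLV timestamp is the exact frame time rounded to the nearest millisecond. `DtsPair` and `compare_dts` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// The two DTS values seen in the lldb session, both in 90 kHz ticks.
struct DtsPair {
    int64_t from_ms_timestamp;  // audio->timestamp * 90
    int64_t from_samples;       // 90000 * aac_samples / sample_rate
};

// Hypothetical helper: compute both DTS values for AAC frame `frame_index`.
DtsPair compare_dts(int64_t frame_index, int64_t sample_rate) {
    assert(frame_index >= 0 && sample_rate > 0);
    // Assumption: AAC-LC, 1024 samples per frame; frames counted from 0.
    const int64_t aac_samples = frame_index * 1024;
    // Assumption: the FLV timestamp is the exact time rounded to the nearest ms.
    const int64_t timestamp_ms = (aac_samples * 1000 + sample_rate / 2) / sample_rate;
    return DtsPair{timestamp_ms * 90, 90000 * aac_samples / sample_rate};
}
```

For frame 1 this gives 2070 vs 2089 ticks, and for frame 9 it gives 18810 vs 18808, matching the `lldb` output.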

With this change, the popping sound was gone.

Note: Starting from version 3.0.71, the default value of hls_dts_directly is on, which is consistent with SRS2. However, this may occasionally cause popping in HLS audio. If your stream's timestamps are normal, it is recommended to set hls_dts_directly to off to avoid the popping.

It turns out there is no need to package audio-only HLS as AAC; TS works fine.

Note that although this improvement prevents popping in HLS audio, it may produce a large number of very short HLS segments when the stream has timestamp problems. See #1506.

The solution is a configuration option that disables this improvement and converts the original timestamps directly, at the risk of popping in HLS audio again:

    hls {
        enabled         on;
        hls_dts_directly on;
    }

To compute timestamps from the AAC sample count instead of using the original timestamps (no popping in HLS audio, but segmentation may be abnormal):

    hls {
        enabled         on;
        hls_dts_directly off;
    }