ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.

Home Page: https://ossrs.io

DVR: Support DVR as MP4 file.

winlinvip opened this issue.

MP4 is the foundation of #174 HTTP-MP4 and also the foundation of #299 MPEG-DASH.

TRANS_BY_GPT3

isom

FFmpeg uses the isom brand; we also choose isom for DVR MP4 files.

Now that an MP4 demuxer has been added to srs-librtmp, we also wrote srs_ingest_mp4 to ingest an MP4 file into an RTMP stream. Because we only support live streaming, we don't want to seek back and forth in the MP4 file. This means the moov box should come before mdat, or the file should be muxed as fMP4.

By default, FFmpeg puts mdat before moov, which makes loading very slow (the reader must seek to the end of the file):

ffmpeg -i source.200kbps.768x320.flv -c copy -y avatar.mp4

If the MP4 file cannot be read without seeking (that is, moov comes after mdat), srs_ingest_mp4 seeks to find moov:

./objs/srs_ingest_mp4 -i avatar.mp4 -y rtmp://127.0.0.1/live/livestream

To put moov at the beginning of the file, use -movflags faststart:

ffmpeg -i source.200kbps.768x320.flv -movflags faststart -c copy -y avatar_faststart.mp4

FFmpeg supports fMP4 via -movflags frag_keyframe:

ffmpeg -i source.200kbps.768x320.flv -movflags frag_keyframe -c copy -y avatar_fragment.mp4
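Why a general MP4 forces a seek can be sketched in C++: a reader walks the top-level boxes using each box's 32-bit size (or the 64-bit largesize when size is 1), and it must jump over the whole mdat before it reaches moov. The names `BoxInfo` and `list_top_level_boxes` are hypothetical, and this sketch walks an in-memory buffer where a real reader would seek in the file instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One top-level box: its four-character type, file offset, and total size.
struct BoxInfo {
    std::string type;
    uint64_t offset;
    uint64_t size;
};

static uint32_t read_u32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

static uint64_t read_u64(const uint8_t* p) {
    return (uint64_t(read_u32(p)) << 32) | read_u32(p + 4);
}

// Walks the top-level boxes of an in-memory MP4. Only each box header is
// read; the body (e.g. a large mdat) is skipped using the box size, which
// in a real file would be a seek.
std::vector<BoxInfo> list_top_level_boxes(const std::vector<uint8_t>& file) {
    std::vector<BoxInfo> boxes;
    uint64_t pos = 0;
    while (pos + 8 <= file.size()) {
        const uint8_t* p = file.data() + pos;
        uint64_t size = read_u32(p);
        std::string type(reinterpret_cast<const char*>(p + 4), 4);
        uint64_t header = 8;
        if (size == 1) {            // 64-bit largesize follows the type
            if (pos + 16 > file.size()) break;
            size = read_u64(p + 8);
            header = 16;
        } else if (size == 0) {     // box extends to the end of the file
            size = file.size() - pos;
        }
        if (size < header || size > file.size() - pos) break;  // malformed
        boxes.push_back({type, pos, size});
        pos += size;
    }
    return boxes;
}
```

For a file laid out as ftyp (16 bytes), mdat (108 bytes), moov, the moov box is found at offset 124, and only after the reader has stepped over the entire mdat.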

The SPS and PPS are located in the avc1/avcC box:

(lldb) x/41xb avc_config
0x102000000: 0x01 0x64 0x00 0x20 0xff 0xe1 0x00 0x19
0x102000008: 0x67 0x64 0x00 0x20 0xac 0xd9 0x40 0xc0
0x102000010: 0x29 0xb0 0x11 0x00 0x00 0x03 0x00 0x01
0x102000018: 0x00 0x00 0x03 0x00 0x32 0x0f 0x18 0x31
0x102000020: 0x96 0x01 0x00 0x05 0x68 0xeb 0xec 0xb2
0x102000028: 0x2c

Defined in:

 * 5.3.4 AVC Video Stream Definition (avcC)
 * ISO_IEC_14496-15-AVC-format-2012.pdf, page 19

This is the same data carried in the RTMP sequence header, that is, what remains after stripping the RTMP or FLV encapsulation.

// @see: E.4.3 Video Tags, video_file_format_spec_v10_1.pdf, page 79
E.4.3.2 AVCVIDEOPACKET

The bytes are identical to those in the MP4 avcC box above:

(lldb) p avc_extra_size
(int) $5 = 41
(lldb) x/41xb avc_extra_data
0x101300390: 0x01 0x64 0x00 0x20 0xff 0xe1 0x00 0x19
0x101300398: 0x67 0x64 0x00 0x20 0xac 0xd9 0x40 0xc0
0x1013003a0: 0x29 0xb0 0x11 0x00 0x00 0x03 0x00 0x01
0x1013003a8: 0x00 0x00 0x03 0x00 0x32 0x0f 0x18 0x31
0x1013003b0: 0x96 0x01 0x00 0x05 0x68 0xeb 0xec 0xb2
0x1013003b8: 0x2c

Reference:

SrsAvcAacCodec::avc_demux_sps_pps

The AVCDecoderConfigurationRecord, which carries the SPS/PPS, is defined in:

    // AVCDecoderConfigurationRecord
    // 5.2.4.1.1 Syntax, ISO_IEC_14496-15-AVC-format-2012.pdf, page 16
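A minimal C++ sketch of parsing the 41-byte record shown above, following the field order of that syntax: version, profile, profile compatibility, level, lengthSizeMinusOne, then the SPS and PPS lists, each entry prefixed by a 16-bit length. `AvcConfig` and `parse_avcc` are hypothetical names, not SRS's SrsAvcAacCodec::avc_demux_sps_pps, and any optional trailing fields that some High-profile records carry are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fields of an AVCDecoderConfigurationRecord (ISO/IEC 14496-15, 5.2.4.1.1).
struct AvcConfig {
    uint8_t profile = 0;           // AVCProfileIndication, e.g. 0x64 = High
    uint8_t level = 0;             // AVCLevelIndication, e.g. 0x20 = 3.2
    uint8_t nalu_length_size = 0;  // lengthSizeMinusOne + 1
    std::vector<std::vector<uint8_t>> sps;
    std::vector<std::vector<uint8_t>> pps;
};

// Parses the avcC payload (the same bytes as the RTMP/FLV AVC sequence header body).
// Returns std::nullopt if the buffer is truncated or the version is not 1.
std::optional<AvcConfig> parse_avcc(const uint8_t* p, size_t n) {
    size_t pos = 0;
    auto need = [&](size_t k) { return pos + k <= n; };
    if (!need(6) || p[0] != 1) return std::nullopt;
    AvcConfig c;
    c.profile = p[1];
    c.level = p[3];
    c.nalu_length_size = uint8_t((p[4] & 0x03) + 1);
    pos = 5;
    // Reads `count` parameter sets, each prefixed by a 16-bit big-endian length.
    auto read_sets = [&](size_t count, std::vector<std::vector<uint8_t>>& out) {
        for (size_t i = 0; i < count; i++) {
            if (!need(2)) return false;
            size_t len = (size_t(p[pos]) << 8) | p[pos + 1];
            pos += 2;
            if (!need(len)) return false;
            out.emplace_back(p + pos, p + pos + len);
            pos += len;
        }
        return true;
    };
    size_t num_sps = p[pos++] & 0x1f;  // low 5 bits of the 0xe1 byte
    if (!read_sets(num_sps, c.sps)) return std::nullopt;
    if (!need(1)) return std::nullopt;
    size_t num_pps = p[pos++];
    if (!read_sets(num_pps, c.pps)) return std::nullopt;
    return c;
}
```

On the dump above this gives profile 100 (0x64, High), level 32 (0x20), 4-byte NALU lengths, one 25-byte SPS starting with 0x67, and one 5-byte PPS starting with 0x68.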

The mp4a/esds box contains the AAC AudioSpecificConfig (ASC). The esds box is 51 bytes long, but only 2 of those bytes are the ASC. For comparison, here is the ASC that SRS parses from the RTMP data:

srs`SrsAvcAacCodec::audio_aac_demux() at srs_kernel_codec.cpp:508
(lldb) p aac_extra_size
(int) $2 = 2
(lldb) x/2xb aac_extra_data
0x1010012b0: 0x12 0x10
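To see what these two bytes mean, here is a minimal C++ sketch that decodes the leading fields of a 2-byte ASC (ISO/IEC 14496-3): 5 bits of audio object type, 4 bits of sampling-frequency index, and 4 bits of channel configuration. `AacAsc` and `decode_asc` are hypothetical names, not SRS functions, and escape values are not handled.

```cpp
#include <cassert>
#include <cstdint>

// Leading fields of an AAC AudioSpecificConfig (ISO/IEC 14496-3).
struct AacAsc {
    int object_type;        // 5 bits, 2 = AAC LC
    int sample_rate_index;  // 4 bits, index into kRates
    int sample_rate;        // Hz, or 0 for an escaped/reserved index
    int channels;           // 4 bits, channelConfiguration
};

// Decodes the common 2-byte ASC. Sketch only: the escape values
// (object type 31, frequency index 15) and trailing flags are not handled.
AacAsc decode_asc(uint8_t b0, uint8_t b1) {
    static const int kRates[13] = {96000, 88200, 64000, 48000, 44100,
                                   32000, 24000, 22050, 16000, 12000,
                                   11025, 8000, 7350};
    AacAsc a;
    a.object_type = b0 >> 3;
    a.sample_rate_index = ((b0 & 0x07) << 1) | (b1 >> 7);
    a.sample_rate = a.sample_rate_index < 13 ? kRates[a.sample_rate_index] : 0;
    a.channels = (b1 >> 3) & 0x0f;
    return a;
}
```

For 0x12 0x10 this yields object type 2 (AAC LC), frequency index 4 (44100 Hz), and 2 channels. That matches the AAC(44KHz,16bit,Stereo,SH) sequence header in the packet dump later in this issue.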

The esds (0x65736473) data in MP4:

(lldb) p/x 'esds'
(int) $47 = 0x65736473
(lldb) x/51xb buf->p
0x100382e85: 0x00 0x00 0x00 0x33 0x65 0x73 0x64 0x73
0x100382e8d: 0x00 0x00 0x00 0x00 0x03 0x80 0x80 0x80
0x100382e95: 0x22 0x00 0x02 0x00 0x04 0x80 0x80 0x80
0x100382e9d: 0x14 0x40 0x15 0x00 0x00 0x00 0x00 0x00
0x100382ea5: 0x75 0x51 0x00 0x00 0x75 0x51 0x05 0x80
0x100382ead: 0x80 0x80 0x02 0x12 0x10 0x06 0x80 0x80
0x100382eb5: 0x80 0x01 0x02

The ASC (0x12 0x10) is located near the end of the box.

Note that ESDS is not defined in ISO_IEC_14496-12-base-format-2012.pdf, but it is defined in ISO_IEC_14496-14-MP4-2003.pdf:

aligned(8) class ESDBox
    extends FullBox('esds', version = 0, 0) {
    ES_Descriptor ES;
}

The ES_Descriptor is defined in ISO_IEC_14496-1-System-2010.pdf, section 7.2.6.5 ES_Descriptor:

class ES_Descriptor extends BaseDescriptor : bit(8) tag=ES_DescrTag {
    bit(16) ES_ID;
    bit(1) streamDependenceFlag;
    bit(1) URL_Flag;
    bit(1) OCRstreamFlag;
    bit(5) streamPriority;
    ...
}

For how SRS parses it, refer to SrsMp4EsdsBox::decode_header.

Note that the ES_Descriptor data is as follows:

0x03 0x80 0x80 0x80 0x22 0x00 0x02 0x00
0x04 0x80 0x80 0x80 0x14 0x40 0x15 0x00 0x00 0x00 0x00 0x00 0x75 0x51 0x00 0x00 0x75 0x51
0x05 0x80 0x80 0x80 0x02 0x12 0x10
0x06 0x80 0x80 0x80 0x01 0x02

This should be parsed as:

0x03 # ES_DescrTag
0x80 0x80 0x80 0x22 # Size=0x22, see sizeOfInstance
0x00 0x02 # ES_ID
0x00 # streamDependenceFlag|URL_Flag|OCRstreamFlag|streamPriority
0x04 0x80 0x80 0x80 0x14...... # DecoderConfigDescrTag, Size=20
0x05 # DecSpecificInfoTag
0x80 0x80 0x80 0x02 # Size=2
0x12 0x10 # ASC, i.e. the AAC AudioSpecificConfig (decoder header)
0x06 # SLConfigDescrTag

In addition to the 8-bit tag, a BaseDescriptor has a variable-length size of 1 to 4 bytes, described by the expandable class: each byte carries 7 bits of the size, and a set high bit means another size byte follows. This is easy to miss.

As we can see, SRS parses out the ASC:
(lldb) p/x *decSpecificInfo
(SrsMp4DecoderSpecificInfo) $11 = {
  SrsMp4BaseDescriptor = (tag = 0x00000005, vlen = 0x00000002, start_pos = 0x0000001f)
  nb_asc = 0x00000002
  asc = 0x00000001005002e0 "\x12\x10"
}

asc = 0x12 0x10
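The descriptor walk above can be sketched in C++: read each expandable size, skip the fixed ES_Descriptor and DecoderConfigDescriptor fields, and return the body of the DecSpecificInfo (tag 0x05) descriptor. This is a hypothetical helper (`read_descr_size`, `find_asc`), not SrsMp4EsdsBox::decode_header. It assumes the input starts right after the esds FullBox version/flags, and that DecSpecificInfo is the first sub-descriptor of DecoderConfigDescriptor, as in the dump above. The 13 skipped bytes are objectTypeIndication, streamType/upStream/reserved, bufferSizeDB, maxBitrate, and avgBitrate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads the expandable sizeOfInstance of a BaseDescriptor: up to 4 bytes,
// 7 bits each, where a set high bit means another size byte follows.
static bool read_descr_size(const uint8_t* p, size_t n, size_t& pos, uint32_t& size) {
    size = 0;
    for (int i = 0; i < 4; i++) {
        if (pos >= n) return false;
        uint8_t b = p[pos++];
        size = (size << 7) | (b & 0x7f);
        if (!(b & 0x80)) return true;
    }
    return false;  // a fifth size byte would be invalid
}

// Extracts the ASC from the descriptors in an esds box body
// (the bytes after the 4-byte FullBox version/flags).
std::optional<std::vector<uint8_t>> find_asc(const uint8_t* p, size_t n) {
    size_t pos = 0;
    uint32_t size = 0;
    // ES_Descriptor (tag 0x03): ES_ID, flags, then optional fields.
    if (pos >= n || p[pos++] != 0x03 || !read_descr_size(p, n, pos, size)) return std::nullopt;
    if (pos + 3 > n) return std::nullopt;
    pos += 2;                    // ES_ID
    uint8_t flags = p[pos++];
    if (flags & 0x80) pos += 2;  // streamDependenceFlag: dependsOn_ES_ID
    if (flags & 0x40) {          // URL_Flag: URLlength + URLstring
        if (pos >= n) return std::nullopt;
        pos += 1 + p[pos];
    }
    if (flags & 0x20) pos += 2;  // OCRstreamFlag: OCR_ES_Id
    // DecoderConfigDescriptor (tag 0x04): 13 bytes of fixed fields, then sub-descriptors.
    if (pos >= n || p[pos++] != 0x04 || !read_descr_size(p, n, pos, size)) return std::nullopt;
    pos += 13;
    // DecSpecificInfo (tag 0x05): its body is the ASC.
    if (pos >= n || p[pos++] != 0x05 || !read_descr_size(p, n, pos, size)) return std::nullopt;
    if (pos + size > n) return std::nullopt;
    return std::vector<uint8_t>(p + pos, p + pos + size);
}
```

Given the 39 descriptor bytes listed above (from 0x03 through the final 0x02), find_asc returns 0x12 0x10, the same ASC that SRS extracts.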

There are three ways to mux (that is, to write) an MP4 file:

  1. General MP4, with the box layout ftyp-mdat-moov. The header (moov) is at the end. This is FFmpeg's default output and is relatively easy to write. Reading requires seeking, so some browsers must download the whole file before playback can start, while others can use HTTP Range requests to skip mdat and read moov. SRS can use this layout when recording (DVR) MP4 files.
  2. Faststart MP4, with the box layout ftyp-moov-mdat. The header is at the beginning. FFmpeg needs an extra parameter (-movflags faststart), and the file has to be processed again after it is generated. This layout is friendlier to some browsers than the general one, but the extra pass makes it a poor fit for SRS DVR: it can easily cause I/O blocking, because with ST (State Threads) a long-running CPU or disk operation stalls the whole service thread.
  3. Fragmented MP4 (fMP4), with the box layout ftyp-moov-moof-mdat, where the moof-mdat pairs repeat. This is a segmented layout: some browsers can play it directly in HTML5, others cannot. It is generally used for DASH, where the browser does not play it directly; instead JS parses it and feeds it to MSE (appending it to the SourceBuffer of the video element; see articles on MSE for details). FFmpeg also needs extra parameters to produce this layout. It is well suited to streaming, and SRS can use it to generate DASH.

The MP4 demuxer faces the same three cases, and SRS's srs_ingest_mp4 supports all of them. For a general MP4, it seeks directly over mdat to reach moov, which is fast and, unlike faststart muxing, does not require processing the file again.
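The three layouts can be told apart from the order of the top-level boxes alone. A minimal C++ sketch, assuming the box types have already been read (for example by walking the box headers); `Mp4Layout` and `classify_layout` are hypothetical names, and other top-level boxes such as free are simply ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Mp4Layout { General, Faststart, Fragmented, Unknown };

// Classifies an MP4 by the order of its top-level box types.
Mp4Layout classify_layout(const std::vector<std::string>& types) {
    int moov = -1, mdat = -1;
    bool has_moof = false;
    for (int i = 0; i < int(types.size()); i++) {
        if (types[i] == "moov" && moov < 0) moov = i;
        if (types[i] == "mdat" && mdat < 0) mdat = i;
        if (types[i] == "moof") has_moof = true;
    }
    if (moov < 0) return Mp4Layout::Unknown;
    if (has_moof) return Mp4Layout::Fragmented;  // ftyp-moov-(moof-mdat)...
    if (mdat < 0) return Mp4Layout::Unknown;
    return moov < mdat ? Mp4Layout::Faststart    // ftyp-moov-mdat
                       : Mp4Layout::General;     // ftyp-mdat-moov
}
```

Only a General result forces a demuxer to seek past mdat before it can read the sample tables.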

MP4 STSD

The offset of each sample can be calculated from stco, stsz, and stsc.
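A minimal C++ sketch of that calculation. It assumes stco (or co64) has been read into 64-bit chunk offsets, and that stsz stores a per-sample size table rather than the single constant sample_size form. stsc gives the number of samples per chunk as run-length entries, and samples in a chunk are stored back to back starting at the chunk offset. `StscEntry` and `sample_offsets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stsc entry: from first_chunk (1-based) onward, each chunk holds
// samples_per_chunk samples (sample_description_index omitted here).
struct StscEntry {
    uint32_t first_chunk;
    uint32_t samples_per_chunk;
};

// Computes the file offset of every sample from stco (chunk offsets),
// stsc (samples per chunk) and stsz (sample sizes).
std::vector<uint64_t> sample_offsets(const std::vector<uint64_t>& stco,
                                     const std::vector<StscEntry>& stsc,
                                     const std::vector<uint32_t>& stsz) {
    std::vector<uint64_t> offsets;
    size_t sample = 0;
    for (size_t e = 0; e < stsc.size(); e++) {
        // This entry applies up to the chunk before the next entry's first_chunk.
        uint32_t last = e + 1 < stsc.size() ? stsc[e + 1].first_chunk - 1
                                            : uint32_t(stco.size());
        for (uint32_t chunk = stsc[e].first_chunk; chunk <= last && chunk <= stco.size(); chunk++) {
            uint64_t offset = stco[chunk - 1];
            for (uint32_t i = 0; i < stsc[e].samples_per_chunk && sample < stsz.size(); i++) {
                offsets.push_back(offset);
                offset += stsz[sample++];
            }
        }
    }
    return offsets;
}
```

For example, with chunk offsets {1000, 2000}, one stsc entry of 2 samples per chunk, and sample sizes {10, 20, 30, 40}, the sample offsets are {1000, 1010, 2000, 2030}.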

To check DTS and PTS, you can use srs_rtmp_dump to capture the correct sequence, with ATC enabled in SRS:

./objs/research/librtmp/srs_rtmp_dump -r rtmp://127.0.0.1:1935/live/livestream

To enable ATC (absolute time, so that SRS does not adjust the timestamps), configure the vhost:

vhost __defaultVhost__ {
    play {
        atc on;
    }
}

DTS comes from stts; the formula is:

DTS(n+1) = DTS(n) + STTS(n)

CTS/PTS comes from ctts; the formula is:

CTS(n) = DTS(n) + CTTS(n)
PTS(n) = CTS(n)
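The two formulas can be sketched in C++ by expanding the run-length stts and ctts tables, where each entry is a sample count plus a value. `RunEntry` and `compute_timestamps` are hypothetical names. ctts offsets are read as signed, which also covers version 0 boxes whose values fit in 31 bits. The tables in the usage example are hypothetical, chosen to reproduce the video frame timestamps in the packet dump below with a timescale of 1000.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A run-length entry as stored in stts and ctts: `count` consecutive samples
// share the same `value` (sample_delta for stts, sample_offset for ctts).
struct RunEntry {
    uint32_t count;
    int32_t value;
};

struct Timestamps {
    std::vector<int64_t> dts;
    std::vector<int64_t> pts;
};

// DTS(n+1) = DTS(n) + STTS(n); PTS(n) = CTS(n) = DTS(n) + CTTS(n).
// An empty ctts means PTS == DTS. Values are in the track timescale.
Timestamps compute_timestamps(const std::vector<RunEntry>& stts,
                              const std::vector<RunEntry>& ctts) {
    Timestamps t;
    int64_t dts = 0;
    for (const RunEntry& e : stts) {
        for (uint32_t i = 0; i < e.count; i++) {
            t.dts.push_back(dts);
            dts += e.value;
        }
    }
    t.pts = t.dts;
    size_t n = 0;
    for (const RunEntry& e : ctts) {
        for (uint32_t i = 0; i < e.count && n < t.pts.size(); i++, n++) {
            t.pts[n] += e.value;
        }
    }
    return t;
}
```

With stts = {5 samples of 40} and ctts = {2 of 80, 1 of 200, 1 of 80, 1 of 0}, the DTS are 0, 40, 80, 120, 160 and the PTS are 80, 120, 280, 200, 160.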

This gives the DTS and PTS sequences. If the audio is delayed or ahead of the video by some amount (for example, 34 milliseconds), it needs to be adjusted: compute the maximum positive difference and the maximum negative difference between the two streams. See SrsMp4SampleManager::load.

(lldb) p maxp
(int32_t) $0 = 0
(lldb) p maxn
(int32_t) $1 = -34

After adjustment, the interleaved audio and video DTS increase monotonically:

[2017-02-05 21:08:33.19][38133] Video packet id=5/0.0/0.0, type=Video, dts=0, pts=0, ndiff=1628, diff=0, size=46, H.264(SH,I), (0x17 0x00 0x00 0x00 0x00 0x01 0x64 0x00)
[2017-02-05 21:08:33.19][38133] Audio packet id=6/271.3/3.7, type=Audio, dts=0, pts=0, ndiff=0, diff=0, size=4, AAC(44KHz,16bit,Stereo,SH), (0xaf 0x00 0x12 0x10 )
[2017-02-05 21:08:33.19][38133] Video packet id=7/232.6/4.3, type=Video, dts=0, pts=80, ndiff=0, diff=0, size=5137, H.264(Nalu,I), (0x17 0x01 0x00 0x00 0x50 0x00 0x00 0x02)
[2017-02-05 21:08:33.19][38133] Audio packet id=8/203.5/4.9, type=Audio, dts=33, pts=33, ndiff=0, diff=0, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Video packet id=9/180.9/5.5, type=Video, dts=40, pts=120, ndiff=0, diff=7, size=132, H.264(Nalu,P/B), (0x27 0x01 0x00 0x00 0x50 0x00 0x00 0x00)
[2017-02-05 21:08:33.19][38133] Audio packet id=10/162.8/6.1, type=Audio, dts=57, pts=57, ndiff=0, diff=17, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Video packet id=11/148.0/6.8, type=Video, dts=80, pts=280, ndiff=0, diff=23, size=989, H.264(Nalu,P/B), (0x27 0x01 0x00 0x00 0xc8 0x00 0x00 0x03)
[2017-02-05 21:08:33.19][38133] Audio packet id=12/135.7/7.4, type=Audio, dts=80, pts=80, ndiff=0, diff=0, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Audio packet id=13/125.2/8.0, type=Audio, dts=103, pts=103, ndiff=0, diff=23, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Video packet id=14/116.3/8.6, type=Video, dts=120, pts=200, ndiff=0, diff=17, size=55, H.264(Nalu,P/B), (0x27 0x01 0x00 0x00 0x50 0x00 0x00 0x00)
[2017-02-05 21:08:33.19][38133] Audio packet id=15/108.5/9.2, type=Audio, dts=126, pts=126, ndiff=0, diff=6, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Audio packet id=16/101.8/9.8, type=Audio, dts=150, pts=150, ndiff=0, diff=24, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
[2017-02-05 21:08:33.19][38133] Video packet id=17/95.8/10.4, type=Video, dts=160, pts=160, ndiff=0, diff=10, size=62, H.264(Nalu,P/B), (0x27 0x01 0x00 0x00 0x00 0x00 0x00 0x00)
[2017-02-05 21:08:33.19][38133] Audio packet id=18/90.4/11.1, type=Audio, dts=173, pts=173, ndiff=0, diff=13, size=89, AAC(44KHz,16bit,Stereo,Raw), (0xaf 0x01 0x21 0x11 0x45 0x00 0x14 0x50)
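The interleaving in this dump can be reproduced with a plain two-way merge by DTS. This is a minimal C++ sketch with hypothetical names (`Sample`, `interleave`), not SRS's implementation. Each track is assumed already sorted, and on equal DTS the video sample is taken first. That tie-break is an arbitrary choice that happens to match the order at dts=80 above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sample {
    char track;   // 'V' for video, 'A' for audio (illustrative only)
    int64_t dts;  // milliseconds
};

// Merges two tracks, each already sorted by DTS, into one sequence whose
// DTS never decreases. On equal DTS the video sample goes first.
std::vector<Sample> interleave(const std::vector<Sample>& video,
                               const std::vector<Sample>& audio) {
    std::vector<Sample> out;
    size_t v = 0, a = 0;
    while (v < video.size() || a < audio.size()) {
        bool take_video = a >= audio.size() ||
                          (v < video.size() && video[v].dts <= audio[a].dts);
        out.push_back(take_video ? video[v++] : audio[a++]);
    }
    return out;
}
```

With the DTS of the raw video frames (0, 40, 80, 120, 160) and raw audio frames (33 through 173) from the dump as input, the merge yields the same video/audio order as packets id=7 through id=18.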

Fixed: SRS now supports MP4 DVR.

The sample data in mdat is raw AAC frames or raw AVC frames. To get them from FLV, remove the FLV TAG header, then remove the AUDIO TAG HEADER to get the raw AAC frame, or remove the VIDEO TAG HEADER to get the raw AVC frame.
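A minimal C++ sketch of stripping the tag headers, assuming the input is an FLV tag body with the FLV TAG header already removed. The AVC VIDEO TAG HEADER is 5 bytes (FrameType|CodecID, AVCPacketType, and a signed 24-bit CompositionTime). The AAC AUDIO TAG HEADER is 2 bytes (SoundFormat|SoundRate|SoundSize|SoundType, then AACPacketType). `RawFrame`, `strip_avc_header`, and `strip_aac_header` are hypothetical names. Sequence headers (packet type 0) carry the avcC or ASC data instead of a frame; in an MP4 those bytes go into avcC/esds, not mdat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Payload extracted from an FLV/RTMP tag body.
struct RawFrame {
    bool is_sequence_header = false;  // AVCPacketType / AACPacketType == 0
    int32_t cts = 0;                  // video only: CompositionTime (PTS - DTS), ms
    std::vector<uint8_t> data;        // AVC: length-prefixed NALUs; AAC: raw frame
};

// Strips the 5-byte AVC video tag header.
bool strip_avc_header(const uint8_t* p, size_t n, RawFrame& f) {
    if (n < 5 || (p[0] & 0x0f) != 7) return false;  // CodecID 7 = AVC
    f.is_sequence_header = p[1] == 0;
    int32_t cts = (int32_t(p[2]) << 16) | (int32_t(p[3]) << 8) | p[4];
    if (cts & 0x800000) cts -= 0x1000000;           // sign-extend SI24
    f.cts = cts;
    f.data.assign(p + 5, p + n);
    return true;
}

// Strips the 2-byte AAC audio tag header.
bool strip_aac_header(const uint8_t* p, size_t n, RawFrame& f) {
    if (n < 2 || (p[0] >> 4) != 10) return false;  // SoundFormat 10 = AAC
    f.is_sequence_header = p[1] == 0;
    f.cts = 0;
    f.data.assign(p + 2, p + n);
    return true;
}
```

For video packet id=7 in the dump above (0x17 0x01 0x00 0x00 0x50 ...), the CompositionTime is 0x50 = 80, which matches its dts=0, pts=80. For the audio sequence header (0xaf 0x00 0x12 0x10), the remaining payload is the ASC 0x12 0x10.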

With MP4 DVR recording a live stream, if the stream is paused and then resumed, DVR cannot continue generating files.

https://github.com/ossrs/srs/issues/1912
