Consume WebVTT live from a TCP socket

Question

jadarve opened this issue 4 months ago · comments

Juan David Adarve commented 4 months ago

I want to push a subtitles track into GPAC in live mode to generate a live HLS/DASH stream. Currently, the subtitles are pushed via a TCP socket from another application (see below).

The GPAC command is:

gpac \
  -i tcp://0.0.0.0:9091:gpac:tsprobe=false:listen=true txtin  \
  -o res/live.mpd:dual:cmaf=cmf2:segdur=2:rawsub=true:dmode=dynamic -graph

With this command I expect GPAC to create the playlist and segments live, as the VTT text arrives on the socket. Instead, no files are created in the res folder while the client is connected. Only when I stop the Python client (which closes the socket) does GPAC start writing the playlist and segments, and at that point they look as I expect.

A given segment, 0.0.0_dash1.vtt, looks like this:

WEBVTT

00:00.000 --> 00:01.000
Current time: 00:00.000

00:01.000 --> 00:02.000
Current time: 00:01.000

Am I missing something from the TCP connection to signal the dasher to produce the subtitles track live?

Python client

The client, subtitles_client.py, is listed in full after the sample output below.

To run it:

python3 subtitles_client.py

which creates the WebVTT track, pushes it to the TCP socket, and prints the pushed string:

WEBVTT

00:00.000 --> 00:01.000
Current time: 00:00.000

00:01.000 --> 00:02.000
Current time: 00:01.000

00:02.000 --> 00:03.000
Current time: 00:02.000

00:03.000 --> 00:04.000
Current time: 00:03.000

00:04.000 --> 00:05.000
Current time: 00:04.000

00:05.000 --> 00:06.000
Current time: 00:05.000

00:06.000 --> 00:07.000
Current time: 00:06.000

00:07.000 --> 00:08.000
Current time: 00:07.000

import socket
import threading
import time

def div_remainder(val, div):
    
    return int(val // div), int(val % div)
    
def time_format(milliseconds):
    
    s, ms = div_remainder(milliseconds, 1000)
    m, s = div_remainder(s, 60)
    h, m = div_remainder(m, 60)
    
    if h == 0:
        return '{0:02d}:{1:02d}.{2:03d}'.format(m, s, ms)
    else:
        return '{0:02d}:{1:02d}:{2:02d}.{3:03d}'.format(h, m, s, ms)

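def subtitles_thread():
    print("subtitles_thread started")

    template = "{0} --> {1}\n{2}\n\n"
    counter_ms = 0
    delta_ms = 1000

    # The context manager closes the socket even if sending fails.
    with socket.create_connection(('localhost', 9091)) as conn:
        header = "WEBVTT\n\n"
        # sendall() keeps writing until the whole buffer is sent;
        # send() may write only part of it.
        conn.sendall(header.encode("utf-8"))
        print(header, end='')

        try:
            # Push one cue per second until the receiver goes away.
            while True:
                curr_time = time_format(counter_ms)
                next_time = time_format(counter_ms + delta_ms)
                msg = 'Current time: {0}'.format(curr_time)

                s = template.format(curr_time, next_time, msg)
                conn.sendall(s.encode('utf-8'))
                print(s, end='')

                counter_ms += delta_ms
                time.sleep(delta_ms / 1000.0)
        except (BrokenPipeError, ConnectionResetError):
            # The receiver closed the connection; stop sending cues.
            pass

    print("subtitles_thread finished")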


if __name__ == '__main__':
    
    t1 = threading.Thread(target=subtitles_thread)

    print('STARTING')
    t1.start()

    print('JOINING')
    t1.join()

Jean Le Feuvre · Answer 1 · Tue Apr 09 2024 22:35:52 GMT+0800 (China Standard Time)

This was not supported, but it should now work with master. Thanks for the test code.

Some additional considerations: the srt/vtt parser is line-based, so processing packets coming from TCP is dangerous because the framing is unknown. For small real-time subtitles it should not be a problem, but if the sender is not regulating its output, there is no guarantee that what is read from TCP always contains a whole number of lines, which results in parsing errors.
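To make the framing problem concrete, here is a minimal sketch of what a line-based consumer has to do when TCP delivers a cue split across two reads. It is not GPAC's parser; `LineAssembler` and `feed` are hypothetical names used only for illustration.

```python
from typing import List


class LineAssembler:
    """Hypothetical helper: buffers raw TCP chunks and releases only complete lines."""

    def __init__(self) -> None:
        self._pending = b""  # bytes received after the last newline

    def feed(self, chunk: bytes) -> List[str]:
        """Add one chunk as read from the socket; return the lines it completes."""
        self._pending += chunk
        # Everything before the last b"\n" is complete; the tail is kept for later.
        *complete, self._pending = self._pending.split(b"\n")
        return [line.decode("utf-8") for line in complete]


if __name__ == "__main__":
    assembler = LineAssembler()
    # A cue arriving in three arbitrary pieces, as TCP is allowed to deliver it.
    for piece in (b"00:00.000 --> 00:0", b"1.000\nCurrent", b" time\n\n"):
        print(repr(piece), "->", assembler.feed(piece))
```

A parser that treats each read as a whole number of lines would see `00:00.000 --> 00:0` as a malformed timing line, which is the kind of parsing error described above.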

The proper way to do this would be through pipes or using the (just reworked) ka option of the TCP input socket filter. This requires an open/close at each chunk (N full lines) sent but will avoid any undesired side effects.
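On the sender side, the open/close-per-chunk pattern could look roughly like the sketch below. `send_chunk` is a hypothetical helper, and the sketch assumes the GPAC TCP input is listening with the ka option configured as described above; the exact option value is not shown here and should be taken from the filter documentation.

```python
import socket


def send_chunk(host, port, lines):
    """Send N complete lines over a fresh connection, then close it.

    Hypothetical helper: each call is one open/send/close cycle, so the
    receiver never sees a chunk that ends in the middle of a line.
    Returns the number of bytes sent.
    """
    # Terminate every line so the payload is always a whole number of lines.
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
    return len(payload)
```

In the client above, the loop body would then call something like `send_chunk('localhost', 9091, [timing_line, msg, ''])` once per cue instead of writing to one long-lived socket.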