oelmekki / yt-ttml2txt

Convert Youtube's TTML subtitle files to plain text

yt-ttml2txt

Note to GitHub users: development is happening on GitLab, so please go there if you want to open issues or submit merge requests.

Converts a Youtube TTML subtitle file to a text file.

Install

make                                 # build with gcc
sudo make install                    # install in /usr/local/bin/

# Alternatives
make CC=clang                        # build with another compiler, here clang
sudo make install PREFIX=/usr/bin/   # install in another location, here /usr/bin/

Usage

yt-ttml2txt [-1] <file>

Converts a TTML subtitle file to a text file, written on STDOUT.

Options:

`-1`: print everything on one line (e.g. to facilitate grepping).

Note that the parsing is very simple: it has been tested only against YouTube's TTML files and probably only works with them.

My goal is not to fully support every valid TTML file, only what YouTube produces. If this simple parsing turns out to be unstable, I'll rewrite it into a full-blown AST parser.
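To give a rough idea of what "simple parsing" can mean here, the following is a minimal sketch in C. It is not the code shipped in this repository. It assumes each subtitle cue is a `<p ...>...</p>` element, drops any nested tags, and decodes a handful of XML entities. The names `ttml_to_text` and `decode_entity` are hypothetical, chosen for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `s` starts with a known XML entity, store the
 * decoded character in *ch and return the number of input bytes consumed.
 * Returns 0 when no known entity starts at `s`. */
static size_t decode_entity(const char *s, char *ch)
{
    static const struct { const char *name; char ch; } table[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }, { "&#39;", '\'' },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t len = strlen(table[i].name);
        if (strncmp(s, table[i].name, len) == 0) {
            *ch = table[i].ch;
            return len;
        }
    }
    return 0;
}

/* Hypothetical sketch: copy the text of every <p> element found in `ttml`
 * into `out`. Cues are separated by '\n', or by ' ' when one_line is
 * non-zero (mirroring the idea behind the -1 option). Output is truncated
 * if it does not fit in out_size. Returns the number of bytes written,
 * excluding the terminating NUL. */
size_t ttml_to_text(const char *ttml, char *out, size_t out_size, int one_line)
{
    size_t n = 0;
    const char *p = ttml;

    if (out_size == 0)
        return 0;

    while ((p = strstr(p, "<p")) != NULL) {
        /* Ignore other tags whose name merely starts with 'p'. */
        if (p[2] != ' ' && p[2] != '>') {
            p += 2;
            continue;
        }
        const char *text = strchr(p, '>');
        if (text == NULL)
            break;
        text++;
        const char *end = strstr(text, "</p>");
        if (end == NULL)
            break;

        /* Separate this cue from the previous one. */
        if (n > 0 && n + 1 < out_size)
            out[n++] = one_line ? ' ' : '\n';

        int in_tag = 0;
        for (const char *c = text; c < end && n + 1 < out_size; ) {
            char ch;
            size_t used;
            if (*c == '<') {                    /* nested tag starts */
                in_tag = 1;
                c++;
            } else if (in_tag) {                /* skip tag content */
                if (*c == '>')
                    in_tag = 0;
                c++;
            } else if ((used = decode_entity(c, &ch)) > 0) {
                out[n++] = ch;
                c += used;
            } else {
                out[n++] = *c++;
            }
        }
        p = end + 4;                            /* continue after "</p>" */
    }
    out[n] = '\0';
    return n;
}
```

A caller would read the TTML file into a NUL-terminated buffer, call `ttml_to_text(buffer, out, sizeof out, 1)` for single-line output, and print `out`. Nested tags are removed without inserting whitespace, which is one of the shortcuts a real parser would not take.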

Why?

I wrote this to be able to dump text content for Youtube videos and then grep them, providing local full text search for Youtube videos I care about. Here is how I do it (you need yt-dlp or similar for that):

yt-dlp --skip-download --write-auto-sub --sub-format ttml <youtube-video-url>
yt-ttml2txt -1 <ttml-file> > <cache_dir>/<ttml-file>.txt
grep -r "<your query>" <cache_dir>/

You can now grep your favorite videos' content locally for the low price of a few text files in storage.

Credits

The idea came from reading Jeff Atwood's mention of using YouTube subtitles to access content.

About

License: GNU General Public License v3.0

