mbrg / webvtt-to-json

Convert WebVTT to JSON, optionally removing duplicate lines

webvtt-to-json

Convert WebVTT to JSON, optionally removing duplicate lines

Installation

Install this tool using pip:

pip install webvtt-to-json

Usage

To output JSON for a WebVTT file:

webvtt-to-json subtitles.vtt

This will output to standard output. Use -o filename to send it to a specified file.

Subtitles can often include duplicate lines. Add -d or --dedupe to attempt to remove those duplicates from the output:

webvtt-to-json --dedupe subtitles.vtt

Use -s or --single to output single "line" keys instead of a "lines" array.

You can also use:

python -m webvtt_to_json ...
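If you want to call the tool from a Python script, one option is to run the module form in a subprocess. The sketch below only assembles the argument list from the flags described above; `build_command` is a hypothetical helper name, not part of the package, and placing the options before the file path follows the examples in this README.

```python
import subprocess
import sys


def build_command(vtt_path, output=None, dedupe=False, single=False):
    """Assemble argv for `python -m webvtt_to_json` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "webvtt_to_json"]
    if dedupe:
        cmd.append("--dedupe")
    if single:
        cmd.append("--single")
    if output is not None:
        # -o sends the JSON to a file instead of standard output
        cmd += ["-o", output]
    cmd.append(vtt_path)
    return cmd


def run_tool(vtt_path, **options):
    """Run the tool and return its standard output as text.

    Requires webvtt-to-json to be installed in the current environment.
    """
    result = subprocess.run(
        build_command(vtt_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_tool("subtitles.vtt", dedupe=True)` would return the same JSON text that `webvtt-to-json --dedupe subtitles.vtt` prints.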

Output

Default output:

[
    {
        "start": "00:00:00.000",
        "end": "00:00:01.829",
        "lines": [
            " ",
            "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c><00:00:00.640><c> side</c><00:00:00.880><c> projects</c><00:00:01.280><c> and</c><00:00:01.520><c> open</c>"
        ]
    }
]
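As the example shows, the default output keeps WebVTT inline timestamp tags and `<c>` spans in each line. If you need plain text from that output, a minimal Python sketch is below. The regular expression and the helper names are assumptions for illustration and are not necessarily how `--dedupe` works internally.

```python
import re

# Matches any inline tag such as <00:00:00.160>, <c> or </c>
TAG_RE = re.compile(r"<[^>]+>")


def strip_cue_tags(line):
    """Remove inline tags and surrounding whitespace from one cue line."""
    return TAG_RE.sub("", line).strip()


def clean_lines(lines):
    """Strip tags from every line and drop lines that end up empty."""
    cleaned = (strip_cue_tags(line) for line in lines)
    return [line for line in cleaned if line]


raw_lines = [
    " ",
    "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c>"
    "<00:00:00.640><c> side</c><00:00:00.880><c> projects</c>"
    "<00:00:01.280><c> and</c><00:00:01.520><c> open</c>",
]
print(clean_lines(raw_lines))
```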

--dedupe output:

[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "lines": ["my career in side projects and open"]
    }
]

--dedupe --single output:

[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "line": "my career in side projects and open"
    }
]
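The `--single` shape is convenient to consume from code, since each entry carries one string. A minimal sketch of loading that output and converting the timestamps to seconds follows. The `to_seconds` helper is hypothetical and assumes the `HH:MM:SS.mmm` format shown above.

```python
import json


def to_seconds(timestamp):
    """Convert an "HH:MM:SS.mmm" timestamp string to seconds as a float."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Same data as the --dedupe --single example above
output = """
[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "line": "my career in side projects and open"
    }
]
"""

cues = json.loads(output)
for cue in cues:
    start = to_seconds(cue["start"])
    end = to_seconds(cue["end"])
    print(f"{start:.3f}-{end:.3f}: {cue['line']}")
```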

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd webvtt-to-json
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

License:Apache License 2.0

