xiangrui155 / autosub

Command-line utility for auto-generating subtitles for any video file, modified by BingLing

Autosub

简体中文 (Simplified Chinese)

This repo is not the same as the original autosub repo.

This repo has been modified by several people. See the Changelog.

autosub icon designed by BingLingGroup.

Software: inkscape

Font: source-han-sans (SIL)

Color: Solarized

TOC

  1. Description
  2. License
  3. Download and Installation
  4. Usage

Click the up arrow to go back to the TOC.

Description

Autosub is a utility for automatic speech recognition and subtitle generation based on Google-Speech-v2 or the Chrome-Web-Speech-api. It can also translate the subtitle text using googleapiclient. It does not currently support the latest Google Cloud APIs.

Input

A video or audio file. ffmpeg converts it into a format suitable for recognition.
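As a rough illustration of the conversion step, the sketch below builds an ffmpeg command line that extracts mono 16000 Hz WAV audio from any input. The function names, the output container and the exact flag set are assumptions made for this example, not necessarily what autosub runs internally.

```python
import subprocess


def build_ffmpeg_command(source_path, output_path, sample_rate=16000, channels=1):
    """Build an ffmpeg argument list that extracts audio from any input file.

    Hypothetical helper: the flag set is an assumption for illustration.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", source_path,        # input video or audio file
        "-vn",                    # ignore any video stream
        "-ac", str(channels),     # number of audio channels
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-loglevel", "error",     # only print errors
        output_path,
    ]


def convert(source_path, output_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.check_call(build_ffmpeg_command(source_path, output_path))
```

Keeping the command builder separate from the call that runs it makes the argument list easy to inspect without ffmpeg installed.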

Divide

Since the Speech-to-Text API only accepts short-form audio no longer than 10 to 15 seconds, the audio file has to be divided into many small pieces that contain the speech parts.

By default, the average power of a small fragment of sound (4096 frames, which is about 0.256 seconds at a 16000 Hz sample rate) is treated as the instantaneous intensity and used to find the speech regions.
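The idea can be sketched as follows, assuming the audio is already available as a list of integer samples. The RMS measure, the fixed `threshold` value and the `max_region_sec` cut-off are assumptions chosen for illustration; the real program may pick its threshold and region limits differently.

```python
import math


def chunk_rms(samples, chunk_size=4096):
    """Return the root-mean-square level of each consecutive chunk."""
    energies = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        mean_square = sum(s * s for s in chunk) / float(len(chunk))
        energies.append(math.sqrt(mean_square))
    return energies


def find_speech_regions(samples, sample_rate=16000, chunk_size=4096,
                        threshold=500.0, max_region_sec=10.0):
    """Group consecutive loud chunks into (start, end) regions in seconds.

    `threshold` and `max_region_sec` are hypothetical defaults.
    """
    chunk_sec = float(chunk_size) / sample_rate  # 0.256 s with the defaults
    regions = []
    region_start = None
    for index, energy in enumerate(chunk_rms(samples, chunk_size)):
        chunk_start = index * chunk_sec
        is_loud = energy >= threshold
        too_long = (region_start is not None
                    and chunk_start - region_start >= max_region_sec)
        # Close the open region on silence or when it has grown too long.
        if region_start is not None and (not is_loud or too_long):
            regions.append((region_start, chunk_start))
            region_start = None
        # Open a new region at the first loud chunk.
        if is_loud and region_start is None:
            region_start = chunk_start
    if region_start is not None:
        regions.append((region_start, float(len(samples)) / sample_rate))
    return regions
```

Splitting over-long regions keeps every piece within the short-form limit mentioned above, at the cost of sometimes cutting in the middle of a sentence.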

Alternatively, use external regions from a file in any format that pysubs2 supports, such as .ass or .srt. This lets you adjust the regions manually to get a better recognition result.
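To show what "external regions" amount to, here is a standard-library stand-in that pulls (start, end) pairs out of .srt timing lines. It is not how autosub does it (the program delegates parsing to pysubs2 and so handles more formats); the function names here are hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
SRT_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return (int(hours) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000.0)


def regions_from_srt(text):
    """Return a list of (start, end) regions in seconds from SRT text."""
    regions = []
    for match in SRT_TIMING.finditer(text):
        parts = match.groups()
        regions.append((_to_seconds(*parts[:4]), _to_seconds(*parts[4:])))
    return regions
```

Only the timings matter here: the subtitle text in the external file is ignored and replaced by the recognition result.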

Speech-to-Text/Translation API request

Makes parallel requests to generate transcriptions for those regions.

  • Post-processing of the subtitle lines may be needed, since some of them are too long to fit on a single line at the bottom of the video frame.

Optionally translates them to a different language, and finally saves the resulting subtitles to local storage.
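The parallel request step described above could look roughly like this. `recognize` is a hypothetical callable that sends one region to the speech API and returns its text (or None when nothing was recognized); the thread-pool approach is an assumption, not a description of autosub's internals.

```python
from multiprocessing.pool import ThreadPool


def transcribe_regions(regions, recognize, concurrency=4):
    """Call `recognize` on every region in parallel, keeping input order.

    `recognize` is a hypothetical function: region -> text or None.
    Regions with no recognized text are dropped from the result.
    """
    pool = ThreadPool(concurrency)
    try:
        # map() returns results in the same order as `regions`.
        texts = pool.map(recognize, regions)
    finally:
        pool.close()
        pool.join()
    return [(region, text) for region, text in zip(regions, texts) if text]
```

Threads are enough here because the work is network-bound, and the `concurrency` parameter plays the same role as the -C option of the command line.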

 ↑ 

Speech-to-Text/Translation language support

The Speech-to-Text lang codes are different from the Translation lang codes because the two APIs differ.

To see the supported codes, run the utility with -lsc or --list-speech-to-text-codes for Speech-to-Text, and -ltc or --list-translation-codes for Translation. Or just open constants.py and check.

  • The supported lang codes are currently hard-coded to avoid inaccurate recognition. If you pass a code that is not on the list and the API happens to accept it, Google's API recognizes your audio in a way that depends on your IP address, which you cannot control.
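A minimal sketch of checking a code against a hard-coded list before any request is made. The two dictionaries are a tiny illustrative subset, not a copy of constants.py, and `validate_lang_code` is a hypothetical helper.

```python
# Illustrative subsets only; the real lists live in constants.py.
SPEECH_TO_TEXT_CODES = {
    "en-US": "English (United States)",
    "ja-JP": "Japanese (Japan)",
}
TRANSLATION_CODES = {
    "en": "English",
    "ja": "Japanese",
}


def validate_lang_code(code, supported, kind):
    """Return `code` if it is on the hard-coded list, else raise ValueError."""
    if code not in supported:
        raise ValueError("Unsupported %s language code: %r" % (kind, code))
    return code
```

Note that "en-US" is valid only for speech recognition and "en" only for translation in this subset, which mirrors the difference between the two code lists.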

Output

Currently supports .srt, .vtt, .json and .txt (the same as the Aegisub plain text output).
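For the .srt case, the output step boils down to formatting timestamps and numbering the lines. The sketch below assumes subtitles are held as ((start, end), text) pairs with times in seconds; that data layout and both function names are assumptions for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def to_srt(subtitles):
    """Render ((start, end), text) pairs as SRT text.

    Each entry is an index line, a timing line and the text,
    with a blank line between entries.
    """
    blocks = []
    for index, ((start, end), text) in enumerate(subtitles, start=1):
        blocks.append("%d\n%s --> %s\n%s\n" % (
            index, srt_timestamp(start), srt_timestamp(end), text))
    return "\n".join(blocks)
```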

 ↑ 

License

[ATTENTION]: This repo has a different license from the original repo.

GPLv3

Download and Installation

[ATTENTION]: Except for the PyPI version, all other versions include code that is not from the original repository.

Branches

alpha branch

  • Includes many changes from the original repo. Details are in the Changelog. The code is updated when an alpha version is released. It is more stable than the dev branch.

origin branch

  • Includes the fewest changes from the original repo and none of the new features in the alpha branch. The changes in the origin branch only make sure there are no critical bugs when the program runs on Windows. Currently not maintained.

dev branch

  • The latest code is pushed to this branch. If it works fine, it is merged into the alpha branch when a new version is released.
  • Only used for testing or pull requests. Don't install from it unless you know what you are doing.

 ↑ 

Install on Ubuntu

[ATTENTION]: The first line of each command block below installs the dependencies.

Install from PyPI.

apt install ffmpeg python python-pip -y
pip install autosub

Install from origin branch.

apt install ffmpeg python python-pip git -y
pip install git+https://github.com/BingLingGroup/autosub.git@origin

Install from alpha branch.

apt install ffmpeg python python-pip git -y
pip install git+https://github.com/BingLingGroup/autosub.git@alpha

 ↑ 

Install on Windows

You can just go to the release page and download the latest release for Windows.

  • [ATTENTION]: The current pre-release of autosub is built with pyinstaller, so you may notice a slight delay when opening it. This is normal. A faster version built with nuitka is coming soon.

Or install it using choco.

Command to install choco itself from cmd.

@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"

Install from origin branch.

choco install git python2 curl -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@origin

Install from alpha branch.

choco install git python2 curl -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@alpha

 ↑ 

Usage

For the original autosub usage, see the Simplified Chinese usage guide (简体中文使用指南).

For the modified alpha branch version, see the help info below.

$ autosub -h
usage: autosub [-h] [-C CONCURRENCY] [-o OUTPUT] [-esr [path]] [-F FORMAT]
               [-S SRC_LANGUAGE] [-D DST_LANGUAGE] [-K API_KEY] [-lf] [-lsc]
               [-ltc] [-htp]
               [source_path]

positional arguments:
  source_path           Path to the video or audio file to subtitle

optional arguments:
  -h, --help            show this help message and exit
  -C CONCURRENCY, --concurrency CONCURRENCY
                        Number of concurrent API requests to make
  -o OUTPUT, --output OUTPUT
                        Output path for subtitles (by default, subtitles are
                        saved in the same directory and name as the source
                        path)
  -esr [path], --external-speech-regions [path]
                        Path to the external speech regions, which is one of
                        the formats that pysubs2 supports and overrides the
                        default method to find speech regions
  -F FORMAT, --format FORMAT
                        Destination subtitle format
  -S SRC_LANGUAGE, --src-language SRC_LANGUAGE
                        Language spoken in source file
  -D DST_LANGUAGE, --dst-language DST_LANGUAGE
                        Desired language for the subtitles
  -K API_KEY, --api-key API_KEY
                        The Google Translation API key to be used. (Required
                        for subtitle translation)
  -lf, --list-formats   List all available subtitle formats
  -lsc, --list-speech-to-text-codes
                        List all available source language codes, which mean
                        the speech-to-text available language codes.
                        [WARNING]: Its name format is different from the
                        destination language codes. And it's Google who make
                        that difference not the developers of the autosub.
                        Reference: https://cloud.google.com/speech-to-
                        text/docs/languages
  -ltc, --list-translation-codes
                        List all available destination language codes, which
                        mean the translation language codes. [WARNING]: Its
                        name format is different from the source language
                        codes. And it's Google who make that difference not
                        the developers of the autosub. Reference:
                        https://cloud.google.com/translate/docs/languages
  -htp, --http-speech-to-text-api
                        Change the speech-to-text api url into the http one

 ↑ 

License: GNU General Public License v3.0

Languages: Python 96.0%, Batchfile 4.0%