Forced alignment using Wav2Vec2
pip install git+https://github.com/weisrc/strong-align.git
⚠️ Warning: This package is still in development. The API may change in the future.
import torchaudio
from strong_align import align
text = "Hello world! This is a test."
audio, sr = torchaudio.load("test.wav")
audio = audio[0]
audio = torchaudio.transforms.Resample(sr, 16000)(audio)
audio = audio.to("cuda") # or keep it on the CPU
out = align(text, audio, "en", on_progress=print)
print(out)
You can use your own normalization function by passing it to the align
function.
Special use case:
- Romanizing the text to use the English model for languages such as Chinese, Japanese, Korean, etc.
from strong_align.preprocess import NORMALIZE_FUNCS
def my_normalize_normalize(text, mappings, language, labels):
# do something with text and mappings
return text, mappings
out = align(text, audio, "en",
normalize_func=NORMALIZE_FUNCS+[my_normalize_normalize])
Please refer to the
normalize.py
file for examples of normalization functions.
MIT. Wei (weisrc)