WanCaiYan/PSST

PSST! Prosodic Speech Segmentation With Transformers

PSST can be acessed through the transformers module:

pip install transformers

Load the pretrained checkpoint:

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("NathanRoll/psst-medium-en")
model = AutoModelForSpeechSeq2Seq.from_pretrained("NathanRoll/psst-medium-en")

Load sample audio file:

import librosa
y, sr = librosa.load('gettysburg.wav')
audio = librosa.resample(y, orig_sr=sr, target_sr=16000)

Define transcript generation function:

def generate_transcription(audio, gpu=False):
  """Generate a transcription from audio using a pre-trained model

  Args:
    audio: The audio to be transcribed
    gpu: Whether to use GPU or not. Defaults to False.

  Returns:
    transcription: The transcribed text
  """
  # Preprocess audio and return tensors
  inputs = processor(audio, return_tensors="pt", sampling_rate=16000)

  # Assign inputs to GPU or CPU based on argument
  if gpu:
    input_features = inputs.input_features.cuda()
  else:
    input_features = inputs.input_features

  # Generate transcribed ids
  generated_ids = model.generate(inputs=input_features, max_length=250)

  # Decode generated ids and replace special tokens
  transcription = processor.batch_decode(
      generated_ids, skip_special_tokens=True, output_word_offsets=True)[0].replace('!!!!!', '<|IU_Boundary|>')
  
  return transcription

Generate transcription:

generate_transcription(audio, gpu=True)

WanCaiYan / PSST

PSST! Prosodic Speech Segmentation With Transformers

About

Languages