MathiasRLuz / VVCD_VVCMD

VVCD & VVCMD

VVCD (Vehicle Voice Commands Dataset) and VVCMD (Vehicle Voice Commands Mixed Dataset) are publicly available datasets for Automatic Speech Recognition (ASR) in the Brazilian Portuguese language.

These datasets were created to develop and test a proof of concept for controlling a vehicle's exterior lights using only voice commands. The project focused on voice control of the vehicle's dashboard for drivers with reduced mobility.

The video below shows a test of the developed voice assistant.

Vehicle.Voice.Assistant.mp4

You can see more content on: Vehicle Voice Assistant

The chosen voice commands are listed below with their English equivalents:

Brazilian Portuguese | English           | Vehicle Function
Seta para direita    | Right turn signal | Turns on/off the right turn signal
Seta para esquerda   | Left turn signal  | Turns on/off the left turn signal
Luz baixa            | Headlights        | Turns on/off the headlights
Pisca alerta         | Hazard warning    | Turns on/off the hazard warning lights
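
The table above can be read as a lookup from a recognized transcript to a toggle on one exterior light. The following is a minimal sketch of that idea, not the assistant's actual code: the names `COMMANDS`, `LightState` and `apply_command` and the light identifiers are hypothetical, and it assumes transcripts are compared after lowercasing and trimming whitespace.

```python
from dataclasses import dataclass, field

# Transcripts from the table above, mapped to hypothetical light identifiers.
COMMANDS = {
    "seta para direita": "right_turn_signal",
    "seta para esquerda": "left_turn_signal",
    "luz baixa": "headlights",
    "pisca alerta": "hazard_warning",
}


@dataclass
class LightState:
    """Hypothetical on/off state of each exterior light (all off initially)."""

    lights: dict = field(
        default_factory=lambda: {name: False for name in COMMANDS.values()}
    )


def apply_command(state: LightState, transcript: str) -> bool:
    """Toggle the light that matches the transcript.

    Returns False, leaving the state unchanged, when the transcript
    is not one of the four commands.
    """
    light = COMMANDS.get(transcript.strip().lower())
    if light is None:
        return False
    state.lights[light] = not state.lights[light]
    return True
```

Because each command turns its light on or off, calling `apply_command` twice with the same transcript returns the light to its original state.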

Differences between the two datasets

  • VVCD contains 204 audio files and their respective transcriptions, which were automatically validated by Python's SpeechRecognition library using the Google Speech Recognition engine.
  • VVCMD contains the VVCD audio files plus 192 synthetic voice recordings generated with Microsoft Azure, for a total of 396 audio files with their respective transcriptions.
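
The validation script itself is not shown here, so the following is only a sketch of how an automatic check with the SpeechRecognition library might look. It assumes the language code `pt-BR`, and it assumes that a recording counts as valid when the recognized text equals the expected transcript after lowercasing and collapsing whitespace. The helper names `normalize`, `transcribe` and `is_valid` are hypothetical.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def transcribe(wav_path: str, language: str = "pt-BR") -> Optional[str]:
    """Transcribe one .wav file with the Google Speech Recognition engine.

    Requires network access and `pip install SpeechRecognition`.
    Returns None when the audio is unintelligible or the request fails.
    """
    import speech_recognition as sr  # imported lazily: optional dependency

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_google(audio, language=language)
    except (sr.UnknownValueError, sr.RequestError):
        return None


def is_valid(
    wav_path: str,
    expected: str,
    transcriber: Callable[[str], Optional[str]] = transcribe,
) -> bool:
    """Assumed rule: valid when the recognized text matches the transcript."""
    recognized = transcriber(wav_path)
    return recognized is not None and normalize(recognized) == normalize(expected)
```

The `transcriber` parameter lets the comparison be exercised offline with a stand-in function instead of the real engine.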

Datasets Folder Structure

  • Voice ID Folders:
    • luz_baixa: Folder for the headlights voice command.
    • pisca_alerta: Folder for the hazard warning voice command.
    • seta_direita: Folder for the right turn signal voice command.
    • seta_esquerda: Folder for the left turn signal voice command.
    • driver.npy: NumPy array with the driver voice embedding, summarizing the audio features of the driver's voice.
  • class_ocurrences.txt: File containing the distribution of classes for each subset (train, test and dev).
  • dev.csv: File listing the audio files of the dev subset, with their file size and transcript (ground truth, automatically validated).
  • test.csv: File listing the audio files of the test subset, with their file size and transcript (ground truth, automatically validated).
  • train.csv: File listing the audio files of the train subset, with their file size and transcript (ground truth, automatically validated).
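
As an illustration of how the subset files might be read, the sketch below loads one CSV and counts how often each transcript occurs, similar in spirit to the class distribution file. The column name `transcript` is an assumption (the header row is not documented here), so check the first line of the actual files. `load_split` and `class_counts` are hypothetical helper names.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed header name; verify against the first line of the real CSV files.
TRANSCRIPT_COLUMN = "transcript"


def load_split(csv_path):
    """Read a subset CSV (train, dev or test) into a list of row dicts."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def class_counts(rows, column=TRANSCRIPT_COLUMN):
    """Count how many rows carry each transcript."""
    return Counter(row[column] for row in rows)
```

For example, `class_counts(load_split("train.csv"))` would give the number of training recordings per voice command, provided the column name matches.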

Each command folder contains three .wav files of the voice command and a .npy file summarizing the audio features of the voice command.
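
The layout described above can be checked with a short script. This is a sketch using only the standard library. The four folder names come from the list above, while `list_voice_folder` and `check_voice_folder` are hypothetical helpers that are not part of the repository.

```python
from pathlib import Path

# Command folder names as listed in the folder structure above.
COMMAND_FOLDERS = ("luz_baixa", "pisca_alerta", "seta_direita", "seta_esquerda")


def list_voice_folder(voice_dir):
    """Map each command folder to its sorted .wav and .npy files."""
    voice_dir = Path(voice_dir)
    contents = {}
    for name in COMMAND_FOLDERS:
        folder = voice_dir / name
        contents[name] = {
            "wav": sorted(folder.glob("*.wav")),
            "npy": sorted(folder.glob("*.npy")),
        }
    return contents


def check_voice_folder(voice_dir):
    """Return the command folders that lack three .wav files and one .npy file."""
    return [
        name
        for name, files in list_voice_folder(voice_dir).items()
        if len(files["wav"]) != 3 or len(files["npy"]) != 1
    ]
```

An empty list from `check_voice_folder` means every command folder matches the description. The `.npy` files can then be read with `numpy.load`.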

Related Publications

A Simple Method for Voice Command Recognition with a Minimal Dataset

Design and Development of a Control System for Automotive Vehicles through Voice Commands

(under development)
