# PocketSphinx

This repository contains all the prerequisites, properly configured, needed to use PocketSphinx. Both SphinxBase and PocketSphinx are installed as suggested. This repository can be forked (or referenced directly), then used as a submodule in other projects.

For questions regarding this project, or implementations of it in other projects, feel free to reach out:

Jeffrey Levesque
jeff1evesque@yahoo.com

### Definition

PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop.

## Pre-Installation

### Linux Packages

# General Packages:
sudo apt-get update
sudo apt-get install libtool
sudo apt-get install autoconf
sudo apt-get install bison
sudo apt-get install swig2.0

Note: This project assumes Ubuntu Server 14.04 as the operating system.
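
To confirm the build tools actually landed on the PATH, an optional sanity check might look like the following sketch; the exact version output will vary:

# verify the build prerequisites installed above (versions will differ)
libtool --version
autoconf --version
bison --version
swig2.0 -version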

## Configuration

### GIT

Fork this project on your GitHub account, then clone it:

cd /var/www/html/
sudo git clone https://jeff1evesque@github.com/[YOUR-USERNAME]/pocketsphinx.git pocketsphinx

Then, add the git upstream reference:

cd /var/www/html/pocketsphinx/
git remote add upstream https://github.com/jeff1evesque/pocketsphinx.git

Initialize any submodules we are using:

cd /var/www/html/pocketsphinx/
git submodule init
git submodule update
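
To double-check the setup, the remotes and submodule state can be inspected; this is an optional sketch, and the output will differ per fork:

# 'origin' should point at your fork, 'upstream' at jeff1evesque/pocketsphinx
git remote -v

# show the commit each submodule is currently pinned to
git submodule status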

### File Permission

Change the file permissions for the entire project by issuing the following command:

cd /var/www/html/
sudo chown -R jeffrey:admin pocketsphinx

Note: change jeffrey to YOUR username.

## Installation / Updates

Since SphinxBase is a dependency for PocketSphinx, both submodules are required.

### SphinxBase

cd /var/www/html/pocketsphinx/sphinxbase/
git checkout master
git pull
./autogen.sh
sudo make install

cd ../
git add sphinxbase
git commit -m "#i: MESSAGE"
git push origin [NEW_BRANCH]

Then, submit a pull request, and merge the above changes.

### PocketSphinx

Repeat the above steps, tailored for the PocketSphinx submodule, as sketched below.
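
As a rough sketch, assuming the PocketSphinx submodule lives in the pocketsphinx/ directory of this repository and builds the same way as SphinxBase, the steps would look like:

cd /var/www/html/pocketsphinx/pocketsphinx/
git checkout master
git pull
./autogen.sh
sudo make install

cd ../
git add pocketsphinx
git commit -m "#i: MESSAGE"
git push origin [NEW_BRANCH]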

### SphinxTrain

Repeat the above steps, tailored for the SphinxTrain submodule.

### Acoustic / Language Models

To improve the accuracy of audio translations, we will utilize two additional models: an Acoustic model, and a Language model:

# Extract Acoustic, and Language Models
cd ../../
wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download -O en-us.tar.gz
sudo tar -zxvf en-us.tar.gz -C /usr/local/share/pocketsphinx/model/hmm/
sudo rm en-us.tar.gz

wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download -O cmusphinx-5.0-en-us.lm.dmp
sudo mv cmusphinx-5.0-en-us.lm.dmp /usr/local/share/pocketsphinx/model/lm/
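
A quick sanity check that the models landed in the expected locations (paths assume the default /usr/local install prefix used above):

# acoustic model directory extracted from en-us.tar.gz
ls /usr/local/share/pocketsphinx/model/hmm/

# language model file
ls /usr/local/share/pocketsphinx/model/lm/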

## Testing / Execution

When everything has been installed and configured, PocketSphinx is very easy to run. For example, we can run the following commands:

# converts SAMPLE.wav to text within the terminal console
pocketsphinx_continuous -infile SAMPLE.wav

# converts SAMPLE.wav to text using the additional Acoustic and Language models, respectively
pocketsphinx_continuous -infile SAMPLE.wav -hmm en-us -lm en-us.lm.dmp

# converts SAMPLE.wav, and redirects output to sample.txt
pocketsphinx_continuous -infile SAMPLE.wav > sample.txt

### Flags

-hmm en-us: loads the Acoustic model from the en-us folder

-lm en-us.lm.dmp: loads the Language model from the file en-us.lm.dmp
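
Putting the flags together with the install locations from the earlier step, a full invocation might look like the following sketch; the exact directory name under hmm/ depends on how the archive extracted, so adjust the paths as needed:

# hypothetical full paths, assuming the default /usr/local install prefix
pocketsphinx_continuous -infile SAMPLE.wav \
  -hmm /usr/local/share/pocketsphinx/model/hmm/en-us \
  -lm /usr/local/share/pocketsphinx/model/lm/cmusphinx-5.0-en-us.lm.dmp > sample.txt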

### Output

The resulting text translation will look like the following, whether redirected to a text file, or printed to stdout within the terminal console:

000000000: this is a test conversion result using this application
000000001: the program processes audio by splitting on the audio into chunks
000000002: the chunks are determined by silence fragments within the audio input file

### Translation Time

Ideally, the PocketSphinx translation engine should have a translation time (TR) of roughly three times the recording time (RT):

TR = 3 x RT

The translation time (TR) can be verified by checking the output of the pocketsphinx_continuous command. The command outputs many lines; however, the ones of particular relevance have a very specific form, described in the following sections.
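
One simple way to compare translation time against recording time is to wrap the command with time; a minimal sketch:

# 'real' is the end-to-end translation time (TR); compare it with the duration of SAMPLE.wav (RT)
time pocketsphinx_continuous -infile SAMPLE.wav > sample.txt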

### CPU Time

The CPU Time is the processor time actually consumed by the pocketsphinx_continuous command. It is reported on log lines of the following form, and the sum of all such lines will produce the overall CPU Time for the pocketsphinx_continuous command (a sketch for summing these lines follows the System Time section):

ngram_search_fwdtree.c(xxx): TOTAL fwdxxxx xx.xx CPU x.xxx xRT

### System Time

The Wall Time is the actual elapsed (system) time for the pocketsphinx_continuous command. The system can pause processes for various operations, including those used in relation to pocketsphinx_continuous, so wall time is possibly a better measure of the overall translation time. The sum of all such lines will produce the overall System Time for the pocketsphinx_continuous command:

ngram_search_fwdtree.c(xxx): TOTAL fwdxxxx xx.xx wall x.xxx xRT
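
Since these diagnostic lines are written to stderr, one way to total them is to capture the log, then sum the seconds reported on each TOTAL line; this is only a sketch, not part of the project itself:

# capture the recognizer log (stderr) separately from the transcription (stdout)
pocketsphinx_continuous -infile SAMPLE.wav > sample.txt 2> sample.log

# overall CPU Time: sum the seconds on every 'TOTAL ... CPU ...' line
grep 'TOTAL' sample.log | awk '{for (i=1; i<=NF; i++) if ($i == "CPU") cpu += $(i-1)} END {print cpu, "seconds CPU"}'

# overall System (wall) Time: sum the seconds on every 'TOTAL ... wall ...' line
grep 'TOTAL' sample.log | awk '{for (i=1; i<=NF; i++) if ($i == "wall") wall += $(i-1)} END {print wall, "seconds wall"}'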

### Translation Accuracy

The translation accuracy can be significantly influenced by whether additional Acoustic, or Language models are used. Generally, the Acoustic model describes the sounds of the language, whereas the Language model describes the probability of word sequences. To create our own acoustic model, we need to utilize the SphinxTrain submodule in this repository. To create our own language model, we need to use SRILM, a tool with a non-free license.
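
For reference, building a basic trigram language model with SRILM typically uses its ngram-count tool; the corpus file below is purely illustrative:

# build a trigram language model from a plain-text corpus (corpus.txt is hypothetical)
ngram-count -order 3 -text corpus.txt -lm custom.lm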

One thing to keep in mind: the Sphinx engine has difficulty translating language fillers, words generally used to express pauses within speech (uh, hmm, err, ahh, etc.). Language fillers may at times be ignored, translated into other known words, or may distort the translation of words spoken within close (immediate) proximity of the language filler.
