Speech-to-text on a Raspberry Pi

A very simple way to do speech-to-text directly on the Raspberry Pi.

The setup closely follows an existing guide, but also includes the dependencies needed on the Pi:

sudo apt-get install swig oss-compat pulseaudio libpulse-dev automake autoconf libtool bison python-dev

For all benchmarks I recorded a single file (16-bit, mono, 16 kHz) using

arecord -f S16_LE -c1 -r16000 goforward.raw
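Since the recording is headerless, its length can only be derived from the file size. A minimal sketch, assuming the format used above (2 bytes per sample, one channel); the function name raw_duration_ms is made up for this example:

```shell
# Duration in milliseconds of a headerless 16-bit mono recording.
# $1: path to the .raw file, $2: sample rate in Hz (defaults to 16000)
raw_duration_ms() {
    file=$1
    rate=${2:-16000}
    bytes=$(wc -c < "$file")
    # S16_LE mono: 2 bytes per sample, 1 channel
    echo $(( bytes * 1000 / (rate * 2) ))
}
```

Running raw_duration_ms goforward.raw should print roughly 2000 for the two-second recording used in the benchmarks below.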

To transcribe the audio, you can use

time pocketsphinx_continuous -fwdflat no -bestpath no -maxwpf 5 -maxhmmpf 10000 -topn 2 -pl_window 7 -infile goforward.raw

Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz: ~1-2 seconds for 2 seconds of audio

Raspberry Pi 2: ~7-8 seconds for 2 seconds of audio
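To compare machines, it can help to express these timings as a real-time factor (processing time divided by audio duration), where values above 1 mean slower than real time. A small sketch; the helper name rtf is hypothetical and the inputs are the midpoints of the ranges above:

```shell
# Real-time factor: processing time divided by audio duration.
# $1: processing time in seconds, $2: audio duration in seconds
rtf() {
    awk -v t="$1" -v d="$2" 'BEGIN { printf "%.2f\n", t / d }'
}

rtf 7.5 2    # Raspberry Pi 2, midpoint of the 7-8 second range
rtf 1.5 2    # Core i5, midpoint of the 1-2 second range
```

With the default models the Pi 2 is therefore several times slower than real time, which is what the restricted dictionary below is meant to address.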

Recognition gets much faster with a small custom dictionary. First, create a file language.txt with the words and phrases you want to recognize, for example:

open browser
new e-mail
forward
backward
next window
last window
open music player
okay computer

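Then generate a dictionary and language model by uploading language.txt to the online dictionary generator. Download the resulting archive, extract it with tar -xvzf TAR*.tgz, and run the following command (the numeric prefix of the .dic and .lm files will differ for your download):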

time pocketsphinx_continuous -fwdflat no -bestpath no -maxwpf 5 -maxhmmpf 1000 -topn 2 -pl_window 7  -dict 4182.dic -lm 4182.lm -infile goforward.raw

Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz: ~0.1 seconds

Raspberry Pi 2: ~1.4 seconds
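Before benchmarking, it may be worth checking that every word from language.txt actually made it into the generated dictionary. The sketch below assumes that dictionary entries are uppercase, that the first whitespace-separated field of each line is the word, and that alternative pronunciations are marked as WORD(2); the function name missing_words is made up for this example.

```shell
# Print every word of a phrase list that has no entry in a dictionary file.
# $1: phrase list (one phrase per line), $2: .dic file
missing_words() {
    tr -s ' \t' '\n' < "$1" | tr '[:lower:]' '[:upper:]' | sort -u |
    while read -r word; do
        [ -z "$word" ] && continue
        # Strip a variant suffix such as (2) before comparing (assumed format).
        awk -v w="$word" '
            { sub(/\([0-9]+\)$/, "", $1); if ($1 == w) found = 1 }
            END { exit !found }
        ' "$2" || echo "$word"
    done
}
```

For the files above this would be called as missing_words language.txt 4182.dic; no output means every word was found.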

Alternatively, use keyword spotting, which only looks for a fixed set of phrases:

pocketsphinx_continuous -kws phrases.kws -kws_threshold 1e-20 -infile goforward.raw

where phrases.kws lists the phrases to look for, one per line.

Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz: ~0.25 seconds

Raspberry Pi 2: ~2.5 seconds
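For reference, a phrases.kws file for the keyword-spotting run could be created as sketched below. As far as I know, pocketsphinx expects one phrase per line and accepts an optional per-phrase threshold between slashes; the phrases are taken from language.txt above and the threshold values are placeholders that would need tuning.

```shell
# Write a keyphrase list: one phrase per line, followed by its
# detection threshold between slashes (values here are placeholders).
cat > phrases.kws <<'EOF'
okay computer /1e-20/
open browser /1e-20/
EOF
```

Thresholds closer to 1 tend to reduce false alarms at the cost of missed detections, so each phrase usually needs its own value.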

To install, build sphinxbase and pocketsphinx from source:

git clone https://github.com/cmusphinx/sphinxbase.git
git clone https://github.com/cmusphinx/pocketsphinx

cd sphinxbase
./autogen.sh
./configure
make && sudo make install

Add the following to ~/.profile:

export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

then

source ~/.profile
cd ../
cd pocketsphinx
./autogen.sh
./configure
make && sudo make install
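Note that the two export lines for ~/.profile replace any value those variables already had. If that is a concern, a variant that prepends instead could look like this sketch; prepend_path is a hypothetical helper, not part of any tool used here:

```shell
# Prepend a directory to a colon-separated path value, without
# duplicating it or leaving a stray colon when the value is empty.
# $1: directory to add, $2: current value of the variable
prepend_path() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) echo "$current" ;;            # already present
        *) echo "$dir${current:+:$current}" ;;    # add in front
    esac
}

export LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
export PKG_CONFIG_PATH=$(prepend_path /usr/local/lib/pkgconfig "${PKG_CONFIG_PATH:-}")
```

Sourcing ~/.profile repeatedly then leaves both variables unchanged after the first time.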