Table of Contents
- Hacker Public Radio Podcast
- Installing Both Components
- The OMX Library
Currently the best solution for text-to-speech on the Raspberry Pi is the eSpeak software speech synthesiser. The speech-dispatcher program can use other software speech synthesisers, but historically we have only used the speakup screen-reader, and the best way to connect speakup to a text-to-speech engine is with the espeakup program, not using speech-dispatcher.
This page does not go into the intense debate that rages about the pros and cons of eSpeak. Some claim it sounds horrible, but because of its small size and huge language support it remains the best choice.
All was well with
eSpeak text-to-speech on the Raspberry Pi until approximately April 2013.
At about that time, a change was made to the
ALSA driver to introduce real DMA.
Unfortunately, the above change broke eSpeak TTS. After that time, it would stutter very, very badly within a few seconds of beginning to speak. If that was not bad enough, a kernel oops regularly occurred, which froze the console.
I established that this was what was happening when the console froze by connecting the Pi to another Linux machine via the serial console, which runs on the UART available on the GPIO header. Using this it is possible to see what happens when the kernel oops occurs, because the debug information is sent out of the serial port.
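For example, with a USB-to-serial adapter wired from the second machine to the Pi's UART pins, something like the following will show the kernel output as it happens (the device name here is an assumption and will vary; 115200 is the usual baud rate of the Pi serial console):

$ screen /dev/ttyUSB0 115200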
This is caused by the VideoCore Host Interface Queue (VCHIQ) process passing the kernel a null pointer. The VCHIQ process is responsible for queueing audio and video into the Graphics Processing Unit (GPU).
In efforts to get around this I tried a lot of things:
- Adjusting speakup settings in the console
- Changing the niceness and the priority of the espeakup process
- Many other tweaks
I think it was late in 2013 when I learned about the OpenMAX library and its Integration Layer. OpenMAX, hereinafter referred to as OMX, is a software system that seeks to standardise the interface to graphics and audio hardware on embedded devices.
I believe it was created to make graphics and sound programming easier for the multitude of mobile devices and smart phones now in use.
In a standard Raspbian Raspberry Pi, there is a directory: /opt/vc. This directory contains libraries and other code that relates to the VideoCore, hence the 'vc'. Note at this point that the term VideoCore should not be mistaken for something that only refers to video, for the GPU also renders sound.
In the above directory I found examples of code that renders sound on the GPU. So I set about learning how to interface to the OMX Integration Layer Client (ILC).
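For example, on a stock Raspbian image you can explore the directory like this (the exact contents vary between releases; the hello_pi examples, including an audio one, usually live under src):

$ ls /opt/vc
$ ls /opt/vc/src/hello_pi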
One of the great things about eSpeak is its ability to be used in a mode that will return pulse-code modulated (PCM) audio data to the calling program. In this mode, a fragment of text is passed to eSpeak to be rendered into synthetic speech, and the converted PCM is returned in a callback function. The PCM data can then be used however it is required; for example it can be written to a file, or processed in some way and then passed to some mechanism to be played over the output device.
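As a rough illustration, here is a minimal sketch of this retrieval mode using the public eSpeak C API from speak_lib.h. The callback below just counts the samples it is given; a program like piespeakup would queue them for playback instead:

```c
/* Minimal sketch of eSpeak's retrieval (callback) mode. */
#include <stdio.h>
#include <string.h>
#include <espeak/speak_lib.h>

/* Called repeatedly with chunks of synthesised PCM; wav may be NULL
 * when synthesis of the current utterance has finished. */
static int synth_callback(short *wav, int numsamples, espeak_EVENT *events)
{
    (void)events;
    if (wav != NULL && numsamples > 0)
        printf("received %d PCM samples\n", numsamples);
    return 0; /* 0 = continue synthesis, 1 = abort */
}

int main(void)
{
    const char *text = "Hello from the Raspberry Pi.";

    /* AUDIO_OUTPUT_RETRIEVAL: hand the PCM back to us via the
     * callback instead of playing it. Returns the sample rate. */
    if (espeak_Initialize(AUDIO_OUTPUT_RETRIEVAL, 0, NULL, 0) < 0)
        return 1;
    espeak_SetSynthCallback(synth_callback);
    espeak_Synth(text, strlen(text) + 1, 0, POS_CHARACTER, 0,
                 espeakCHARS_AUTO, NULL, NULL);
    espeak_Synchronize(); /* block until all callbacks have fired */
    espeak_Terminate();
    return 0;
}
```

With libespeak-dev installed, this should build with something like:

$ gcc -o retrieval-demo retrieval-demo.c -lespeak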
So, what I had to do was:
- Write code that would receive PCM data and queue it into the OMX Integration Layer Client.
- Run eSpeak in its callback mode and link my OMXILC library with it.
So, I wrote the library. It took a long time and an even longer time to debug.
It taught me a lot about concurrency and the so-called producer/consumer problem, in which different threads of execution either produce data or consume it.
Concurrency can be defined as the need for different threads of execution, or even different processes, to have access to a common resource without “treading on each other’s toes”.
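To make that concrete, here is a minimal, illustrative sketch of such a circular buffer guarded by a POSIX mutex and condition variables. This shows the pattern only; it is not the actual ilctts source:

```c
/* Illustrative producer/consumer ring buffer; not the real ilctts code. */
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 8192 /* capacity in PCM samples */

typedef struct {
    short data[RING_SIZE];
    size_t head, tail, count; /* write index, read index, fill level */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} ring_t;

/* Producer side: blocks while the ring is full. */
void ring_put(ring_t *r, short sample)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SIZE)
        pthread_cond_wait(&r->not_full, &r->lock);
    r->data[r->head] = sample;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Consumer side: blocks while the ring is empty. */
short ring_get(ring_t *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    short sample = r->data[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return sample;
}
```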
Then I forked the code of espeakup and created piespeakup. piespeakup contains a callback function which queues the text-to-speech audio returned by eSpeak into the OMX library. That library contains a circular buffer which receives this data and is constantly filled and drained in the classic producer/consumer pattern, with piespeakup being the producer of PCM audio, and the OMX library, which passes the TTS audio to VCHIQ, being the consumer.
Result: no involvement from the broken ALSA driver.
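Putting the two halves together, the producer side might look something like this sketch, reusing the illustrative ring_t from above (the real piespeakup code differs):

```c
/* Sketch only: the eSpeak callback acting as the producer. */
static ring_t ring = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

static int synth_callback(short *wav, int numsamples, espeak_EVENT *events)
{
    (void)events;
    if (wav == NULL)
        return 0; /* end of the current utterance */
    for (int i = 0; i < numsamples; i++)
        ring_put(&ring, wav[i]); /* may block until the consumer drains */
    return 0; /* keep synthesising */
}
```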
One of the hairiest problems I had to solve was latency: the time taken for the eSpeak program to render text into PCM data and return it to the calling program, and how this impacts on the quality of small chunks of speech at the beginning and end of each utterance. For a long time I had it working, but the speech was very severely clipped at the end of each utterance. It is not commonly understood that espeak renders text passed to it in what are often very small portions, rather than as, for example, a whole sentence in one go.
Hacker Public Radio Podcast
Installing Both Components
First you need to install
espeak. If you are using Raspbian, that distro splits the
program from the library and you need to install it like this:
$ sudo apt-get install libespeak-dev
If you are using Arch:
$ sudo pacman -S espeak
Follow these instructions to install the two components, which will give stutter-free console speech. In the instructions below I have given the version numbers of both components as 1.0.0; of course this may change, so check the Web site before you start.
The OMX Library
This has been tested on both Raspbian and Arch Linux.
Follow these instructions; in each line the dollar sign represents the prompt. And note that Downloads starts with a capital 'D':
$ wget http://www.raspberryvi.org/Downloads/ilctts-1.0.0.tar.gz
$ tar zxf ilctts-1.0.0.tar.gz
$ cd ilctts-1.0.0
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
Now, to load the speakup kernel modules, either reboot before installing the next component, or load them manually:
$ sudo modprobe speakup_soft
Follow these steps. Again the dollar sign is the prompt:
$ wget http://www.raspberryvi.org/Downloads/piespeakup-1.0.0.tar.gz
$ tar zxf piespeakup-1.0.0.tar.gz
$ cd piespeakup-1.0.0
$ ./configure
$ make
$ sudo make install
Note that in both instances of wget above the word Downloads has a capital 'D'.
At this point, before we enable and start the piespeakup service, we have to make sure the speakup kernel modules are loaded, or the service won't start. If you either rebooted after installing the OMX library or manually loaded the kernel modules, they will be there. Check like this:

$ lsmod | grep speakup

You should see two modules, speakup_soft and its dependency speakup. Now enable piespeakup:

$ sudo systemctl enable piespeakup

And start it:

$ sudo systemctl start piespeakup
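To confirm the service came up, you can check its status:

$ systemctl status piespeakup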
There is a systemd service file for piespeakup in which we can change the audio output from the analogue audio jack (3.5mm socket) to HDMI. The file contains an ExecStart line which selects the audio output; change it to switch to HDMI, after which you will need to restart the piespeakup service. DO NOT remove the line above it which reads simply ExecStart= with nothing after the equals sign, because the original ExecStart directive needs to be blanked before it is reset.
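For illustration only, that part of the service file has the following shape; the binary path and device argument below are hypothetical placeholders, so copy the real line from the installed file rather than from this sketch:

ExecStart=
ExecStart=/usr/local/bin/piespeakup --device=hdmi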
I will add more configuration options at a later date.
When both the OMX library and
piespeakup are installed, and when the
speakup kernel modules are loaded correctly, the Pi should come up
speaking when it is rebooted.
It is interesting to note that this console audio does NOT use the ALSA driver, so it does not suffer from the classic problem for accessibility on the Linux desktop, which is that when the user logs into the desktop and speech-dispatcher is configured to use pulseaudio, console audio is silenced because of some configuration of pulseaudio, a problem for which I have never seen a solution.
Note also that there is currently a bug in the sd_espeak module which causes it to crash regularly, which makes it impossible to reliably use speech-dispatcher configured for eSpeak. This does not matter to us currently, since output via the broken ALSA driver will also stutter very badly. I need to write an OMX version of the sd_espeak module, or an OMX audio driver for speech-dispatcher.