Giving voice to embedded designs
Adding natural voice to embedded designs is easy. Steven Bible of Microchip Technology explains.
Adding voice to an embedded project can enhance the user experience of a product. Commands can be confirmed, status can be announced and temperatures can be read aloud. However, adding voice is often perceived as a daunting task, both difficult and expensive. This article demonstrates that an 8-bit PIC microcontroller with a pulse width modulation (PWM) peripheral provides a low-cost and easy route to adding voice to an embedded project.
One method of encoding speech is Adaptive Differential Pulse Code Modulation (ADPCM), a technique for digitising analogue signals. ADPCM takes advantage of the high correlation between consecutive speech samples: rather than storing each sample in full, it encodes only the difference between a predicted sample and the actual sample. On playback, the decoder makes the same prediction and adds the stored difference to reconstruct each sample. ADPCM therefore provides efficient compression with good-quality speech playback.
There are various types of ADPCM algorithms. The Interactive Multimedia Association’s (IMA) algorithm significantly reduces the mathematical complexity by simplifying many of the operations and using table lookups where appropriate, making it a good choice for 8-bit microcontrollers. Since playback is the primary objective, a PC programme will be used for encoding, whilst decoding duties will be handled by the microcontroller.
To make playback interactive, the voice snippets are separated into individual, addressable files. For example, to speak a numeric temperature value, the numbers one through nineteen, along with twenty, thirty, forty, fifty, sixty, seventy, eighty and ninety, are each recorded in a separate file. So, when the temperature is 21 degrees, the voice speaks two files in sequence: "twenty" then "one". A simple file system is used to store and retrieve the individual voice files.
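The lookup from a numeric value to a sequence of file indices is simple enough to sketch in C. The file-numbering scheme here (file n holds the spoken word for n, so file 20 is "twenty") is an assumption for illustration, not the article's actual layout.

```c
/* Map a value 1..99 to the voice files to play in order.
 * Assumed layout: file n contains the spoken word for n,
 * so files 1..19 and 20, 30, ... 90 exist.
 * Returns the number of file indices written to out (0 if out of range). */
int number_to_files(int n, int out[2])
{
    int count = 0;

    if (n < 1 || n > 99)
        return 0;

    if (n >= 20) {                    /* say the tens word first */
        out[count++] = (n / 10) * 10;
        n %= 10;
    }
    if (n >= 1)                       /* then the units (or teens) word */
        out[count++] = n;

    return count;
}
```

For 21 this yields files 20 then 1 — "twenty" followed by "one", matching the article's example; 15 yields the single file 15, since the teens have their own recordings.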
The amount of memory needed to store the voice files depends on the number of bits per sample, the sample rate and the duration of voice stored. Toll-quality sound uses 16 bits per sample at 8000 samples per second, which gives a 4000 Hz bandwidth. One second of voice therefore occupies 16,000 bytes (2 bytes × 8000 samples).
Once the voice file is encoded with the IMA ADPCM algorithm, each 16-bit sample is reduced to a 4-bit code, compressing the file to a quarter of its original size. Depending on the amount of voice needed for a project, it can be stored in the programme memory of the microcontroller or in an external serial Flash memory. A one-megabit (128 Kbyte) serial Flash memory can hold approximately 32 seconds of voice.
The flow diagram shown in Figure 1 summarises the steps taken. First, the voice is recorded on a PC as a WAV file. Second, using a sound editing programme, the original voice file is trimmed, re-sampled to 8000 Hz and saved as a signed, 16-bit, little-endian mono file. Third, the file is encoded with the IMA ADPCM algorithm and saved as a binary file. Fourth, all the files are collected together in a file system. Finally, the files are stored in the microcontroller or external memory.
The hardware for this system is shown in Figure 2. The microcontroller reads the voice file from memory, decodes it and outputs the samples through the PWM module. The output of the PWM module is low-pass filtered with a 4000 Hz cutoff. The resulting analogue signal can then be amplified and played through a speaker.
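The PWM output stage only needs each decoded sample rescaled to the duty-cycle range. A minimal sketch, assuming an 8-bit PWM resolution (the actual register writes and timer setup of a real PIC design are omitted):

```c
#include <stdint.h>

/* Rescale a signed 16-bit PCM sample to an 8-bit PWM duty value.
 * A sample of 0 maps to mid-scale (128); the low-pass filter after
 * the PWM pin recovers the analogue waveform from the varying duty. */
uint8_t sample_to_duty(int16_t sample)
{
    return (uint8_t)(((int32_t)sample + 32768) >> 8);
}
```

In a real design this value is written to the PWM duty register from a timer interrupt firing at the 8000 Hz sample rate, and the low-pass filter needs its cutoff near 4000 Hz; as an illustrative example, a simple RC stage with R = 3.9 kΩ and C = 10 nF gives fc = 1/(2πRC) ≈ 4.1 kHz.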
With a little effort in recording voices, encoding them in ADPCM format and storing them in memory, an embedded project can indeed have a natural voice. But it does not stop there. Since the files are merely recordings, chimes, tones and buzzing sounds can be introduced. The only limit to enhancing the user experience of an embedded design is the engineer’s imagination.
LATEST issue 1/2019