How to read text aloud with Piper and Python

Piper is a neural text-to-speech system that runs locally and delivers great-sounding audio even on underpowered computers. It is optimized for the Raspberry Pi 4, and you can easily import it into your application as a library.

I stumbled upon Piper TTS while looking for a simple text-to-speech application that could read text aloud on Linux Mint. Coming from Microsoft's online voices in Microsoft Edge, I wanted something more natural than the robotic voices of programs like Festival or eSpeak, and I was impressed by how natural Piper sounds and how smoothly it runs on almost any modern computer.

You can listen to samples generated using Piper here.

Installing Piper on Linux

You can install Piper through pip with this command:

pip install piper-tts

Afterwards, you should be able to import piper into your program with import piper.

You can quickly test piper from the terminal by piping the output of another program into it, like this:

echo "Hello world! This is text to speech" | piper \
--model en_US-lessac-medium \
--output_file audio.wav

The resulting audio will be saved in audio.wav and can be played with any media player.

Adding models to Piper

The Piper repository includes a variety of pre-trained voice models, sorted by language, that you can use in your projects. These models determine how the synthesized speech will sound; in other words, each model is a different “voice” you can use with Piper (although a model occasionally contains multiple voices).

You can also create your own voice model using the Piper Recording Studio, a web application that you can run locally to generate a Piper dataset by recording clips with your voice. However, make sure your device has a decent graphics card, or training the model will be very slow. For more information on the process of creating a model for Piper, look at this article from Sam Howell.

To add models to Piper, you need both the model in .onnx format and its companion .onnx.json file. This JSON file contains important metadata about the model, such as its sample rate and phoneme set; it must have the same name as the .onnx file and be located in the same directory.

For example:

directory/
|-- ...
|-- es_MX-claude-high.onnx
|-- es_MX-claude-high.onnx.json
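As a quick sanity check before loading a model, a small helper (hypothetical, not part of Piper's API) can verify that the companion config actually sits next to the .onnx file:

```python
from pathlib import Path


def find_piper_config(model_path: str) -> Path:
    """Return the .onnx.json config file expected next to a Piper .onnx model."""
    model = Path(model_path)
    # e.g. es_MX-claude-high.onnx -> es_MX-claude-high.onnx.json
    config = model.with_name(model.name + ".json")
    if not config.is_file():
        raise FileNotFoundError(f"Missing Piper config file: {config}")
    return config
```

For example, `find_piper_config("directory/es_MX-claude-high.onnx")` returns the path to `directory/es_MX-claude-high.onnx.json`, or raises an error telling you exactly which file is missing.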

Generating audio files from text

To generate audio files programmatically using Piper, you’ll need to import the PiperVoice class and use the appropriate methods, like this (based on this answer):

import wave
from piper.voice import PiperVoice

model = "/path/to/model.onnx"
voice = PiperVoice.load(model)
text = "This is an example of text to speech"
# Write the synthesized audio into output.wav
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize(text, wav_file)

In this code, we:

  • Import the wave module to create WAV files, and PiperVoice to generate the audio from our text.
  • Specify the model file we want to use and load it into Piper.
  • Create a wav_file object where the program will write the synthesized audio data.
  • Define the text we want to convert to speech.
  • Call the synthesize method of PiperVoice to generate audio from the text and write it to the WAV file.
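Under the hood, synthesize writes 16-bit mono PCM frames into the WAV file at the model's sample rate. A minimal sketch using only the standard library shows the WAV parameters involved, writing one second of silence (assuming 22050 Hz, the rate used by medium-quality Piper models; check voice.config.sample_rate for your model):

```python
import wave

SAMPLE_RATE = 22050  # assumption: a medium-quality Piper model

with wave.open("silence.wav", "wb") as wav_file:
    wav_file.setnchannels(1)         # mono
    wav_file.setsampwidth(2)         # 16-bit samples (2 bytes each)
    wav_file.setframerate(SAMPLE_RATE)
    # One second of silence: SAMPLE_RATE frames of two zero bytes
    wav_file.writeframes(b"\x00\x00" * SAMPLE_RATE)
```

Piper sets these same parameters on the wave object for you, which is why the earlier example only has to open the file and pass it to synthesize.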

Streaming text to speech with Piper

It is possible to stream the audio directly to an audio device without having to save it to a file first, as seen in this answer.

import numpy as np
import sounddevice as sd
from piper.voice import PiperVoice

model = "/path/to/model.onnx"
voice = PiperVoice.load(model)
text = "This is an example of text to speech"

# Setup a sounddevice OutputStream with appropriate parameters
# The sample rate and channels should match the properties of the PCM data
stream = sd.OutputStream(samplerate=voice.config.sample_rate, channels=1, dtype='int16')
stream.start()

for audio_bytes in voice.synthesize_stream_raw(text):
    int_data = np.frombuffer(audio_bytes, dtype=np.int16)
    stream.write(int_data)

stream.stop()
stream.close()

If you get an error OSError: PortAudio library not found, you can fix it by installing the PortAudio library. On Ubuntu and Debian-based distributions, use this command:

sudo apt-get install libportaudio2

The previous code is similar to the one we used to create WAV files from text. This time, we:

  • Import sounddevice for audio streaming, PiperVoice to generate the audio from our text, and numpy to interpret the data as an array.
  • Define the model we want to use and load it into Piper.
  • Provide the text we want to convert to speech.
  • Set up a sounddevice OutputStream with parameters matching the properties of the PCM (Pulse Code Modulation) data produced by Piper. This stream will be used to play the audio generated by Piper.
  • Iterate over the raw audio data generated by voice.synthesize_stream_raw, convert it to an array of integers, and write it to the stream for real-time playback.
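The byte-to-array conversion inside the loop can be seen in isolation: each chunk of raw audio is just a sequence of 16-bit samples, which numpy reinterprets without copying. A toy example with hand-made bytes (no Piper required; assumes a little-endian machine, which matches the PCM byte order):

```python
import numpy as np

# Two 16-bit samples encoded as raw little-endian bytes: 1 and -1
audio_bytes = b"\x01\x00\xff\xff"
int_data = np.frombuffer(audio_bytes, dtype=np.int16)
print(int_data)  # [ 1 -1]
```

This is exactly what happens to every chunk yielded by synthesize_stream_raw before it is handed to the sounddevice stream.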

Conclusion

In summary, Piper offers a powerful solution for local text-to-speech synthesis.

Although the speech quality is not as high as that of tools like Coqui, the fact that Piper can generate audio quickly on devices with limited resources makes it, in my opinion, the best local text-to-speech tool currently available.

By importing Piper as a library with Python, you can easily integrate it into your programs and deliver natural-sounding voices while barely affecting performance. If you want to see an example, take a look at this simple read-aloud program that I wrote with Python and Tkinter.

Update: The original code mistakenly called PiperVoice(model), which was incorrect and didn’t work as intended because the .load() method was missing. The corrected code now calls PiperVoice.load(model). Thanks to everyone who pointed this out! 🙂️

15 comments

  1. this line
    voice = PiperVoice(model)

    results in positional character error “config”

    do you know how to resolve that? I tried looking at the __main__.py ___init__.py of .voice but cannot figure out how to resolve it. I’ve searched forums and found some hits but couldn’t apply what they’re talking about to this particular error for this particular module.

    1. Hello,
      I looked at the code but I could not replicate that error. Do you think you can share more information about your system like your version of Python, your OS, and the traceback output when the error happens?

      1. I’m getting the same error when I try to load a model as well. Any thoughts?
        TypeError: PiperVoice.__init__() missing 1 required positional argument: ‘config’

        code is below:

        import os
        import openai
        from dotenv import load_dotenv
        import time
        import speech_recognition as sr
        import pyttsx3
        from piper.voice import PiperVoice
        import numpy as np
        import sounddevice as sd
        load_dotenv()

        #tried with both .json or just .onnx
        #model = "/home/jedd/jarvis/en_US-lessac-medium.onnx.json"
        model = "/home/jedd/jarvis/en_US-lessac-medium.onnx"

        voice = PiperVoice(model)
        text = "This is an example of text to speech"
        wav_file = wave.open("output.wav", "w")
        audio = voice.synthesize(text, wav_file)

      2. Raspberry Pi 5 (raspberry 64-bit os)
        Python 3.11.2

        Traceback (most recent call last):
        File “/home/jedd/jarvis/jv.py”, line 19, in
        voice = PiperVoice(model)
        ^^^^^^^^^^^^^^^^^
        TypeError: PiperVoice.__init__() missing 1 required positional argument: ‘config’

      3. There’s an omission in the PiperVoice(model) statement. It needs the .load method to work.

        PiperVoice.load(model) is correct.

  2. Also got positional character error “config”. Resolved by replacing voice = PiperVoice(model) with voice = PiperVoice.load(model)

  3. At least for the latest Piper, this line in your example:
    voice = PiperVoice(model)
    should be changed to:
    voice = PiperVoice.load(model)
    This will eliminate the error that AA was describing.

  4. thanks for the tutorial the github repo doesn’t explain how to use the python piper API at all! works perfect on ubuntu 20.09 python 3.9

  5. the code throws the error “Illegal instruction” when running the line from piper.voice import PiperVoice
    i am using raspberry pi. i don’t seem to understand why this is happening! in the article above it says Piper should run fine on raspberry pi 4

  6. I am getting an AttributeError: ‘PiperVoice’ object has no attribute ‘synthesize_stream_raw’. I have upgraded to the latest version 1.3.0 and this issue persists
