Piper is a neural text-to-speech system that runs locally and delivers great-sounding audio even on underpowered computers. Piper is optimized for the Raspberry Pi 4, and you can easily import it into your application as a library.
I stumbled upon Piper TTS while looking for a simple text-to-speech application on Linux Mint where I could input text and have it read aloud. Coming from Microsoft's online voices in Microsoft Edge, I wanted something more natural than the robotic voices of programs like Festival or eSpeak, and I was impressed by how natural Piper sounds and how smoothly it runs on almost any modern computer.
You can listen to samples generated using Piper here.
Installing Piper on Linux
You can install Piper through pip with this command:
pip install piper-tts
Afterwards, you should be able to import piper into your program with import piper.
You can quickly test piper from the terminal by piping the output of a program for it to read, like this:
echo "Hello world! This is text to speech" | piper \
--model en_US-lessac-medium \
--output_file audio.wav
The resulting audio will be saved in audio.wav and can be played with any media player.
Adding models to Piper
The Piper repository includes a variety of pre-trained voice models, sorted by language, that you can use in your projects. These models determine how the synthesized speech will sound; in other words, each model is a different “voice” you can use with Piper (although occasionally a model contains multiple voices).
You can also create your own voice model using the Piper Recording Studio, a web application that you can run locally to generate a Piper dataset by recording clips with your voice. However, make sure to have a decent graphics card on your device, or training your model could be a very slow task. For more information on the process of creating a model for Piper, look at this article from Sam Howell.
To add models to Piper, you need both the model's .onnx file and its matching .onnx.json file. These JSON files contain important metadata about the models, such as their sample rate and phoneme set. The JSON file must have the same name as the .onnx file and be located in the same directory.
For example:
directory/
|-- ...
|-- es_MX-claude-high.onnx
|-- es_MX-claude-high.onnx.json
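Because the config always sits next to the model with the same base name, you can derive its expected path mechanically, which is handy for checking a voice directory before loading. A minimal sketch using only the standard library (the helper name config_path_for is my own, not part of Piper):

```python
from pathlib import Path

def config_path_for(model_path: str) -> Path:
    """Return the expected path of the .onnx.json config next to a model."""
    model = Path(model_path)
    # Piper pairs "name.onnx" with "name.onnx.json" in the same directory
    return model.with_name(model.name + ".json")

print(config_path_for("directory/es_MX-claude-high.onnx"))
# directory/es_MX-claude-high.onnx.json
```

You could combine this with Path.exists() to fail early with a clear message when the JSON file is missing.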
Generating audio files from text
To generate audio files programmatically using Piper, you’ll need to import the PiperVoice class and use the appropriate methods, like this (based on this answer):
import wave
from piper.voice import PiperVoice

model = "/path/to/model.onnx"
voice = PiperVoice.load(model)
text = "This is an example of text to speech"

# Open the output file in binary mode and let Piper write into it;
# the "with" block ensures the WAV header is finalized on close
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize(text, wav_file)
In this code, we:
- Import the wave module to create WAV files, and PiperVoice to generate the audio from our text.
- Specify the model file we want to use and load it into Piper.
- Create a wav_file object where the program will write the synthesized audio data.
- Define the text we want to convert to speech.
- Call the synthesize method of PiperVoice to generate audio from the text and save it to the WAV file.
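If you ever need to write Piper's raw output to disk yourself (for example, when collecting chunks from the streaming API into a file), the WAV header has to describe the PCM that Piper emits: 16-bit mono samples at the voice's sample rate. A standalone sketch using only the standard library; 22050 Hz is typical for "medium" voices, but you should read the real value from voice.config.sample_rate:

```python
import wave

SAMPLE_RATE = 22050  # typical for "medium" Piper voices; use voice.config.sample_rate

def write_pcm(path: str, pcm: bytes, sample_rate: int = SAMPLE_RATE) -> None:
    # Piper emits 16-bit mono PCM, so the WAV header must say so
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples (2 bytes each)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)

# One second of silence: 22050 frames of 2 zero bytes each
write_pcm("silence.wav", b"\x00\x00" * SAMPLE_RATE)
```

If the header parameters don't match the PCM data, the file will play at the wrong speed or as noise, which is why voice.synthesize sets them for you.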
Streaming text to speech with Piper
It is possible to stream the audio directly to an audio device without having to save it to a file first, as seen in this answer.
import numpy as np
import sounddevice as sd
from piper.voice import PiperVoice
model = "/path/to/model.onnx"
voice = PiperVoice.load(model)
text = "This is an example of text to speech"
# Setup a sounddevice OutputStream with appropriate parameters
# The sample rate and channels should match the properties of the PCM data
stream = sd.OutputStream(samplerate=voice.config.sample_rate, channels=1, dtype='int16')
stream.start()
for audio_bytes in voice.synthesize_stream_raw(text):
    int_data = np.frombuffer(audio_bytes, dtype=np.int16)
    stream.write(int_data)
stream.stop()
stream.close()
If you get the error OSError: PortAudio library not found, you can fix it by installing the PortAudio library. On Ubuntu and Debian-based distributions, run:
sudo apt-get install libportaudio2
The previous code is similar to the one we used to create WAV files from text. This time, we:
- Import sounddevice for audio streaming, PiperVoice to generate the audio from our text, and numpy to interpret the data as an array.
- Define the model we want to use and load it into Piper.
- Provide the text we want to convert to speech.
- Set up a sounddevice OutputStream with parameters matching the properties of the PCM (Pulse Code Modulation) data produced by Piper. This stream will be used to play the audio generated by Piper.
- Iterate over the raw audio data generated by voice.synthesize_stream_raw, convert it to an array of integers, and write it to the stream for real-time playback.
Conclusion
In summary, Piper offers a powerful solution for local text-to-speech synthesis.
Although the speech quality is not as high as that of tools like Coqui, the fact that Piper can generate audio quickly on devices with limited resources makes it, in my opinion, the best local text-to-speech tool currently available.
By importing Piper as a library with Python, you can easily integrate it into your programs, and deliver natural sounding voices while barely affecting performance. If you want to see an example, take a look at this simple read aloud program that I wrote with Python and Tkinter.
Update: The original code mistakenly called PiperVoice(model), which was incorrect and didn’t work as intended because the .load() method was missing. The corrected code now calls PiperVoice.load(model). Thanks to everyone who pointed this out! 🙂️
good and simple
Thank you very much
this line
voice = PiperVoice(model)
results in positional character error “config”
do you know how to resolve that? I tried looking at the __main__.py and __init__.py of .voice but cannot figure out how to resolve it. I’ve searched forums and found some hits but couldn’t apply what they’re talking about to this particular error for this particular module.
Hello,
I looked at the code but I could not replicate that error. Do you think you can share more information about your system like your version of Python, your OS, and the traceback output when the error happens?
I’m getting the same error when I try to load a model as well. Any thoughts?
TypeError: PiperVoice.__init__() missing 1 required positional argument: 'config'
code is below:
import os
import openai
from dotenv import load_dotenv
import time
import speech_recognition as sr
import pyttsx3
from piper.voice import PiperVoice
import numpy as np
import sounddevice as sd
load_dotenv()
#tried with both .json or just .onnx
#model = "/home/jedd/jarvis/en_US-lessac-medium.onnx.json"
model = "/home/jedd/jarvis/en_US-lessac-medium.onnx"
voice = PiperVoice(model)
text = "This is an example of text to speech"
wav_file = wave.open("output.wav", "w")
audio = voice.synthesize(text, wav_file)
Raspberry Pi 5 (raspberry 64-bit os)
Python 3.11.2
Traceback (most recent call last):
File "/home/jedd/jarvis/jv.py", line 19, in <module>
voice = PiperVoice(model)
^^^^^^^^^^^^^^^^^
TypeError: PiperVoice.__init__() missing 1 required positional argument: 'config'
There’s an omission in the PiperVoice(model) statement. It needs the .load method to work.
PiperVoice.load(model) is correct.
Should be PiperVoice.load(model)
The syntax is wrong.
You should write:
“PiperVoice.load(model)”
I had the same problem. The right syntax is:
voice = PiperVoice.load(model)
Now it works.
Also got positional character error “config”. Resolved by replacing voice = PiperVoice(model) with voice = PiperVoice.load(model)
At least for the latest Piper, this line in your example:
voice = PiperVoice(model)
should be changed to:
voice = PiperVoice.load(model)
This will eliminate the error that AA was describing.
thanks for the tutorial the github repo doesn’t explain how to use the python piper API at all! works perfect on ubuntu 20.09 python 3.9
the code throws the error “Illegal instruction” when running the line from piper.voice import PiperVoice
i am using raspberry pi. i don’t seem to understand why this is happening! in the article above it says Piper should run fine on raspberry pi 4
I am getting an AttributeError: ‘PiperVoice’ object has no attribute ‘synthesize_stream_raw’. I have upgraded to the latest version 1.3.0 and this issue persists