Using OpenAI for Text-to-Speech
This guide covers how to use OpenAI's official Text-to-Speech API with Open WebUI. This is the simplest setup if you already have an OpenAI API key.
See the companion guide: Using OpenAI for Speech-to-Text
Requirements
- An OpenAI API key with access to the Audio API
- Open WebUI installed and running
Quick Setup (UI)
- Click your profile icon (bottom-left corner)
- Select Admin Panel
- Click Settings → Audio tab
- Configure the following:
| Setting | Value |
|---|---|
| Text-to-Speech Engine | OpenAI |
| API Base URL | https://api.openai.com/v1 |
| API Key | Your OpenAI API key |
| TTS Model | tts-1 or tts-1-hd |
| TTS Voice | Choose from available voices |
- Click Save
Available Models
| Model | Description | Best For |
|---|---|---|
| tts-1 | Standard quality, lower latency | Real-time applications, faster responses |
| tts-1-hd | Higher quality audio | Pre-recorded content, premium audio quality |
Available Voices
OpenAI provides 6 built-in voices:
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, British accent |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Soft, gentle |
Try different voices to find the one that best suits your use case. You can preview voices in OpenAI's documentation.
Per-Model TTS Voice
You can assign a specific TTS voice to individual models, allowing different AI personas to have distinct voices. This is configured in the Model Editor.
Setting a Model-Specific Voice
- Go to Workspace > Models
- Click the Edit (pencil) icon on the model you want to configure
- Scroll down to find the TTS Voice field
- Enter the voice name (e.g., alloy, echo, shimmer, onyx, nova, fable)
- Click Save
Voice Priority
When playing TTS audio, Open WebUI uses the following priority:
- Model-specific TTS voice (if set in Model Editor)
- User's personal voice setting (if configured in user settings)
- System default voice (configured by admin)
This lets admins give each AI persona a consistent voice, while a user's personal preference still applies to any model that has no model-specific voice set.
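The priority order above can be sketched as a small fallback function. This is an illustration only, not Open WebUI's actual code: the function name `resolve_tts_voice` and its parameters are hypothetical, and it assumes an unset voice is represented as `None` or an empty string.

```python
from typing import Optional


def resolve_tts_voice(
    model_voice: Optional[str],
    user_voice: Optional[str],
    system_default: str,
) -> str:
    """Return the first voice that is set, following the priority list."""
    # Hypothetical helper: model-specific voice first, then the user's setting.
    for candidate in (model_voice, user_voice):
        if candidate and candidate.strip():
            return candidate.strip()
    # Nothing more specific is set, so fall back to the admin's default.
    return system_default


print(resolve_tts_voice("fable", "nova", "alloy"))  # model voice wins: fable
print(resolve_tts_voice(None, "nova", "alloy"))     # user setting: nova
print(resolve_tts_voice(None, "", "alloy"))         # system default: alloy
```

In this sketch a "British Butler" model with fable set always speaks as fable, regardless of the user's own setting.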
Use Cases
- Character personas: Give a "British Butler" model the fable voice, while an "Energetic Assistant" uses nova
- Language learning: Assign appropriate voices for different language tutors
- Accessibility: Set clearer voices for models designed for accessibility use cases
Environment Variables Setup
If you prefer to configure via environment variables:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_BASE_URL=https://api.openai.com/v1
      - AUDIO_TTS_OPENAI_API_KEY=sk-...
      - AUDIO_TTS_MODEL=tts-1
      - AUDIO_TTS_VOICE=alloy
    # ... other configuration
All TTS Environment Variables
| Variable | Description | Default |
|---|---|---|
| AUDIO_TTS_ENGINE | Set to openai | empty |
| AUDIO_TTS_OPENAI_API_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 |
| AUDIO_TTS_OPENAI_API_KEY | Your OpenAI API key | empty |
| AUDIO_TTS_MODEL | TTS model (tts-1 or tts-1-hd) | tts-1 |
| AUDIO_TTS_VOICE | Voice to use | alloy |
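To check an environment before starting the container, a short script can read the same variables and apply the defaults from the table. This is a minimal sketch: `TTSConfig`, `load_tts_config`, and `missing_settings` are hypothetical helper names, not part of Open WebUI.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TTSConfig:
    engine: str
    base_url: str
    api_key: str
    model: str
    voice: str


def load_tts_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Read the TTS variables, falling back to the defaults in the table."""
    return TTSConfig(
        engine=env.get("AUDIO_TTS_ENGINE", ""),
        base_url=env.get("AUDIO_TTS_OPENAI_API_BASE_URL", "https://api.openai.com/v1"),
        api_key=env.get("AUDIO_TTS_OPENAI_API_KEY", ""),
        model=env.get("AUDIO_TTS_MODEL", "tts-1"),
        voice=env.get("AUDIO_TTS_VOICE", "alloy"),
    )


def missing_settings(config: TTSConfig) -> list[str]:
    """List the variables that still need a value for OpenAI TTS."""
    problems = []
    if config.engine != "openai":
        problems.append("AUDIO_TTS_ENGINE")
    if not config.api_key:
        problems.append("AUDIO_TTS_OPENAI_API_KEY")
    return problems


if __name__ == "__main__":
    print(missing_settings(load_tts_config()))
```

An empty list means the two variables without usable defaults (the engine and the API key) are both set.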
Testing TTS
- Start a new chat
- Send a message to any model
- Click the speaker icon on the AI response to hear it read aloud
Response Splitting
When reading long responses, Open WebUI can split text into chunks before sending them to the TTS engine. This is configured in Admin Panel > Settings > Audio under Response Splitting.
| Option | Description |
|---|---|
| Punctuation (default) | Splits at sentence boundaries: periods (.), exclamation marks (!), question marks (?), and newlines. Best for natural pacing. |
| Paragraphs | Splits only at paragraph breaks (double newlines). Results in longer audio chunks. |
| None | Sends the entire response as one chunk. May cause delays before audio starts on long responses. |
Punctuation mode is recommended for most use cases. It provides the best balance of streaming performance (audio starts quickly) and natural speech pacing.
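The three modes can be approximated with regular expressions to see how a response would be chunked. This is a rough sketch and not Open WebUI's actual splitting logic: the function name `split_response` and the mode strings are assumptions, and the punctuation mode here only breaks after a mark that is followed by whitespace, so a number like 3.14 stays intact.

```python
import re


def split_response(text: str, mode: str = "punctuation") -> list[str]:
    """Split text into TTS chunks according to the chosen mode."""
    if mode == "none":
        chunks = [text]
    elif mode == "paragraphs":
        # Paragraph breaks: two or more consecutive newlines.
        chunks = re.split(r"\n{2,}", text)
    elif mode == "punctuation":
        # Break after ., ! or ? followed by whitespace, and at any newline.
        chunks = re.split(r"(?<=[.!?])\s+|\n+", text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Trim whitespace and drop empty pieces.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


sample = "Hello there. How are you?\n\nI am fine!\nThanks"
for mode in ("punctuation", "paragraphs", "none"):
    print(mode, len(split_response(sample, mode)))
```

For the sample text, punctuation mode yields four short chunks, paragraphs mode yields two, and none yields one. That matches the trade-off described above: smaller chunks let audio start sooner.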
Troubleshooting
No Audio Plays
- Check your OpenAI API key is valid and has Audio API access
- Verify the API Base URL is correct (https://api.openai.com/v1)
- Check browser console (F12) for errors
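To rule out Open WebUI itself, you can test the key and base URL with a direct request to OpenAI's speech endpoint (`/audio/speech`). The sketch below uses only the Python standard library. The helper names `build_speech_request` and `fetch_speech` and the output filename are illustrative choices, and it assumes the endpoint's default audio format is MP3.

```python
import json
import urllib.request


def build_speech_request(
    api_key: str,
    text: str,
    model: str = "tts-1",
    voice: str = "alloy",
    base_url: str = "https://api.openai.com/v1",
) -> urllib.request.Request:
    """Build a POST request for the speech endpoint without sending it."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_speech(request: urllib.request.Request, out_path: str = "speech.mp3") -> int:
    """Send the request and save the audio; returns the number of bytes written."""
    # urlopen raises urllib.error.HTTPError for error responses,
    # for example 401 (invalid key) or 429 (rate limited).
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

Calling `fetch_speech(build_speech_request(os.environ["OPENAI_API_KEY"], "Hello"))` should write a playable file if the key and URL are valid. A 401 error points to the key, while a connection error points to the base URL or network.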
Audio Quality Issues
- Switch from tts-1 to tts-1-hd for higher quality
- Note: tts-1-hd has slightly higher latency
Rate Limits
OpenAI has rate limits on the Audio API. If you're hitting limits:
- Consider caching common phrases
- Use tts-1 instead of tts-1-hd (rate limits are set per model, so check your account's limits for each)
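Caching common phrases can be as simple as keying the generated audio by model, voice, and text. The class below is a minimal in-memory sketch, not a feature of Open WebUI: `TTSCache` is a hypothetical name, and the `synthesize` callable stands in for whatever function actually calls the TTS API.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """In-memory cache keyed by model, voice, and text."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.misses = 0  # number of times the API had to be called

    @staticmethod
    def _key(model: str, voice: str, text: str) -> str:
        # A NUL separator keeps ("a", "bc") distinct from ("ab", "c").
        raw = "\x00".join((model, voice, text)).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, model: str, voice: str, text: str) -> bytes:
        key = self._key(model, voice, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(model, voice, text)
        return self._store[key]


# Usage with a stand-in synthesizer that makes no network calls.
cache = TTSCache(lambda model, voice, text: text.encode("utf-8"))
cache.get("tts-1", "alloy", "Welcome back!")
cache.get("tts-1", "alloy", "Welcome back!")
print(cache.misses)  # 1
```

Repeated phrases then cost one API request each. A persistent store such as files on disk would survive restarts, at the cost of managing eviction.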
For more troubleshooting, see the Audio Troubleshooting Guide.
Cost Considerations
OpenAI charges per character for TTS. See OpenAI Pricing for current rates. Note that tts-1-hd costs more than tts-1.
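Because billing is per character, a rough estimate is the character count multiplied by the model's rate. The sketch below takes the rate as an argument instead of hard-coding one. The function name is hypothetical, and the rate in the example is a placeholder, not OpenAI's actual price.

```python
def estimate_tts_cost(text: str, usd_per_million_chars: float) -> float:
    """Estimate the cost of synthesizing text at a given per-character rate."""
    if usd_per_million_chars < 0:
        raise ValueError("rate must be non-negative")
    return len(text) * usd_per_million_chars / 1_000_000


# Placeholder rate for illustration only; look up the real one on the pricing page.
PLACEHOLDER_RATE = 10.0
response_text = "a" * 2_000  # a 2,000-character response
print(f"${estimate_tts_cost(response_text, PLACEHOLDER_RATE):.4f}")
```

Running the same text through both models' rates shows how much the tts-1-hd premium adds for your typical response length.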
For a free alternative, consider OpenAI Edge TTS, which uses Microsoft's free Edge browser TTS.