# Using OpenAI for Speech-to-Text
This guide covers how to use OpenAI's Whisper API for Speech-to-Text (STT) with Open WebUI, giving you cloud-based transcription without needing local GPU resources.
See the companion guide: Using OpenAI for Text-to-Speech
## Requirements

- An OpenAI API key with access to the Audio API (a quick way to verify the key is shown after this list)
- Open WebUI installed and running
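
Before wiring the key into Open WebUI, you can confirm it actually reaches the API. The snippet below is a minimal sketch, assuming the requests package is installed and the key is exported as OPENAI_API_KEY:

```python
import os

import requests

# Sanity-check the key: list available models and look for whisper-1.
# Assumes OPENAI_API_KEY is set in your environment (sketch only).
api_key = os.environ["OPENAI_API_KEY"]
resp = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
resp.raise_for_status()
model_ids = {model["id"] for model in resp.json()["data"]}
print("whisper-1 available:", "whisper-1" in model_ids)
```
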
## Quick Setup (UI)
- Click your profile icon (bottom-left corner)
- Select Admin Panel
- Click Settings → Audio tab
- Configure the following:
| Setting | Value |
|---|---|
| Speech-to-Text Engine | OpenAI |
| API Base URL | https://api.openai.com/v1 |
| API Key | Your OpenAI API key |
| STT Model | whisper-1 |
| Supported Content Types | Leave empty for defaults, or set audio/wav,audio/mpeg,audio/webm |
- Click Save
## Available Models
| Model | Description |
|---|---|
| whisper-1 | OpenAI's Whisper large-v2 model, hosted in the cloud |
OpenAI currently offers only whisper-1. For more model options, use Local Whisper (built into Open WebUI) or another provider such as Deepgram.
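
For reference, a direct whisper-1 call with the official openai Python package looks roughly like this; Open WebUI issues an equivalent request on your behalf. The file name is a placeholder, and the openai package (v1 or later) is assumed to be installed:

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

# "meeting.mp3" is a placeholder; any format OpenAI accepts works.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```
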
## Environment Variables Setup
If you prefer to configure via environment variables:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_STT_ENGINE=openai
      - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
      - AUDIO_STT_OPENAI_API_KEY=sk-...
      - AUDIO_STT_MODEL=whisper-1
      # ... other configuration
```
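
Environment variables are read once at container startup, so recreate the container (for example, with docker compose up -d) after changing them.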
### All STT Environment Variables (OpenAI)
| Variable | Description | Default |
|---|---|---|
| AUDIO_STT_ENGINE | Set to openai | empty (uses local Whisper) |
| AUDIO_STT_OPENAI_API_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 |
| AUDIO_STT_OPENAI_API_KEY | Your OpenAI API key | empty |
| AUDIO_STT_MODEL | STT model | whisper-1 |
| AUDIO_STT_SUPPORTED_CONTENT_TYPES | Allowed audio MIME types | audio/*,video/webm |
## Supported Audio Formats
By default, Open WebUI accepts audio/* and video/webm for transcription. If you need to restrict or expand supported formats, set AUDIO_STT_SUPPORTED_CONTENT_TYPES:

```yaml
environment:
  - AUDIO_STT_SUPPORTED_CONTENT_TYPES=audio/wav,audio/mpeg,audio/webm
```
OpenAI's Whisper API accepts these input formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
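
The audio/* entry in the default value is a wildcard. As a rough illustration of how such an allow-list filters uploads (a standalone sketch, not Open WebUI's actual implementation), glob-style matching behaves like this:

```python
from fnmatch import fnmatch

# Hypothetical allow-list in the same comma-separated format as
# AUDIO_STT_SUPPORTED_CONTENT_TYPES (sketch only, not Open WebUI's code).
allowed = "audio/*,video/webm".split(",")

def is_supported(content_type: str) -> bool:
    """Return True if the MIME type matches any allowed pattern."""
    return any(fnmatch(content_type, pattern) for pattern in allowed)

print(is_supported("audio/wav"))   # True  (matches audio/*)
print(is_supported("video/webm"))  # True  (exact match)
print(is_supported("video/mp4"))   # False (not in the list)
```
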
## Using STT
- Click the microphone icon in the chat input
- Speak your message
- Click the microphone again or wait for silence detection
- Your speech will be transcribed and appear in the input box
## OpenAI vs Local Whisper
| Feature | OpenAI Whisper API | Local Whisper |
|---|---|---|
| Latency | Network dependent | Faster for short clips |
| Cost | Per-minute pricing | Free (uses your hardware) |
| Privacy | Audio sent to OpenAI | Audio stays local |
| GPU Required | No | Recommended for speed |
| Model Options | whisper-1 only | tiny, base, small, medium, large |
Choose OpenAI if:
- You don't have a GPU
- You want consistent performance
- Privacy isn't a concern
Choose Local Whisper if:
- You want free transcription
- You need audio to stay private
- You have a GPU for acceleration
## Troubleshooting
### Microphone Not Working
- Ensure you're using HTTPS or localhost
- Check browser microphone permissions
- See Microphone Access Issues
### Transcription Errors
- Check your OpenAI API key is valid
- Verify the API Base URL is correct
- Check container logs for error messages
### Language Issues
OpenAI's Whisper API automatically detects language. If you need to force a specific language, consider using Local Whisper with the WHISPER_LANGUAGE environment variable.
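
For context, the underlying OpenAI endpoint does accept an optional language parameter (an ISO-639-1 code) when called directly, though, as noted above, Open WebUI does not expose a language setting for the OpenAI engine. Extending the earlier sketch, with a placeholder file name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "interview.mp3" is a placeholder file name for this sketch.
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="de",  # ISO-639-1 code: force German rather than auto-detect
    )

print(transcript.text)
```
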
For more troubleshooting, see the Audio Troubleshooting Guide.
## Cost Considerations
OpenAI charges per minute of audio for STT. See OpenAI Pricing for current rates.
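
As a rough worked example: at whisper-1's published rate of $0.006 per minute (at the time of writing), a one-hour recording costs about 60 × $0.006 = $0.36, and 100 hours about $36.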
For free STT, use Local Whisper (the default) or the browser's Web API for basic transcription.