Audio Troubleshooting Guide
This page covers common issues with Speech-to-Text (STT) and Text-to-Speech (TTS) functionality in Open WebUI, along with their solutions.
Where to Find Audio Settings
Admin Settings (Server-Wide)
Admins can configure server-wide audio defaults:
- Click your profile icon (bottom-left corner)
- Select Admin Panel
- Click Settings in the top navigation
- Select the Audio tab
Here you can configure:
- Speech-to-Text Engine — Choose between local Whisper, OpenAI, Azure, Deepgram, or Mistral
- Whisper Model — Select model size for local STT (tiny, base, small, medium, large)
- Text-to-Speech Engine — Choose between OpenAI-compatible, ElevenLabs, Azure, local Transformers, or disable backend TTS (browser-only)
- TTS Voice — Select the default voice
- API Keys and Base URLs — Configure external service connections
User Settings (Per-User)
Individual users can customize their audio experience:
- Click your profile icon (bottom-left corner)
- Select Settings
- Click the Audio tab
User-level options include:
- STT Engine Override — Use "Web API" for browser-based speech recognition
- STT Language — Set preferred language for transcription
- TTS Engine — Choose "Browser Kokoro" for local in-browser TTS
- TTS Voice — Select from available voices
- Auto-playback — Automatically play AI responses
- Playback Speed — Adjust audio speed
- Conversation Mode — Enable hands-free voice interaction
User settings override admin defaults. If you're having issues, check both locations to ensure settings aren't conflicting.
Quick Setup Guide
Fastest Setup: OpenAI (Paid)
If you have an OpenAI API key, this is the simplest setup:
In Admin Panel → Settings → Audio:
- STT Engine: OpenAI | Model: whisper-1
- TTS Engine: OpenAI | Model: tts-1 | Voice: alloy
- Enter your OpenAI API key in both sections
Or via environment variables:
```yaml
environment:
  - AUDIO_STT_ENGINE=openai
  - AUDIO_STT_OPENAI_API_KEY=sk-...
  - AUDIO_TTS_ENGINE=openai
  - AUDIO_TTS_OPENAI_API_KEY=sk-...
  - AUDIO_TTS_MODEL=tts-1
  - AUDIO_TTS_VOICE=alloy
```
→ See full guides: Speech-to-Text | Text-to-Speech
Free Setup: Local Whisper + Edge TTS
For a completely free setup:
STT: Leave engine empty (uses built-in Whisper running on the backend)
```yaml
environment:
  - WHISPER_MODEL=base  # Options: tiny, base, small, medium, large
```
TTS: Use OpenAI Edge TTS (free Microsoft voices)
```yaml
services:
  openai-edge-tts:
    image: travisvn/openai-edge-tts:latest
    ports:
      - "5050:5050"
  open-webui:
    environment:
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
      - AUDIO_TTS_OPENAI_API_KEY=not-needed
```
→ See full guide: OpenAI Edge TTS
Browser-Only Setup (No Backend Config Needed)
For basic functionality without any server-side audio processing:
In User Settings → Audio:
- STT Engine: Web API (uses the browser's built-in speech recognition; does not call the backend STT endpoint)
- TTS Engine: Web API or Browser Kokoro (uses the browser's built-in text-to-speech or client-side Kokoro; does not call the backend TTS endpoint)
When the admin leaves AUDIO_TTS_ENGINE as an empty string (the default), no backend TTS service is available. All TTS is handled client-side. Similarly, if users select "Web API" for STT in their user settings, the backend's local Whisper is not used.
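As a sketch, the browser-only default corresponds to leaving the engine variables empty in your compose file (shown here explicitly for illustration; the empty string is already the default, so these lines are optional):

```yaml
environment:
  # Empty string = no backend TTS; clients fall back to browser TTS
  - AUDIO_TTS_ENGINE=
  # Empty string = backend uses local Whisper when users don't
  # select "Web API" in their personal settings
  - AUDIO_STT_ENGINE=
```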
Microphone Access Issues
Understanding Secure Contexts 🔒
For security reasons, accessing the microphone is restricted to pages served over HTTPS or locally from localhost. This requirement is meant to safeguard your data by ensuring it is transmitted over secure channels.
Common Permission Issues 🚫
Browsers like Chrome, Brave, Microsoft Edge, Opera, and Vivaldi, as well as Firefox, restrict microphone access on non-HTTPS URLs. This typically becomes an issue when accessing a site from another device within the same network (e.g., using a mobile phone to access a desktop server).
Solutions for Non-HTTPS Connections
- Set Up HTTPS (Recommended):
- Configure your server to support HTTPS. This not only resolves permission issues but also enhances the security of your data transmissions.
- You can use a reverse proxy like Nginx or Caddy with Let's Encrypt certificates.
- Temporary Browser Flags (Use with caution):
- These settings force your browser to treat certain insecure URLs as secure. This is useful for development purposes but poses significant security risks.
Chromium-based Browsers (e.g., Chrome, Brave):
- Open chrome://flags/#unsafely-treat-insecure-origin-as-secure
- Enter your non-HTTPS address (e.g., http://192.168.1.35:3000)
- Restart the browser to apply the changes
Firefox-based Browsers:
- Open about:config
- Search for and modify (or create) the string value dom.securecontext.allowlist
- Add your addresses separated by commas (e.g., http://127.0.0.1:8080)
While browser flags offer a quick fix, they bypass important security checks which can expose your device and data to vulnerabilities. Always prioritize proper security measures, especially when planning for a production environment.
Microphone Not Working
If the microphone icon doesn't respond even on HTTPS:
- Check browser permissions: Ensure your browser has microphone access for the site
- Check system permissions: On Windows/Mac, ensure the browser has microphone access in system settings
- Check browser compatibility: Some browsers have limited STT support
- Try a different browser: Chrome typically has the best support for web audio APIs
Text-to-Speech (TTS) Issues
TTS Loading Forever / Not Working
If clicking the play button on chat responses causes endless loading, try the following solutions:
1. Hugging Face Dataset Library Conflict (Local Transformers TTS)
Symptoms:
- TTS keeps loading forever
- Container logs show: RuntimeError: Dataset scripts are no longer supported, but found cmu-arctic-xvectors.py
Cause: This occurs when using local Transformers TTS (AUDIO_TTS_ENGINE=transformers). The datasets library is pulled in as an indirect dependency of the transformers package and isn't pinned to a specific version in Open WebUI's requirements. Newer versions of datasets removed support for dataset loading scripts, causing this error when loading speaker embeddings.
Solutions:
Temporary fix (re-applies after container restart):
```bash
docker exec open-webui bash -lc "pip install datasets==3.6.0" && docker restart open-webui
```
Permanent fix using environment variable:
Add this to your docker-compose.yml:
```yaml
environment:
  - EXTRA_PIP_PACKAGES=datasets==3.6.0
```
Verify the installed version:
```bash
docker exec open-webui bash -lc "pip show datasets"
```
Consider using an external TTS service like OpenAI Edge TTS or Kokoro instead of local Transformers TTS to avoid these dependency conflicts.
2. Using External TTS Instead of Local
If you continue to have issues with local TTS, configuring an external TTS service is often more reliable. See the example Docker Compose configuration below that uses openai-edge-tts:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_KEY=your-api-key-here
      - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
    depends_on:
      - openai-edge-tts
    # ... other configuration
  openai-edge-tts:
    image: travisvn/openai-edge-tts:latest
    ports:
      - "5050:5050"
    environment:
      - API_KEY=your-api-key-here
    restart: unless-stopped
```
TTS Voice Not Found / No Audio Output
Checklist:
- Verify the TTS engine is correctly configured in Admin Panel → Settings → Audio
- Check that the voice name matches an available voice for your chosen engine
- For external TTS services, verify the API Base URL is accessible from the Open WebUI container
- Check container logs for any error messages
Docker Networking Issues with TTS
If Open WebUI can't reach your TTS service:
Problem: Using localhost in the API Base URL doesn't work from within Docker.
Solutions:
- Use host.docker.internal instead of localhost (works on Docker Desktop for Windows/Mac)
- Use the container name if both services are on the same Docker network (e.g., http://openai-edge-tts:5050/v1)
- Use the host machine's IP address
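For example, a minimal compose sketch (service names are illustrative) where both containers share the default compose network, so the container name resolves from inside Open WebUI:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Works because both services are on the same compose network;
      # "localhost" here would point at the open-webui container itself
      - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
  openai-edge-tts:
    image: travisvn/openai-edge-tts:latest
```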
Speech-to-Text (STT) Issues
Whisper STT Not Working / Compute Type Error
Symptoms:
- Error message: Error transcribing chunk: Requested int8 compute type, but the target device or backend do not support efficient int8 computation
- STT fails to process audio, often showing a persistent loading spinner or a red error toast
Cause: This typically occurs when using the :cuda Docker image with an NVIDIA GPU that doesn't support the required int8 compute operations (common on older Maxwell or Pascal architecture GPUs). In version v0.6.43, a regression caused the compute type to be incorrectly defaulted or hardcoded to int8 in some scenarios.
Solutions:
1. Upgrade to the Latest Version (Recommended)
The most reliable fix is to upgrade to the latest version of Open WebUI. Recent updates ensure that WHISPER_COMPUTE_TYPE is correctly respected and provide optimized defaults for CUDA environments.
2. Manually Set Compute Type
If you are on an affected version or still experiencing issues on GPU, explicitly set the compute type to float16:
```yaml
environment:
  - WHISPER_COMPUTE_TYPE=float16
```
3. Switch to the Standard Image
If your GPU is very old or compatibility issues persist, switch to the standard (CPU-based) image. For smaller models like Whisper, CPU mode often provides comparable performance without compatibility issues:
```yaml
# Instead of:
# ghcr.io/open-webui/open-webui:cuda
# Use:
ghcr.io/open-webui/open-webui:main
```
The CUDA image primarily accelerates RAG embedding/reranking models and Whisper STT, so switching images does not affect chat model performance.
Adjust Whisper Compute Type
If you want to keep GPU acceleration, try changing the compute type:
```yaml
environment:
  - WHISPER_COMPUTE_TYPE=float16  # Recommended for GPU
```
Available compute types (from faster-whisper):
| Compute Type | Best For | Notes |
|---|---|---|
| int8 | CPU (default) | Fastest, but doesn't work on older GPUs |
| float16 | CUDA/GPU (recommended) | Best balance of speed and compatibility for GPUs |
| int8_float16 | GPU with hybrid precision | Uses int8 for weights, float16 for computation |
| float32 | Maximum compatibility | Slowest, but works on all hardware |
- CPU mode: Defaults to int8 for best performance
- CUDA mode: The :cuda image may default to int8, which can cause errors on older GPUs. Set float16 explicitly for GPUs.
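To check whether your GPU is one of the older architectures mentioned above, you can query its compute capability before choosing a compute type (the compute_cap query field requires a reasonably recent NVIDIA driver; on older drivers, look up the GPU model instead):

```bash
# Print GPU name and CUDA compute capability
# Maxwell = 5.x, Pascal = 6.x -- these often lack efficient int8 support
nvidia-smi --query-gpu=name,compute_cap --format=csv
```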
STT Not Recognizing Speech Correctly
Tips for better recognition:
- Set the correct language:

  ```yaml
  environment:
    - WHISPER_LANGUAGE=en  # Use an ISO 639-1 language code
  ```

- Try a larger Whisper model for better accuracy (at the cost of speed):

  ```yaml
  environment:
    - WHISPER_MODEL=medium  # Options: tiny, base, small, medium, large
  ```

- Check microphone permissions in your browser (see above)
- Use the Web API engine as an alternative:
  - Go to user settings (not admin panel)
  - Under STT Settings, try switching Speech-to-Text Engine to "Web API"
  - This uses the browser's built-in speech recognition
ElevenLabs Integration
ElevenLabs is natively supported in Open WebUI. To configure:
- Go to Admin Panel → Settings → Audio
- Select ElevenLabs as the TTS engine
- Enter your ElevenLabs API key
- Select the voice and model
- Save settings
Using environment variables:
```yaml
environment:
  - AUDIO_TTS_ENGINE=elevenlabs
  - AUDIO_TTS_API_KEY=sk_...  # Your ElevenLabs API key
  - AUDIO_TTS_VOICE=EXAVITQu4vr4xnSDxMaL  # Voice ID from ElevenLabs dashboard
  - AUDIO_TTS_MODEL=eleven_multilingual_v2
```
You can find your Voice ID in the ElevenLabs dashboard under the voice settings. Common model options are eleven_multilingual_v2 or eleven_monolingual_v1.
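If you prefer the command line, you can also list the voices available to your account via the ElevenLabs API (endpoint per ElevenLabs' public API reference; replace the key with your own):

```bash
# Returns JSON including each voice's voice_id and name
curl -s https://api.elevenlabs.io/v1/voices \
  -H "xi-api-key: sk_your_key_here"
```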
General Debugging Tips
Check Container Logs
```bash
# View Open WebUI logs
docker logs open-webui -f
# View logs for external TTS service (if applicable)
docker logs openai-edge-tts -f
```
Check Browser Console
- Open browser developer tools (F12 or right-click → Inspect)
- Go to the Console tab
- Look for error messages when attempting to use audio features
Verify Service Health
For external TTS services, test directly:
```bash
# Test OpenAI Edge TTS
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{"input": "Hello, this is a test.", "voice": "alloy"}' \
  --output test.mp3
```
Network Connectivity
Verify the Open WebUI container can reach external services:
```bash
# Enter the container
docker exec -it open-webui bash
# Test connectivity (if curl is available)
curl http://your-tts-service:port/health
```
Quick Reference: Environment Variables
TTS Environment Variables
| Variable | Description |
|---|---|
| AUDIO_TTS_ENGINE | TTS engine: "" (empty, disables backend TTS and uses the browser), openai, elevenlabs, azure, transformers |
| AUDIO_TTS_MODEL | TTS model to use (default: tts-1) |
| AUDIO_TTS_VOICE | Default voice for TTS (default: alloy) |
| AUDIO_TTS_API_KEY | API key for ElevenLabs or Azure TTS |
| AUDIO_TTS_OPENAI_API_BASE_URL | Base URL for OpenAI-compatible TTS |
| AUDIO_TTS_OPENAI_API_KEY | API key for OpenAI-compatible TTS |
STT Environment Variables
| Variable | Description |
|---|---|
| WHISPER_MODEL | Whisper model: tiny, base, small, medium, large (default: base) |
| WHISPER_COMPUTE_TYPE | Compute type: int8, float16, int8_float16, float32 (default: int8) |
| WHISPER_LANGUAGE | ISO 639-1 language code (empty = auto-detect) |
| WHISPER_VAD_FILTER | Enable Voice Activity Detection filter (default: False) |
| AUDIO_STT_ENGINE | STT engine: "" (empty, uses local Whisper), openai, azure, deepgram, mistral |
| AUDIO_STT_OPENAI_API_BASE_URL | Base URL for OpenAI-compatible STT |
| AUDIO_STT_OPENAI_API_KEY | API key for OpenAI-compatible STT |
| DEEPGRAM_API_KEY | Deepgram API key |
For a complete list of audio environment variables, see Environment Variable Configuration.
Still Having Issues?
If you've tried the above solutions and still experience problems:
- Search existing issues on GitHub for similar problems
- Check the discussions for community solutions
- Create a new issue with:
- Open WebUI version
- Docker image being used
- Complete error logs
- Detailed steps to reproduce
- Your environment details (OS, GPU if applicable)

