# Using OpenAI for Speech-to-Text
This guide covers how to use OpenAI's Whisper API for Speech-to-Text (STT) with Open WebUI, giving you cloud-based transcription without needing local GPU resources.
See the companion guide: Using OpenAI for Text-to-Speech
## Requirements

- An OpenAI API key with access to the Audio API (a quick way to verify the key is shown after this list)
- Open WebUI installed and running
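
Before wiring the key into Open WebUI, you can confirm it actually reaches the API. The snippet below is a minimal sketch, assuming the requests package is installed and the key is exported as OPENAI_API_KEY:

```python
import os

import requests

# Sanity-check the key: list available models and look for whisper-1.
# Assumes OPENAI_API_KEY is set in your environment (sketch only).
api_key = os.environ["OPENAI_API_KEY"]
resp = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
resp.raise_for_status()
model_ids = {model["id"] for model in resp.json()["data"]}
print("whisper-1 available:", "whisper-1" in model_ids)
```
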
## Quick Setup (UI)
- Click your profile icon (bottom-left corner)
- Select Admin Panel
- Click Settings → Audio tab
- Configure the following:
| Setting | Value |
|---|---|
| Speech-to-Text Engine | OpenAI |
| API Base URL | https://api.openai.com/v1 |
| API Key | Your OpenAI API key |
| STT Model | whisper-1 |
| Supported Content Types | Leave empty for defaults, or set audio/wav,audio/mpeg,audio/webm |
- Click Save
## Available Models
| Model | Description |
|---|---|
| whisper-1 | OpenAI's Whisper large-v2 model, hosted in the cloud |
OpenAI currently offers only whisper-1. For more model options, use Local Whisper (built into Open WebUI) or another provider such as Deepgram.
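
For reference, a direct whisper-1 call with the official openai Python package looks roughly like this; Open WebUI issues an equivalent request on your behalf. The file name is a placeholder, and the openai package (v1 or later) is assumed to be installed:

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

# "meeting.mp3" is a placeholder; any format OpenAI accepts works.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```
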
## Environment Variables Setup
If you prefer to configure via environment variables:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_STT_ENGINE=openai
      - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
      - AUDIO_STT_OPENAI_API_KEY=sk-...
      - AUDIO_STT_MODEL=whisper-1
      # ... other configuration
```
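
Environment variables are read once at container startup, so recreate the container (for example, with docker compose up -d) after changing them.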
### All STT Environment Variables (OpenAI)
| Variable | Description | Default |
|---|---|---|
| AUDIO_STT_ENGINE | Set to openai | empty (uses local Whisper) |
| AUDIO_STT_OPENAI_API_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 |
| AUDIO_STT_OPENAI_API_KEY | Your OpenAI API key | empty |
| AUDIO_STT_MODEL | STT model | whisper-1 |
| AUDIO_STT_SUPPORTED_CONTENT_TYPES | Allowed audio MIME types | audio/*,video/webm |
## Supported Audio Formats
By default, Open WebUI accepts audio/* and video/webm for transcription. If you need to restrict or expand supported formats, set AUDIO_STT_SUPPORTED_CONTENT_TYPES:

```yaml
environment:
  - AUDIO_STT_SUPPORTED_CONTENT_TYPES=audio/wav,audio/mpeg,audio/webm
```
OpenAI's Whisper API accepts these input formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
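
The audio/* entry in the default value is a wildcard. As a rough illustration of how such an allow-list filters uploads (a standalone sketch, not Open WebUI's actual implementation), glob-style matching behaves like this:

```python
from fnmatch import fnmatch

# Hypothetical allow-list in the same comma-separated format as
# AUDIO_STT_SUPPORTED_CONTENT_TYPES (sketch only, not Open WebUI's code).
allowed = "audio/*,video/webm".split(",")

def is_supported(content_type: str) -> bool:
    """Return True if the MIME type matches any allowed pattern."""
    return any(fnmatch(content_type, pattern) for pattern in allowed)

print(is_supported("audio/wav"))   # True  (matches audio/*)
print(is_supported("video/webm"))  # True  (exact match)
print(is_supported("video/mp4"))   # False (not in the list)
```
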
## Using STT
- Click the microphone icon in the chat input
- Speak your message
- Click the microphone again or wait for silence detection
- Your speech will be transcribed and appear in the input box
## OpenAI vs Local Whisper
| Feature | OpenAI Whisper API | Local Whisper |
|---|---|---|
| Latency | Network dependent | Faster for short clips |
| Cost | Per-minute pricing | Free (uses your hardware) |
| Privacy | Audio sent to OpenAI | Audio stays local |
| GPU Required | No | Recommended for speed |
| Model Options | whisper-1 only | tiny, base, small, medium, large |
Choose OpenAI if:
- You don't have a GPU
- You want consistent performance
- Privacy isn't a concern
Choose Local Whisper if:
- You want free transcription
- You need audio to stay private
- You have a GPU for acceleration
## Troubleshooting
### Microphone Not Working
- Ensure you're using HTTPS or localhost
- Check browser microphone permissions
- See Microphone Access Issues
### Transcription Errors
- Check your OpenAI API key is valid
- Verify the API Base URL is correct
- Check container logs for error messages
### Language Issues
OpenAI's Whisper API automatically detects language. If you need to force a specific language, consider using Local Whisper with the WHISPER_LANGUAGE environment variable.
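
For context, the underlying OpenAI endpoint does accept an optional language parameter (an ISO-639-1 code) when called directly, though, as noted above, Open WebUI does not expose a language setting for the OpenAI engine. Extending the earlier sketch, with a placeholder file name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "interview.mp3" is a placeholder file name for this sketch.
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="de",  # ISO-639-1 code: force German rather than auto-detect
    )

print(transcript.text)
```
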
For more troubleshooting, see the Audio Troubleshooting Guide.
## Cost Considerations
OpenAI charges per minute of audio for STT. See OpenAI Pricing for current rates.
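
As a rough worked example: at whisper-1's published rate of $0.006 per minute (at the time of writing), a one-hour recording costs about 60 × $0.006 = $0.36, and 100 hours about $36.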
For free STT, use Local Whisper (the default) or the browser's Web API for basic transcription.