Using Mistral Voxtral for Speech-to-Text

This guide explains how to configure Mistral's Voxtral model as the Speech-to-Text (STT) engine in Open WebUI. Voxtral is Mistral's hosted transcription model: recorded audio is sent to the Mistral API and the transcribed text is returned to the chat input.

Requirements

A Mistral API key
Open WebUI installed and running

Quick Setup (UI)

Click your profile icon (bottom-left corner)
Select Admin Panel
Click Settings → Audio tab
Configure the following:

Setting	Value
Speech-to-Text Engine	`MistralAI`
API Key	Your Mistral API key
STT Model	`voxtral-mini-latest` (or leave empty for default)

Click Save

Available Models

Model	Description
`voxtral-mini-latest`	Default transcription model (recommended)

Environment Variables Setup

If you prefer to configure via environment variables:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_STT_ENGINE=mistral
      - AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
      - AUDIO_STT_MODEL=voxtral-mini-latest
    # ... other configuration
```

All Mistral STT Environment Variables

Variable	Description	Default
`AUDIO_STT_ENGINE`	Set to `mistral`	empty (uses local Whisper)
`AUDIO_STT_MISTRAL_API_KEY`	Your Mistral API key	empty
`AUDIO_STT_MISTRAL_API_BASE_URL`	Mistral API base URL	`https://api.mistral.ai/v1`
`AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS`	Use chat completions endpoint	`false`
`AUDIO_STT_MODEL`	STT model	`voxtral-mini-latest`
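
To illustrate how the defaults in the table resolve, here is a minimal Python sketch that reads the same variables and reports what is missing. It is not Open WebUI's own configuration code; the helper names `load_mistral_stt_config` and `missing_settings` are hypothetical, and the rule that only the literal `true` enables the flag is an assumption.

```python
import os

# Defaults copied from the table above. Illustrative only: this is not
# Open WebUI's actual configuration loader.
MISTRAL_STT_DEFAULTS = {
    "AUDIO_STT_ENGINE": "",
    "AUDIO_STT_MISTRAL_API_KEY": "",
    "AUDIO_STT_MISTRAL_API_BASE_URL": "https://api.mistral.ai/v1",
    "AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS": "false",
    "AUDIO_STT_MODEL": "voxtral-mini-latest",
}


def load_mistral_stt_config(env=None):
    """Resolve each variable from env, falling back to the documented default."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default)
              for name, default in MISTRAL_STT_DEFAULTS.items()}
    # Assumption: only the literal "true" (any letter case) enables the flag.
    flag = config["AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS"]
    config["AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS"] = flag.strip().lower() == "true"
    return config


def missing_settings(config):
    """List the settings that would keep Mistral STT from being used."""
    problems = []
    if config["AUDIO_STT_ENGINE"] != "mistral":
        problems.append("AUDIO_STT_ENGINE is not set to 'mistral'")
    if not config["AUDIO_STT_MISTRAL_API_KEY"]:
        problems.append("AUDIO_STT_MISTRAL_API_KEY is empty")
    return problems
```

Calling `missing_settings(load_mistral_stt_config())` inside the container's environment gives a quick check that the engine and API key are both set before testing the microphone.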

Transcription Methods

Mistral supports two transcription methods:

Standard Transcription (Default)

Uses the dedicated transcription endpoint. This is the recommended method.
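
To test an API key against the dedicated endpoint outside Open WebUI, a request can be sketched in Python as below. This is a rough sketch under stated assumptions, not the code Open WebUI runs: the `/audio/transcriptions` path, the `Authorization: Bearer` header, the `model` and `file` form fields, and the `text` field in the response reflect Mistral's public API as I understand it and should be checked against Mistral's API reference. The function names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename, api_key,
                                model="voxtral-mini-latest",
                                base_url="https://api.mistral.ai/v1"):
    """Build (but do not send) a multipart POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field carrying the model name.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="model"'
         f'\r\n\r\n{model}\r\n').encode(),
        # Form field carrying the audio file itself.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",  # assumed path
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(request):
    """Send the request and return the 'text' field (assumed response shape)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("text", "")
```

Building the request separately from sending it makes it possible to inspect the URL and headers without spending API credit; `transcribe(build_transcription_request(...))` performs the actual call.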

Chat Completions Method

Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:

Requires audio in mp3 or wav format (automatic conversion is attempted)
May provide different results than the standard endpoint
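
The format requirement in the list above amounts to a simple check on the recording before it is sent. The sketch below shows one way to express it in Python; it decides by file extension only, which is an assumption made for illustration, and the helper names are hypothetical rather than part of Open WebUI.

```python
from pathlib import Path

# Formats the chat completions method accepts, per the list above.
CHAT_COMPLETIONS_FORMATS = {".mp3", ".wav"}


def needs_conversion(filename):
    """Return True when the file is neither mp3 nor wav (judged by extension)."""
    return Path(filename).suffix.lower() not in CHAT_COMPLETIONS_FORMATS


def converted_name(filename):
    """Name for the mp3 copy, keeping the original stem and directory."""
    return str(Path(filename).with_suffix(".mp3"))
```

A browser recording such as `recording.webm` would be flagged for conversion, while `clip.wav` would be passed through unchanged.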

Using STT

Click the microphone icon in the chat input
Speak your message
Click the microphone again or wait for silence detection
Your speech will be transcribed and appear in the input box

Supported Audio Formats

Voxtral accepts common audio formats. By default, Open WebUI accepts uploads with the content types `audio/*` and `video/webm`.
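
As an illustration of how the two default patterns behave, the following Python sketch matches a content type against them with shell-style wildcards. The matching logic is an assumption made for this example and may differ from what Open WebUI does internally; `is_accepted` is a hypothetical helper.

```python
from fnmatch import fnmatch

# The two default patterns named above.
DEFAULT_ACCEPTED_TYPES = ["audio/*", "video/webm"]


def is_accepted(content_type, patterns=DEFAULT_ACCEPTED_TYPES):
    """Check a content type against wildcard patterns such as 'audio/*'."""
    # Drop parameters such as ";codecs=opus" before matching.
    base_type = content_type.split(";", 1)[0].strip().lower()
    return any(fnmatch(base_type, pattern) for pattern in patterns)
```

Under these patterns `audio/wav` and `video/webm;codecs=vp8` are accepted, while `video/mp4` is not.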

If using the chat completions method, audio is automatically converted to mp3.

Troubleshooting

API Key Errors

If you see "Mistral API key is required":

Verify your API key is entered correctly
Check the API key hasn't expired
Ensure your Mistral account has API access

Transcription Not Working

Check container logs: `docker logs open-webui -f`
Verify the STT Engine is set to MistralAI
Try the standard transcription method (disable chat completions)

Audio Format Issues

If you are using the chat completions method and audio conversion fails:

Ensure FFmpeg is available in the container
Try recording in a different format (wav or mp3)
Switch to the standard transcription method
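
The FFmpeg check above can be reproduced by hand. The Python sketch below looks for `ffmpeg` on the PATH and builds a basic conversion command; it is a diagnostic aid under the assumption that a plain `ffmpeg -i input output.mp3` conversion is representative, and it does not reproduce the exact options Open WebUI passes. The function names are hypothetical.

```python
import shutil
import subprocess


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def build_convert_command(source, target):
    """Build a basic ffmpeg command line for converting source to target."""
    # -y overwrites an existing target; ffmpeg picks the output format
    # from the target's file extension.
    return ["ffmpeg", "-y", "-i", source, target]


def convert_to_mp3(source, target):
    """Convert source to target, raising if ffmpeg is missing or fails."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_convert_command(source, target),
                   check=True, capture_output=True)
```

Running `ffmpeg_available()` inside the container (for example through `docker exec`) separates a missing FFmpeg binary from a problem with the recording itself.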

For more troubleshooting, see the Audio Troubleshooting Guide.

Comparison with Other STT Options

Feature	Mistral Voxtral	OpenAI Whisper	Local Whisper
Cost	Per-minute pricing	Per-minute pricing	Free
Privacy	Audio sent to Mistral	Audio sent to OpenAI	Audio stays local
Model Options	voxtral-mini-latest	whisper-1	tiny → large
GPU Required	No	No	Recommended

Cost Considerations

Mistral charges per minute of audio for STT. Check Mistral's pricing page for current rates.

Tip: For free STT, use Local Whisper (the default) or the browser's Web API for basic transcription.
