Kokoro-FastAPI Using Docker

warning

This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.

What is Kokoro-FastAPI?

Kokoro-FastAPI is a dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds.

Key Features

  • OpenAI-compatible Speech endpoint with inline voice combination
  • NVIDIA GPU-accelerated or CPU ONNX inference
  • Streaming support with variable chunking
  • Multiple audio format support (.mp3, .wav, .opus, .flac, .aac, .pcm)
  • Integrated web interface on localhost:8880/web (an additional Gradio container is also available in the repo)
  • Phoneme endpoints for conversion and generation
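
Because the Speech endpoint follows the OpenAI API specification, any OpenAI-style client can drive it. The sketch below uses only Python's standard library; the helper names (speech_payload, synthesize) are illustrative, not part of the API, and it assumes the server is reachable on localhost:8880.

```python
import json
from urllib import request

def speech_payload(text, voice="af_bella", fmt="mp3", model="kokoro"):
    """Build the JSON body for the OpenAI-compatible speech endpoint."""
    return {
        "model": model,          # Kokoro-FastAPI accepts "kokoro" here
        "input": text,           # the text to synthesize
        "voice": voice,          # any voice from the list below
        "response_format": fmt,  # mp3, wav, opus, flac, aac, or pcm
    }

def synthesize(text, base="http://localhost:8880/v1", **kw):
    """POST the payload and return raw audio bytes (requires a running server)."""
    req = request.Request(
        f"{base}/audio/speech",
        data=json.dumps(speech_payload(text, **kw)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer not-needed"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

With a container running, `synthesize("Hello world", fmt="wav")` would return the audio bytes ready to write to a file.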

Voices

  • af
  • af_bella
  • af_irulan
  • af_nicole
  • af_sarah
  • af_sky
  • am_adam
  • am_michael
  • am_gurney
  • bf_emma
  • bf_isabella
  • bm_george
  • bm_lewis

Languages

  • en_us
  • en_uk

Requirements

  • Docker installed on your system
  • Open WebUI running
  • For GPU support: NVIDIA GPU with CUDA 12.8
  • For CPU-only: No special requirements

⚡️ Quick start

You can choose between the GPU and CPU versions.

GPU Version (Requires NVIDIA GPU with CUDA 12.8)

Using docker run:

docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu

Or use Docker Compose by creating a docker-compose.yml file and running docker compose up. For example:

name: kokoro
services:
    kokoro-fastapi-gpu:
        ports:
            - 8880:8880
        image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.1
        restart: always
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities:
                              - gpu
info

You may need to install and configure the NVIDIA Container Toolkit

CPU Version (ONNX optimized inference)

With docker run:

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu

With docker compose:

name: kokoro
services:
    kokoro-fastapi-cpu:
        ports:
            - 8880:8880
        image: ghcr.io/remsky/kokoro-fastapi-cpu
        restart: always

Setting up Open WebUI to use Kokoro-FastAPI

To use Kokoro-FastAPI with Open WebUI, follow these steps:

  • Open the Admin Panel and go to Settings -> Audio
  • Set your TTS Settings to match the following:
    • Text-to-Speech Engine: OpenAI
    • API Base URL: http://localhost:8880/v1 # you may need to use host.docker.internal instead of localhost
    • API Key: not-needed
    • TTS Voice: af_bella # also accepts mapping of existing OAI voices for compatibility
    • TTS Model: kokoro
info

The default API key is the string not-needed. Kokoro-FastAPI does not validate the key, so any placeholder value works.

Building the Docker Container

git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
cd docker/cpu # or docker/gpu
docker compose up --build

That's it!

For more information on building the Docker container, including changing ports, please refer to the Kokoro-FastAPI repository.
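
As one example, to publish the API on a different host port (9090 here is an arbitrary choice), change the left-hand side of the ports mapping in the compose file; the container port stays 8880:

```yaml
name: kokoro
services:
    kokoro-fastapi-cpu:
        ports:
            - 9090:8880   # host port 9090 -> container port 8880
        image: ghcr.io/remsky/kokoro-fastapi-cpu
        restart: always
```

Remember to use the new port in Open WebUI's API Base URL as well, e.g. http://localhost:9090/v1.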

Troubleshooting

NVIDIA GPU Not Detected

If the GPU version isn't using your GPU:

  1. Install NVIDIA Container Toolkit:

    # Ubuntu/Debian
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
      sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
  2. Verify GPU access:

    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Connection Issues from Open WebUI

If Open WebUI can't reach Kokoro:

  • Use host.docker.internal:8880 instead of localhost:8880 when Open WebUI itself runs in Docker (e.g. Docker Desktop)
  • If both services run in the same Docker Compose project, use the service name: http://kokoro-fastapi-gpu:8880/v1
  • Verify the service is running: curl http://localhost:8880/health
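
The localhost / host.docker.internal / service-name decision above can be sketched as a tiny helper (kokoro_base_url is illustrative, not part of any API):

```python
def kokoro_base_url(open_webui_in_docker, compose_service=None):
    """Pick the API Base URL for Open WebUI's Audio settings.

    - Open WebUI on the host:             talk to localhost directly.
    - Open WebUI in Docker Desktop:       use host.docker.internal.
    - Both in one Docker Compose project: use the service name.
    """
    if compose_service:                   # e.g. "kokoro-fastapi-gpu"
        return f"http://{compose_service}:8880/v1"
    if open_webui_in_docker:
        return "http://host.docker.internal:8880/v1"
    return "http://localhost:8880/v1"
```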

CPU Version Performance

The CPU version uses ONNX optimization and performs well for most use cases. If speed is a concern:

  • Consider upgrading to the GPU version
  • Ensure no other heavy processes are running on the CPU
  • The CPU version is recommended for systems without compatible NVIDIA GPUs

For more troubleshooting tips, see the Audio Troubleshooting Guide.
