Kokoro-FastAPI Using Docker

warning

This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.

What is Kokoro-FastAPI?

Kokoro-FastAPI is a Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model. It implements the OpenAI API endpoint specification and is built for fast speech generation.

Key Features

  • OpenAI-compatible Speech endpoint with inline voice combination
  • NVIDIA GPU-accelerated or CPU ONNX inference
  • Streaming support with variable chunking
  • Multiple audio format support (.mp3, .wav, .opus, .flac, .aac, .pcm)
  • Integrated web interface at localhost:8880/web (the repo also provides a separate Gradio container)
  • Phoneme endpoints for conversion and generation
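
As a rough illustration of the speech endpoint and format support listed above, the sketch below builds a JSON request body and checks the requested format against the formats named in the feature list. The function names kokoro_format_ok and kokoro_speech_body are hypothetical helpers, not part of Kokoro-FastAPI. The "+" syntax for combining voices is an assumption based on the upstream project's description of inline voice combination, so check the repository before relying on it.

```shell
# Hypothetical helpers for illustration; not shipped with Kokoro-FastAPI.

# Succeeds only for the audio formats named in the feature list.
kokoro_format_ok() {
  case "$1" in
    mp3|wav|opus|flac|aac|pcm) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a JSON body for POST /v1/audio/speech.
# Arguments: text, voice, response format (defaults to mp3).
# Only backslashes and double quotes are escaped; text containing newlines
# or control characters would need a real JSON encoder.
kokoro_speech_body() {
  local text voice format
  text=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  voice=$2
  format=${3:-mp3}
  if ! kokoro_format_ok "$format"; then
    echo "unsupported format: $format" >&2
    return 1
  fi
  printf '{"model":"kokoro","input":"%s","voice":"%s","response_format":"%s"}\n' \
    "$text" "$voice" "$format"
}

# Example usage (requires a running server on port 8880):
#   curl -fsS http://localhost:8880/v1/audio/speech \
#     -H 'Content-Type: application/json' \
#     -d "$(kokoro_speech_body 'Hello from Kokoro' af_bella mp3)" \
#     -o hello.mp3
# Assumed voice-combination syntax (verify against the repository):
#   kokoro_speech_body 'Hello' 'af_bella+af_sky' wav
```

Keeping the body builder separate from the curl call makes it easy to inspect the exact JSON before sending it.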

Voices

  • af
  • af_bella
  • af_irulan
  • af_nicole
  • af_sarah
  • af_sky
  • am_adam
  • am_michael
  • am_gurney
  • bf_emma
  • bf_isabella
  • bm_george
  • bm_lewis

Languages

  • en_us
  • en_uk

Requirements

  • Docker installed on your system
  • Open WebUI running
  • For GPU support: NVIDIA GPU with CUDA 12.3
  • For CPU-only: No special requirements

⚡️ Quick start

You can choose between the GPU and CPU versions.

GPU Version (Requires NVIDIA GPU with CUDA 12.8)

Using docker run:

docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu

Or use docker compose: create a docker-compose.yml file and run docker compose up. For example:

name: kokoro
services:
  kokoro-fastapi-gpu:
    ports:
      - 8880:8880
    image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.1
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu

info

You may need to install and configure the NVIDIA Container Toolkit.

CPU Version (ONNX optimized inference)

With docker run:

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu

With docker compose:

name: kokoro
services:
  kokoro-fastapi-cpu:
    ports:
      - 8880:8880
    image: ghcr.io/remsky/kokoro-fastapi-cpu
    restart: always
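
After starting either container, the model may need some time to load before requests succeed. The sketch below polls the /health endpoint (the same one used in the troubleshooting section of this page) until it responds. The function name wait_for_kokoro and its default values are hypothetical choices for illustration.

```shell
# Hypothetical helper for illustration; not shipped with Kokoro-FastAPI.
# Poll the health endpoint until it answers or the attempts run out.
# Arguments: base URL (default http://localhost:8880),
#            max attempts (default 30), delay in seconds (default 2).
wait_for_kokoro() {
  local base_url=${1:-http://localhost:8880}
  local attempts=${2:-30}
  local delay=${3:-2}
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s keeps it quiet.
    if curl -fs -o /dev/null "$base_url/health"; then
      echo "Kokoro is ready at $base_url"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Kokoro did not become healthy after $attempts attempts" >&2
  return 1
}

# Example usage:
#   docker compose up -d && wait_for_kokoro http://localhost:8880 30 2
```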

Setting up Open WebUI to use Kokoro-FastAPI

To use Kokoro-FastAPI with Open WebUI, follow these steps:

  • Open the Admin Panel and go to Settings -> Audio
  • Set your TTS Settings to match the following:
    • Text-to-Speech Engine: OpenAI
    • API Base URL: http://localhost:8880/v1 (if Open WebUI runs in Docker, you may need host.docker.internal instead of localhost)
    • API Key: not-needed
    • TTS Voice: af_bella (existing OpenAI voice names are also accepted and mapped for compatibility)
    • TTS Model: kokoro

info

The default API key is the string not-needed. You do not have to change that value if you do not need the added security.
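
Before saving these values in Open WebUI, it can help to confirm that they work from the command line. The sketch below sends one short request using the same four settings and checks that a non-empty audio file comes back. The function name kokoro_settings_check, the sample sentence, and the output file name are hypothetical; the request follows the OpenAI-style /audio/speech path under the API Base URL.

```shell
# Hypothetical helper for illustration; not part of Open WebUI or Kokoro-FastAPI.
# Send one request using the values from the Audio settings.
# Arguments: API base URL, API key, voice, model, output file.
# Values are interpolated into JSON as-is, so they must not contain quotes.
kokoro_settings_check() {
  local base_url=${1:-http://localhost:8880/v1}
  local api_key=${2:-not-needed}
  local voice=${3:-af_bella}
  local model=${4:-kokoro}
  local out=${5:-kokoro-check.mp3}
  curl -fsS "$base_url/audio/speech" \
    -H "Authorization: Bearer $api_key" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"input\":\"Open WebUI settings check\",\"voice\":\"$voice\"}" \
    -o "$out" || return 1
  # A non-empty file means the server returned some response body.
  [ -s "$out" ]
}

# Example usage with the settings shown above:
#   kokoro_settings_check http://localhost:8880/v1 not-needed af_bella kokoro check.mp3
```

Run it from the machine or container where Open WebUI runs, since that is the network path the real requests will take.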

Building the Docker Container

git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
cd docker/cpu # or docker/gpu
docker compose up --build

That's it!

For more information on building the Docker container, including how to change ports, refer to the Kokoro-FastAPI repository.

Troubleshooting

NVIDIA GPU Not Detected

If the GPU version isn't using your GPU:

  1. Install NVIDIA Container Toolkit:

    # Ubuntu/Debian
    # Note: apt-key is deprecated on recent releases. If these commands fail,
    # follow NVIDIA's current Container Toolkit installation guide instead.
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker
  2. Verify GPU access:

    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Connection Issues from Open WebUI

If Open WebUI can't reach Kokoro:

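  • Use http://host.docker.internal:8880/v1 instead of http://localhost:8880/v1 when Open WebUI itself runs in a container (Docker Desktop)
  • If both run in the same Docker Compose project, use the service name as the host, e.g. http://kokoro-fastapi-gpu:8880/v1 (or kokoro-fastapi-cpu for the CPU image)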
  • Verify the service is running: curl http://localhost:8880/health
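
The checks above can be combined into one probe that tries each candidate host in turn and prints the first API Base URL that responds. This is a minimal sketch: the function name find_kokoro_base_url is hypothetical, port 8880 is taken from the examples on this page, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical helper for illustration; not shipped with Kokoro-FastAPI.
# Try each candidate host and print the first API base URL whose /health responds.
# Arguments: one or more host names.
find_kokoro_base_url() {
  local host
  for host in "$@"; do
    # --max-time keeps an unreachable host from stalling the probe.
    if curl -fs -o /dev/null --max-time 3 "http://$host:8880/health"; then
      echo "http://$host:8880/v1"
      return 0
    fi
  done
  echo "no candidate host answered on port 8880" >&2
  return 1
}

# Example usage:
#   find_kokoro_base_url localhost host.docker.internal kokoro-fastapi-gpu
```

The result is only meaningful when the probe runs from the same place as Open WebUI (for example, inside its container), because localhost resolves differently on the host and inside a container.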

CPU Version Performance

The CPU version uses ONNX optimization and performs well for most use cases. If speed is a concern:

  • Consider upgrading to the GPU version
  • Ensure no other heavy processes are running on the CPU
  • The CPU version is recommended for systems without compatible NVIDIA GPUs

For more troubleshooting tips, see the Audio Troubleshooting Guide.