Starting with OpenAI-Compatible Servers

Overview

Open WebUI isn't just for OpenAI/Ollama/Llama.cpp: you can connect any server that implements the OpenAI-compatible API, running locally or remotely. This is perfect if you want to run different language models, or if you already have a favorite backend or ecosystem. This guide will show you how to:

  • Set up an OpenAI-compatible server (with a few popular options)
  • Connect it to Open WebUI
  • Start chatting right away

Step 1: Choose an OpenAI-Compatible Server

There are many servers and tools that expose an OpenAI-compatible API. Here are some of the most popular:

  • Llama.cpp: Extremely efficient, runs on CPU and GPU
  • Ollama: Super user-friendly and cross-platform
  • LM Studio: Rich desktop app for Windows/Mac/Linux
  • Lemonade: Fast ONNX-based backend with NPU/iGPU acceleration

Pick whichever suits your workflow!


πŸ‹ Get Started with Lemonade​

Lemonade is a plug-and-play ONNX-based OpenAI-compatible server. Here's how to try it on Windows:

  1. Download the latest .exe

  2. Run Lemonade_Server_Installer.exe

  3. Install and download a model using Lemonade's installer

  4. Once running, your API endpoint will be:

    http://localhost:8000/api/v0

(Screenshot: Lemonade Server)

See their docs for details.
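
To confirm the server is reachable before connecting it, you can list its installed models. Below is a minimal Python sketch using only the standard library; the port 8000 comes from the default endpoint above, so adjust it if you changed the install settings:

```python
import json
from urllib.request import urlopen

# Lemonade's default base URL from the steps above (adjust if customized).
BASE_URL = "http://localhost:8000/api/v0"

# OpenAI-compatible servers list their available models at GET <base>/models.
with urlopen(f"{BASE_URL}/models") as resp:
    payload = json.load(resp)

# Each entry's "id" is the model name you will later pick in Open WebUI.
for model in payload.get("data", []):
    print(model["id"])
```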


Step 2: Connect Your Server to Open WebUI

  1. Open Open WebUI in your browser.
  2. Go to ⚙️ Admin Settings → Connections → OpenAI.
  3. Click ➕ Add Connection.
  4. Select the Standard / Compatible tab (if available).
  5. Fill in the following:
    • API URL: Use your server's API endpoint.
      • Examples: http://localhost:11434/v1 (Ollama), http://localhost:10000/v1 (Llama.cpp).
    • API Key: Leave blank unless the server requires one.
  6. Click Save.
Tip: If you run Open WebUI in Docker and your model server on your host machine, use http://host.docker.internal:<your-port>/v1.

For Lemonade, use http://localhost:8000/api/v0 as the API URL.

(Screenshot: Lemonade Connection)


Required API Endpoints

To ensure full compatibility with Open WebUI, your server should implement the following OpenAI-standard endpoints:

| Endpoint | Method | Required? | Purpose |
| --- | --- | --- | --- |
| /v1/models | GET | Yes | Used for model discovery and selecting models in the UI. |
| /v1/chat/completions | POST | Yes | The core endpoint for chat, supporting streaming and parameters like temperature. |
| /v1/embeddings | POST | No | Required if you want to use this provider for RAG (Retrieval-Augmented Generation). |
| /v1/audio/speech | POST | No | Required for Text-to-Speech (TTS) functionality. |
| /v1/audio/transcriptions | POST | No | Required for Speech-to-Text (STT/Whisper) functionality. |
| /v1/images/generations | POST | No | Required for Image Generation (DALL-E) functionality. |
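
To illustrate how the two required endpoints are exercised, here is a rough sketch using the openai Python client. The base URL, placeholder API key, and model name are assumptions for a local Ollama-style server, not values Open WebUI itself requires:

```python
from openai import OpenAI  # pip install openai

# Assumed local server; swap in your own base URL and (if needed) API key.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

# GET /v1/models: this is how Open WebUI discovers models for its UI.
for model in client.models.list():
    print(model.id)

# POST /v1/chat/completions: the core chat endpoint, streamed here.
stream = client.chat.completions.create(
    model="llama3",  # illustrative; use an ID printed above
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```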

Supported Parameters

Open WebUI passes standard OpenAI parameters such as temperature, top_p, max_tokens (or max_completion_tokens), stop, seed, and logit_bias. It also supports Tool Use (Function Calling) if your model and server support the tools and tool_choice parameters.
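
As a concrete sketch (the model ID and the get_weather tool below are hypothetical, purely for illustration), a request carrying several of these parameters might look like this:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3",  # illustrative model ID
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    # Standard sampling parameters that Open WebUI forwards to the server.
    temperature=0.7,
    top_p=0.9,
    max_tokens=256,
    seed=42,
    # Tool Use (Function Calling), if the backend supports it.
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)

choice = response.choices[0]
# The model either answers directly or asks to call the tool.
print(choice.message.tool_calls or choice.message.content)
```

How unsupported parameters are handled varies by backend, so check your server's documentation if a setting seems to have no effect.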


Step 3: Start Chatting!

Select your connected server's model in the chat menu and get started!

That's it! Whether you choose Llama.cpp, Ollama, LM Studio, or Lemonade, you can easily experiment with and manage multiple model servers, all in Open WebUI.


🚀 Enjoy building your perfect local AI setup!