Essentials for Open WebUI
You have installed Open WebUI, connected a provider, and had your first conversation. This page covers the six things that turn a basic chat UI into a setup that works well day-to-day. None of them are required, but most users end up reaching for all of them within the first week.
Work through the sections in order, or jump straight to the one you need.
If you are setting up Open WebUI for multiple users, also read the Scaling Open WebUI guide. It covers infrastructure decisions (PostgreSQL, Redis, external vector databases, shared storage) that are separate from the feature-level essentials on this page. The two guides are additive: work through the essentials here for day-to-day usage, and the scaling guide for multi-user infrastructure.
Tool calling
Without tools, an LLM can only generate text. With tools, it can look things up, run code, and take actions on your behalf. You attach a Tool to a chat (or to a model), and the model decides during its response when to call it and with what arguments. Open WebUI executes the call, feeds the result back, and the model continues with that new information.
Tool calling is what turns a chat model into an agent. Getting this set up unlocks most of the advanced features covered in the rest of this page.
There are two things worth understanding early on.
Understanding tool-calling modes
Open WebUI has two ways of connecting a model to its tools, and a common source of confusion is that the one labeled "Default" is not the one most people should be using. The naming makes more sense once you understand the history and Open WebUI's design philosophy.
Default Mode is the legacy approach and exists for broad compatibility. If your model supports function calling (most modern models do), Native Mode is the better choice. See below for how to switch.
A core goal of Open WebUI is to support the widest possible range of models, from cutting-edge frontier APIs to tiny quantized models running on a Raspberry Pi. When Open WebUI first introduced tool calling, most models did not have built-in function-calling APIs. The only way to give a model tools was to describe them in the system prompt as plain text and then parse the model's response to figure out which tool it was trying to call. This prompt-based approach is what Open WebUI calls Default Mode. It was the original (and at the time, only) implementation, and it is now considered legacy handling. It remains the system default because it is the lowest common denominator: any model that can follow instructions in a system prompt can use it, no special API support required.
Since then, most model providers have added native function-calling support. Open WebUI calls this Native Mode. The model receives tool definitions as a structured part of its API request and returns structured tool calls in its response. It is faster, more reliable, preserves KV cache, and is required for all of Open WebUI's built-in system tools (Memory, Notes, Knowledge, Web Search, Image Gen, Code Interpreter). All new tool-calling features are built for Native Mode. But it only works with models that actually expose a function-calling API, which is why it cannot be the universal default.
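To make the difference concrete, here is roughly what that structured exchange looks like on an OpenAI-compatible API. Field names vary slightly by provider, the `get_weather` tool is a made-up placeholder, and in practice Open WebUI assembles and parses these structures for you in Native Mode:

```python
# Illustrative sketch of a native function-calling exchange (OpenAI-style).
# In Default Mode the same information is pushed into the system prompt as
# plain text and the reply is parsed with heuristics instead.

# The request carries structured tool definitions alongside the messages.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# A model with native function calling returns a structured call, not prose.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
        }
    ],
}
```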
In practice, most users should enable Native Mode. Every major current model supports it (OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and others). You can set it once for your entire instance, or override it per model or per chat:
- Globally: Admin Panel > Settings > Models → click the Settings button at the top right of the models list → set Function Calling = Native → save. Every current and future model inherits the setting.
- Per model: Admin Panel > Settings > Models > [your model] > Model Parameters > Function Calling = Native
- Per chat: In the chat's Chat Controls (right sidebar)
If you are running an older local model or a fine-tune that does not expose a function-calling API, keep that specific model on Default Mode. For everything else, Native Mode is the better choice.
Choosing tools to enable
Many of the tools people look for are already built into Open WebUI and just need to be turned on: web search, code execution, image generation, memory, and knowledge-base retrieval are all available without installing any plugins. Once enabled, these appear automatically as system tools when using Native Mode.
Most of these need a small amount of setup (choosing a provider, adding an API key, or enabling a toggle). Setup guides for the most popular ones:
- Web Search: connect a search provider (Google, Brave, DuckDuckGo, SearXNG, and many more) so the model can look things up
- Image Generation: connect an image provider (OpenAI DALL-E, ComfyUI, Automatic1111, etc.) for in-chat image creation
- Code Execution: run code blocks directly in chat (Pyodide runs in-browser by default, or connect Jupyter for server-side execution)
- Memory: let the model remember facts about you across conversations
For anything not built in, the Open WebUI Community site is worth browsing. A few categories to give a sense of what is available:
- Observability / cost tracking: Langfuse, OpenLit, Portkey. Log every chat turn, token usage, and latency to your own stack.
- Smart-home / automation: Home Assistant tools that let the model control devices, routines, and scenes.
- Research: arXiv, PubMed, Semantic Scholar, Wolfram Alpha. Structured results with real citations.
- Issue tracking / messaging: Jira, Linear, GitHub Issues, Slack, Discord, email.
- Databases / APIs: read-only SQL against your own database, or calls to your internal API.
- Domain-specific: weather, stocks, crypto, shipping tracking, recipes, and many more.
Tools appear in the + menu in the chat input. The model only sees the tools you have enabled for that conversation.
The Open WebUI Community hosts thousands of community-built plugins. Before writing anything yourself, browse what is already there: sort by popularity, filter by category, and skim a few pages. Most of the time, someone has already built what you need (or something close enough to fork).
More detail:
- Tools reference
- Tool-calling modes (Default vs Native)
- Open WebUI Community: Tools, Functions, Models
Plugins
Open WebUI ships with a lot out of the box, but its real power is that it is designed to be extended. Many of the advanced capabilities people show in demos (auto-translation, token/cost tracking, custom post-processing, niche provider integrations) are plugins built on top of the platform. Understanding the plugin landscape is the single biggest unlock for a new user.
There are two plugin families: Tools and Functions.
Tools give the model abilities it can call during a response:
| Source | What it does | Examples |
|---|---|---|
| Built-in | System tools that ship with Open WebUI. Enable in the admin panel, no install needed. | Web Search, Code Interpreter, Image Generation, Memory, Notes, Knowledge retrieval |
| Custom | | |
| ↳ Tool | Code you write yourself or install from the community site. Manage in Workspace > Tools. | Langfuse / OpenLit observability, Home Assistant, arXiv / PubMed lookups, Wolfram Alpha, Jira / Linear, SQL queries |
| ↳ Tool server | External services connected via MCP or OpenAPI. Configure in Admin Panel > Settings > Tools. | Your own microservices, third-party APIs, existing MCP servers |
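To make the custom Tool row concrete, here is a rough sketch of the shape a Tool file takes. The word-counting method and its valve are invented for illustration; the Tools reference linked above has the authoritative template:

```python
"""
title: Word Counter
description: Minimal illustration of a custom Tool.
"""

from pydantic import BaseModel, Field


class Tools:
    class Valves(BaseModel):
        # Valves become editable settings in the admin UI after installation.
        max_words: int = Field(default=10_000, description="Refuse longer inputs.")

    def __init__(self):
        self.valves = self.Valves()

    def count_words(self, text: str) -> str:
        """
        Count the number of words in a piece of text.
        :param text: The text to count words in.
        """
        words = text.split()
        if len(words) > self.valves.max_words:
            return f"Input too long ({len(words)} words)."
        return f"The text contains {len(words)} words."
```

The method's type hints and docstring are what get turned into the tool specification the model sees, which is why they matter more here than in ordinary Python code.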
Functions run at the platform level and modify how Open WebUI itself behaves. There are three types:
| Type | What it does | Examples |
|---|---|---|
| Pipes | Add a new "model" to the model picker, backed by custom code | Model-routing (cheap vs. expensive based on prompt), multi-step agent loops, custom LLM backends |
| Filters | Modify every request and/or response as it passes through, automatically on every chat turn | Context trimming, PII scrubbing, token / cost counting, Langfuse tracing, response reformatting |
| Actions | Add a button under each message that runs custom code when the user clicks it | "Regenerate follow-ups", "Translate reply", "Pin message", "Save to Knowledge" |
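As a sketch of what the Filter type looks like in practice (assuming the common inlet/outlet pattern from community templates; the email-scrubbing policy and its valve are invented for illustration):

```python
"""
title: Email Redaction Filter
description: Minimal Filter sketch that scrubs email addresses before they reach the model.
"""

import re
from typing import Optional

from pydantic import BaseModel, Field


class Filter:
    class Valves(BaseModel):
        replacement: str = Field(default="[redacted email]", description="Placeholder text.")

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # inlet() runs on every request before it is sent to the model.
        for message in body.get("messages", []):
            if message.get("role") == "user" and isinstance(message.get("content"), str):
                message["content"] = re.sub(
                    r"[\w.+-]+@[\w-]+\.[\w.-]+",
                    self.valves.replacement,
                    message["content"],
                )
        return body

    def outlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # outlet() runs on every response on its way back to the user; a pass-through here.
        return body
```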
Both Tools and Functions can be browsed and installed from the Open WebUI Community site, which hosts thousands of community-built plugins. You can also write your own from scratch in the admin panel.
Installing plugins
The Open WebUI Community site hosts the one-click catalog for both Tools and Functions. Pick one, click "Get", paste it into the admin panel, enable it, and configure its valves (the plugin's settings).
Whenever you think "it would be nice if Open WebUI did X," a plugin for it almost certainly already exists. Even if nothing matches exactly, the closest hit is usually only about 20 lines off from what you want, and you can fork it from the admin panel.
Reference reading:
Task models
By default, background tasks (titles, tags, autocomplete) use your main chat model. Setting a dedicated task model is the easiest way to improve speed and reduce unnecessary API costs.
Every time Open WebUI needs a short piece of "thinking" for a UI feature (writing a chat title for the sidebar, generating tags, suggesting follow-up questions, powering the autocomplete in the prompt box) it calls a Task Model. By default that task model is whatever main model you are currently chatting with, which means:
- Your expensive flagship model gets invoked every time you open a new chat just to write "Groceries list."
- On a slow local model, every keystroke feels laggy because autocomplete is waiting on a 30B-parameter model.
- A reasoning model (o1, r1, Claude with extended thinking) spends five seconds thinking before producing a three-word title.
These run in the background, so they are easy to overlook. A dedicated task model is a small change that makes a noticeable difference.
Fix: In Admin Panel > Settings > Interface, set a dedicated Task Model. There are two fields, because the right choice depends on what your main chat model is:
- Task Model (External): Set to a fast, cheap, non-reasoning cloud model like `gpt-5-nano`, `gemini-2.5-flash-lite`, or `llama-3.1-8b-instant`.
- Task Model (Local): Set to a tiny local model like `qwen3:1b`, `gemma3:1b`, or `llama3.2:3b`.
The main chat experience does not change. The background chores just stop dragging.
While you are in the Interface settings, you can also disable these chores entirely if you are on a low-spec machine or simply do not want them. Each one has both an admin toggle in the same page and an environment variable:
| Chore | Admin toggle (Settings > Interface) | Env var |
|---|---|---|
| Autocomplete (fires on every keystroke) | Autocomplete Generation | ENABLE_AUTOCOMPLETE_GENERATION=False |
| Follow-up suggestions | Follow-up Generation | ENABLE_FOLLOW_UP_GENERATION=False |
| Chat title generation | Title Generation | ENABLE_TITLE_GENERATION=False |
| Tag generation | Tags Generation | ENABLE_TAGS_GENERATION=False |
Autocomplete is the single biggest "make it snappy" toggle on weak hardware. It fires on every keystroke, so a slow task model turns the whole prompt box into molasses. Disable it first if the UI feels sluggish.
More detail: Performance & RAM: Dedicated Task Models.
Context management
After enough back-and-forth you will eventually see:
The prompt is too long: 207601, model maximum context length: 202751
This error comes from your model provider, not from Open WebUI. Every time you send a message, the entire conversation (system prompt, all previous turns, attached files, tool call results, and your new message) is sent as the "prompt." When the sum exceeds the model's context window, the provider rejects the request.
Open WebUI intentionally does not ship a built-in trimmer, because:
- Every model uses a different tokenizer (GPT, Claude, Gemini, GLM, Llama all differ).
- Every model has a different context window (8k to 1M+).
- Every deployment wants a different policy (trim by tokens, by turns, by message count, drop attachments first, summarize older messages, etc.).
There is no single correct answer. The supported approach is to install a filter Function that trims the conversation on your terms.
Community filters for most common policies already exist and can be installed with one click. If none fits, the code is short enough to copy and adapt. See the full guide including a minimal "newest N turns" filter: Troubleshooting: Context Window / Prompt Too Long.
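For a sense of how little code such a filter needs, here is a simplified sketch of a "newest N turns" policy. It trims by message count rather than tokens and is an illustration, not the maintained filter from the troubleshooting guide:

```python
"""
title: Keep Newest Turns
description: Sketch of a context-trimming Filter that keeps system messages plus the newest N others.
"""

from typing import Optional

from pydantic import BaseModel, Field


class Filter:
    class Valves(BaseModel):
        keep_last: int = Field(default=20, description="How many of the newest non-system messages to keep.")

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        messages = body.get("messages", [])
        # Keep system messages (instructions, knowledge), then only the newest N of the rest.
        system = [m for m in messages if m.get("role") == "system"]
        rest = [m for m in messages if m.get("role") != "system"]
        body["messages"] = system + rest[-self.valves.keep_last:]
        return body
```

Swap the policy for whatever fits your deployment: count tokens with your provider's tokenizer, drop attachments first, or summarize older turns.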
Basic RAG
RAG (Retrieval-Augmented Generation) is the feature that lets you say "Here's a 400-page PDF, answer my questions about it" without the model having to read the whole thing every turn. Open WebUI splits your documents into chunks, embeds them as vectors, stores them in a vector database, and at chat time retrieves just the relevant pieces to pass to the model.
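The sketch below is not Open WebUI's actual pipeline (it uses a toy bag-of-words embedding instead of a real embedding model and vector database), but it shows the chunk, embed, and retrieve steps that the settings later in this section are tuning:

```python
import math
from collections import Counter

# Toy retrieval pipeline: chunk -> embed -> retrieve. Open WebUI does the same
# with a real tokenizer, embedding model, and vector database.

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words count instead of a dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

document = "imagine the full text of a 400-page PDF here " * 200  # placeholder corpus
chunks = chunk(document)
context = retrieve("What is the refund policy?", chunks, top_k=3)
# Only `context`, not the whole document, gets pasted into the model's prompt.
```

Chunk Size, Chunk Overlap, Top K, and the embedding model in the settings below map directly onto the `size`, `overlap`, `top_k`, and `embed` knobs in this sketch.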
Two ways to use it, in order of simplicity:
- One-off attachments. Drag a file into any chat input and ask questions. The file is chunked and embedded just for that chat.
- Knowledge bases. For documents you want to reuse across many chats (company handbook, codebase, research library, user manual), go to Workspace > Knowledge and create a knowledge base. You can then attach the entire knowledge base to a chat (via the `#` shortcut in the input), or bind it to a model in Workspace > Models so that model always has it available.
The defaults are reasonable for getting started. When you outgrow them, there are three knobs that matter most:
- Embedding engine. The default (SentenceTransformers `all-MiniLM-L6-v2`) runs locally on CPU and consumes roughly 500 MB of RAM per worker. For any multi-user deployment, point at an external embeddings API (OpenAI, or Ollama with `nomic-embed-text`) via `RAG_EMBEDDING_ENGINE`.
- Content extraction engine. The default uses `pypdf`, which leaks memory during heavy ingestion. For anything beyond casual use, switch to Tika or Docling via `CONTENT_EXTRACTION_ENGINE`.
- Vector database. The default ChromaDB (local, SQLite-backed) does not tolerate multi-worker deployments. At scale, switch to PGVector: it is the only vector database officially supported and maintained by the Open WebUI team. Milvus, Qdrant, and MariaDB Vector are also available but are community-maintained: they may break on upgrades, and fixes depend on community contributions. See the env-configuration reference for setup and the community disclaimers on each provider.
None of these matter for "a single user with a handful of PDFs." All of them start mattering the moment you have 100 documents or 10 concurrent users.
Recommended starting config
If you just want RAG to work well out of the box, these settings are a solid general-purpose starting point. They are not fine-tuned for every use case, but they will produce noticeably better results than the defaults for most document types.
Set these in Admin Panel > Settings > Documents:
| Setting | Recommended value | Default | Why |
|---|---|---|---|
| Text Splitter | token | character | Token-based splitting produces more consistent chunk sizes across document types |
| Markdown Header Splitting | On | On | Respects document structure by splitting at headings, keeping sections coherent |
| Chunk Size | 2000 | 1000 | Larger chunks preserve more surrounding context per retrieval hit |
| Chunk Overlap | 200 | 100 | More overlap means less chance of cutting a key sentence in half |
| Top K | 15 | 3 | Retrieves more candidate chunks, giving the model a wider pool of relevant context. If you are working with local models that have constrained context sizes, lower this to 5 to avoid filling the context window with retrieved chunks |
| Embedding Model | External (OpenAI or Ollama) | all-MiniLM-L6-v2 (local CPU) | The default works for a single user but consumes ~500 MB RAM per worker. For any multi-user setup, use an external embedding API instead |
The default SentenceTransformers model runs locally on CPU and is fine for a single user getting started. For anything beyond that, point at an external embeddings API: set `RAG_EMBEDDING_ENGINE=openai` with an OpenAI API key, or `RAG_EMBEDDING_ENGINE=ollama` with any Ollama embedding model (e.g., `nomic-embed-text`). This offloads the work and frees significant RAM.
More detail:
- RAG overview
- Knowledge workspace
- Performance tuning for RAG
- Scaling: external vector database (required for multi-worker and multi-replica deployments)
- Scaling: content extraction & embeddings (fixing memory leaks at scale)
Open Terminal
If "run Python" is too restrictive and you want the model to actually work on your machine (clone repos, install packages, run test suites, spin up a local preview of a website, iterate on a data report against a real CSV), that is what Open Terminal is for. It connects a real shell (sandboxed in a Docker container by default, or bare-metal if you want) as a tool the model can call the same way it calls any other tool. In-chat file browser, live web previews, and skill definitions are included.
This is the biggest "aha" feature once you get past basic chat. It turns Open WebUI from a chat UI into a place where the model actually builds things for you. If Native Mode is on and you have given the model a capable terminal, ask it to build you a small app or run an analysis on a folder of files and watch it go.
More detail:
- Open Terminal: give your AI a real computer
- Use cases: software development, data reports, app builder, research assistant, and more
What to do next
You do not need all of the above on day one. A reasonable order for a new install:
- Day one: Pick a good default model, have a few conversations, get a feel for the UI.
- First thing after that: Set a Task Model and decide which background chores you actually want enabled. This is the single biggest "feels better" change you can make, and it directly addresses hidden per-chat costs.
- Within the first week: Turn on Native Mode globally and install one or two Tools that match your work.
- When you hit it: Install a context filter the first time you see "prompt is too long."
- When you need it: Set up a Knowledge base the first time you want to ask questions across multiple documents.
- When you are ready to go big: Point the model at Open Terminal and let it actually build things for you.
- When you scale up: Revisit the RAG infrastructure section if you go beyond a single user.
Everything else (enterprise SSO, multi-replica HA, Redis scaling, observability) is in Advanced Topics and Troubleshooting when and if you need it.
Troubleshooting
When something goes wrong, start here:
| Having problems with... | Read this |
|---|---|
| Connection refused, 401 errors, CORS failures, WebSocket disconnects | Connection Errors |
| "Prompt is too long" or context window exceeded | Context Window / Prompt Too Long |
| RAG not returning relevant results, uploads failing, knowledge base issues | RAG Troubleshooting |
| Web search not working or returning poor results | Web Search Troubleshooting |
| Image generation errors or provider setup | Image Generation Troubleshooting |
| Speech-to-text, text-to-speech, or audio playback | Audio Troubleshooting |
| SSO, OAuth, or LDAP login issues | SSO & OAuth Troubleshooting |
| High memory usage, slow responses, or worker crashes | Performance & RAM · Scaling Guide |
| Login loops, config drift, or database locks in multi-replica setups | Scaling & HA Troubleshooting · Scaling Guide |
| Locked out of admin account | Reset Admin Password |
| TLS certificate errors with custom/internal CAs | Custom CA Store |
| Alembic migration errors or manual schema fixes | Database Migration |
Questions?
This page is the condensed version. The full docs go much deeper. If you did not find what you needed:
- Search the docs: use the search box at the top of any page. There is a lot more in here than this overview covers.
- Ask on GitHub Discussions: best for open-ended questions, feature discussions, and "how would I do X?" threads. Searchable and visible to future users who hit the same thing.
- Ask on Discord: the most active community. Try the `#questions` channel; there is also an experimental bot there with full docs and issue context that can answer most questions in a few seconds.
- Ask on Reddit: good for broader discussion, deployment stories, and community showcases.
- Report a bug: only after you have confirmed it is a bug (reproducible, latest version, template filled in). "It doesn't work" issues get closed; "here's the exact repro, here are the logs" issues get fixed.