Essentials for Open WebUI
You have installed Open WebUI, connected a provider, and had your first conversation. This page covers the six things that turn a basic chat UI into a setup that works well day-to-day. None of them are required, but most users end up reaching for all of them within the first week.
Work through the sections in order, or jump straight to the one you need.
If you are setting up Open WebUI for multiple users, also read the Scaling Open WebUI guide. It covers infrastructure decisions (PostgreSQL, Redis, external vector databases, shared storage) that are separate from the feature-level essentials on this page. The two guides are additive: work through the essentials here for day-to-day usage, and the scaling guide for multi-user infrastructure.
Tool calling
Without tools, an LLM can only generate text. With tools, it can look things up, run code, and take actions on your behalf. You attach a Tool to a chat (or to a model), and the model decides during its response when to call it and with what arguments. Open WebUI executes the call, feeds the result back, and the model continues with that new information.
Tool calling is what turns a chat model into an agent. Getting this set up unlocks most of the advanced features covered in the rest of this page.
There are two things worth understanding early on.
Understanding tool-calling modes
Open WebUI has two ways of connecting a model to its tools, and a common source of confusion is that the one labeled "Default" is not the one most people should be using. The naming makes more sense once you understand the history and Open WebUI's design philosophy.
Default Mode is the legacy approach and exists for broad compatibility. If your model supports function calling (most modern models do), Native Mode is the better choice. See below for how to switch.
A core goal of Open WebUI is to support the widest possible range of models, from cutting-edge frontier APIs to tiny quantized models running on a Raspberry Pi. When Open WebUI first introduced tool calling, most models did not have built-in function-calling APIs. The only way to give a model tools was to describe them in the system prompt as plain text and then parse the model's response to figure out which tool it was trying to call. This prompt-based approach is what Open WebUI calls Default Mode. It was the original (and at the time, only) implementation, and it is now considered legacy handling. It remains the system default because it is the lowest common denominator: any model that can follow instructions in a system prompt can use it, no special API support required.
Since then, most model providers have added native function-calling support. Open WebUI calls this Native Mode. The model receives tool definitions as a structured part of its API request and returns structured tool calls in its response. It is faster, more reliable, preserves KV cache, and is required for all of Open WebUI's built-in system tools (Memory, Notes, Knowledge, Web Search, Image Gen, Code Interpreter). All new tool-calling features are built for Native Mode. But it only works with models that actually expose a function-calling API, which is why it cannot be the universal default.
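To make the difference concrete, here is roughly what that structured exchange looks like on an OpenAI-compatible API. Field names vary slightly by provider, the `get_weather` tool is a made-up placeholder, and in practice Open WebUI assembles and parses these structures for you in Native Mode:

```python
# Illustrative sketch of a native function-calling exchange (OpenAI-style).
# In Default Mode the same information is pushed into the system prompt as
# plain text and the reply is parsed with heuristics instead.

# The request carries structured tool definitions alongside the messages.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# A model with native function calling returns a structured call, not prose.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
        }
    ],
}
```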
In practice, most users should enable Native Mode. Every major current model supports it (OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and others). You can set it once for your entire instance, or override it per model or per chat:
- Globally: Admin Panel > Settings > Models → click the Settings button at the top right of the models list → set Function Calling = Native → save. Every current and future model inherits the setting.
- Per model: Admin Panel > Settings > Models > [your model] > Model Parameters > Function Calling = Native
- Per chat: In the chat's Chat Controls (right sidebar)
If you are running an older local model or a fine-tune that does not expose a function-calling API, keep that specific model on Default Mode. For everything else, Native Mode is the better choice.
Choosing tools to enable
Many of the tools people look for are already built into Open WebUI and just need to be turned on: web search, code execution, image generation, memory, and knowledge-base retrieval are all available without installing any plugins. Once enabled, these appear automatically as system tools when using Native Mode.
Most of these need a small amount of setup (choosing a provider, adding an API key, or enabling a toggle). Setup guides for the most popular ones:
- Web Search: connect a search provider (Google, Brave, DuckDuckGo, SearXNG, and many more) so the model can look things up
- Image Generation: connect an image provider (OpenAI DALL-E, ComfyUI, Automatic1111, etc.) for in-chat image creation
- Code Execution: run code blocks directly in chat (Pyodide runs in-browser by default, or connect Jupyter for server-side execution)
- Memory: let the model remember facts about you across conversations
For anything not built in, the Open WebUI Community site is worth browsing. A few categories to give a sense of what is available:
- Observability / cost tracking: Langfuse, OpenLit, Portkey. Log every chat turn, token usage, and latency to your own stack.
- Smart-home / automation: Home Assistant tools that let the model control devices, routines, and scenes.
- Research: arXiv, PubMed, Semantic Scholar, Wolfram Alpha. Structured results with real citations.
- Issue tracking / messaging: Jira, Linear, GitHub Issues, Slack, Discord, email.
- Databases / APIs: read-only SQL against your own database, or calls to your internal API.
- Domain-specific: weather, stocks, crypto, shipping tracking, recipes, and many more.
Tools appear in the + menu in the chat input. The model only sees the tools you have enabled for that conversation.
The Open WebUI Community hosts thousands of community-built plugins. Before writing anything yourself, browse what is already there: sort by popularity, filter by category, and skim a few pages. Most of the time, someone has already built what you need (or something close enough to fork).
More detail:
- Tools reference
- Tool-calling modes (Default vs Native)
- Open WebUI Community: Tools, Functions, Models
Plugins
Open WebUI ships with a lot out of the box, but its real power is that it is designed to be extended. Many of the advanced capabilities people show in demos (auto-translation, token/cost tracking, custom post-processing, niche provider integrations) are plugins built on top of the platform. Understanding the plugin landscape is the single biggest unlock for a new user.
There are two plugin families: Tools and Functions.
Tools give the model abilities it can call during a response:
| Source | What it does | Examples |
|---|---|---|
| Built-in | System tools that ship with Open WebUI. Enable in the admin panel, no install needed. | Web Search, Code Interpreter, Image Generation, Memory, Notes, Knowledge retrieval |
| Custom | | |
| ↳ Tool | Code you write yourself or install from the community site. Manage in Workspace > Tools. | Langfuse / OpenLit observability, Home Assistant, arXiv / PubMed lookups, Wolfram Alpha, Jira / Linear, SQL queries |
| ↳ Tool server | External services connected via MCP or OpenAPI. Configure in Admin Panel > Settings > Tools. | Your own microservices, third-party APIs, existing MCP servers |
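To make the custom Tool row concrete, here is a rough sketch of the shape a Tool file takes. The word-counting method and its valve are invented for illustration; the Tools reference linked above has the authoritative template:

```python
"""
title: Word Counter
description: Minimal illustration of a custom Tool.
"""

from pydantic import BaseModel, Field


class Tools:
    class Valves(BaseModel):
        # Valves become editable settings in the admin UI after installation.
        max_words: int = Field(default=10_000, description="Refuse longer inputs.")

    def __init__(self):
        self.valves = self.Valves()

    def count_words(self, text: str) -> str:
        """
        Count the number of words in a piece of text.
        :param text: The text to count words in.
        """
        words = text.split()
        if len(words) > self.valves.max_words:
            return f"Input too long ({len(words)} words)."
        return f"The text contains {len(words)} words."
```

The method's type hints and docstring are what get turned into the tool specification the model sees, which is why they matter more here than in ordinary Python code.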
Functions run at the platform level and modify how Open WebUI itself behaves. There are three types:
| Type | What it does | Examples |
|---|---|---|
| Pipes | Add a new "model" to the model picker, backed by custom code | Model-routing (cheap vs. expensive based on prompt), multi-step agent loops, custom LLM backends |
| Filters | Modify every request and/or response as it passes through, automatically on every chat turn | Context trimming, PII scrubbing, token / cost counting, Langfuse tracing, response reformatting |
| Actions | Add a button under each message that runs custom code when the user clicks it | "Regenerate follow-ups", "Translate reply", "Pin message", "Save to Knowledge" |
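As a sketch of what the Filter type looks like in practice (assuming the common inlet/outlet pattern from community templates; the email-scrubbing policy and its valve are invented for illustration):

```python
"""
title: Email Redaction Filter
description: Minimal Filter sketch that scrubs email addresses before they reach the model.
"""

import re
from typing import Optional

from pydantic import BaseModel, Field


class Filter:
    class Valves(BaseModel):
        replacement: str = Field(default="[redacted email]", description="Placeholder text.")

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # inlet() runs on every request before it is sent to the model.
        for message in body.get("messages", []):
            if message.get("role") == "user" and isinstance(message.get("content"), str):
                message["content"] = re.sub(
                    r"[\w.+-]+@[\w-]+\.[\w.-]+",
                    self.valves.replacement,
                    message["content"],
                )
        return body

    def outlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # outlet() runs on every response on its way back to the user; a pass-through here.
        return body
```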
Both Tools and Functions can be browsed and installed from the Open WebUI Community site, which hosts thousands of community-built plugins. You can also write your own from scratch in the admin panel.
Installing plugins
The Open WebUI Community site hosts the one-click catalog for both Tools and Functions. Pick one, click "Get", paste it into the admin panel, enable it, and configure its valves (the plugin's settings).
Whenever you think "it would be nice if Open WebUI did X," a plugin for it almost certainly already exists. Even if nothing matches exactly, the closest hit is usually only about 20 lines off from what you want, and you can fork it from the admin panel.
Reference reading:
Task models
By default, background tasks (titles, tags, autocomplete) use your main chat model. Setting a dedicated task model is the easiest way to improve speed and reduce unnecessary API costs.
Every time Open WebUI needs a short piece of "thinking" for a UI feature (writing a chat title for the sidebar, generating tags, suggesting follow-up questions, powering the autocomplete in the prompt box) it calls a Task Model. By default that task model is whatever main model you are currently chatting with, which means:
- Your expensive flagship model gets invoked every time you open a new chat just to write "Groceries list."
- On a slow local model, every keystroke feels laggy because autocomplete is waiting on a 30B-parameter model.
- A reasoning model (o1, r1, Claude with extended thinking) spends five seconds thinking before producing a three-word title.
These run in the background, so they are easy to overlook. A dedicated task model is a small change that makes a noticeable difference.
Fix: In Admin Panel > Settings > Interface, set a dedicated Task Model. There are two fields, because the right choice depends on what your main chat model is:
- Task Model (External): Set to a fast, cheap, non-reasoning cloud model like `gpt-5-nano`, `gemini-2.5-flash-lite`, or `llama-3.1-8b-instant`.
- Task Model (Local): Set to a tiny local model like `qwen3:1b`, `gemma3:1b`, or `llama3.2:3b`.
The main chat experience does not change. The background chores just stop dragging.
While you are in the Interface settings, you can also disable these chores entirely if you are on a low-spec machine or simply do not want them. Each one has both an admin toggle in the same page and an environment variable:
| Chore | Admin toggle (Settings > Interface) | Env var |
|---|---|---|
| Autocomplete (fires on every keystroke) | Autocomplete Generation | ENABLE_AUTOCOMPLETE_GENERATION=False |
| Follow-up suggestions | Follow-up Generation | ENABLE_FOLLOW_UP_GENERATION=False |
| Chat title generation | Title Generation | ENABLE_TITLE_GENERATION=False |
| Tag generation | Tags Generation | ENABLE_TAGS_GENERATION=False |
Autocomplete is the single biggest "make it snappy" toggle on weak hardware. It fires on every keystroke, so a slow task model turns the whole prompt box into molasses. Disable it first if the UI feels sluggish.
More detail: Performance & RAM: Dedicated Task Models.
Context management
After enough back-and-forth you will eventually see:
The prompt is too long: 207601, model maximum context length: 202751
This error comes from your model provider, not from Open WebUI. Every time you send a message, the entire conversation (system prompt, all previous turns, attached files, tool call results, and your new message) is sent as the "prompt." When the sum exceeds the model's context window, the provider rejects the request.
Open WebUI intentionally does not ship a built-in trimmer, because:
- Every model uses a different tokenizer (GPT, Claude, Gemini, GLM, Llama all differ).
- Every model has a different context window (8k to 1M+).
- Every deployment wants a different policy (trim by tokens, by turns, by message count, drop attachments first, summarize older messages, etc.).
There is no single correct answer. The supported approach is to install a filter Function that trims the conversation on your terms.
Community filters for most common policies already exist and can be installed with one click. If none fits, the code is short enough to copy and adapt. See the full guide including a minimal "newest N turns" filter: Troubleshooting: Context Window / Prompt Too Long.
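For a sense of how little code such a filter needs, here is a simplified sketch of a "newest N turns" policy. It trims by message count rather than tokens and is an illustration, not the maintained filter from the troubleshooting guide:

```python
"""
title: Keep Newest Turns
description: Sketch of a context-trimming Filter that keeps system messages plus the newest N others.
"""

from typing import Optional

from pydantic import BaseModel, Field


class Filter:
    class Valves(BaseModel):
        keep_last: int = Field(default=20, description="How many of the newest non-system messages to keep.")

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        messages = body.get("messages", [])
        # Keep system messages (instructions, knowledge), then only the newest N of the rest.
        system = [m for m in messages if m.get("role") == "system"]
        rest = [m for m in messages if m.get("role") != "system"]
        body["messages"] = system + rest[-self.valves.keep_last:]
        return body
```

Swap the policy for whatever fits your deployment: count tokens with your provider's tokenizer, drop attachments first, or summarize older turns.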
Basic RAG
RAG (Retrieval-Augmented Generation) is the feature that lets you say "Here's a 400-page PDF, answer my questions about it" without the model having to read the whole thing every turn. Open WebUI splits your documents into chunks, embeds them as vectors, stores them in a vector database, and at chat time retrieves just the relevant pieces to pass to the model.
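The sketch below is not Open WebUI's actual pipeline (it uses a toy bag-of-words embedding instead of a real embedding model and vector database), but it shows the chunk, embed, and retrieve steps that the settings later in this section are tuning:

```python
import math
from collections import Counter

# Toy retrieval pipeline: chunk -> embed -> retrieve. Open WebUI does the same
# with a real tokenizer, embedding model, and vector database.

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words count instead of a dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

document = "imagine the full text of a 400-page PDF here " * 200  # placeholder corpus
chunks = chunk(document)
context = retrieve("What is the refund policy?", chunks, top_k=3)
# Only `context`, not the whole document, gets pasted into the model's prompt.
```

Chunk Size, Chunk Overlap, Top K, and the embedding model in the settings below map directly onto the `size`, `overlap`, `top_k`, and `embed` knobs in this sketch.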
Two ways to use it, in order of simplicity:
- One-off attachments. Drag a file into any chat input and ask questions. The file is chunked and embedded just for that chat.
- Knowledge bases. For documents you want to reuse across many chats (company handbook, codebase, research library, user manual), go to Workspace > Knowledge and create a knowledge base. You can then attach the entire knowledge base to a chat (via the `#` shortcut in the input), or bind it to a model in Workspace > Models so that model always has it available.
The defaults are reasonable for getting started. When you outgrow them, there are three knobs that matter most:
- Embedding engine. The default (SentenceTransformers `all-MiniLM-L6-v2`) runs locally on CPU and consumes roughly 500 MB of RAM per worker. For any multi-user deployment, point at an external embeddings API (OpenAI, or Ollama with `nomic-embed-text`) via `RAG_EMBEDDING_ENGINE`.
- Content extraction engine. The default uses `pypdf`, which leaks memory during heavy ingestion. For anything beyond casual use, switch to Tika or Docling via `CONTENT_EXTRACTION_ENGINE`.
- Vector database. The default ChromaDB (local, SQLite-backed) does not tolerate multi-worker deployments. At scale, switch to PGVector: it is the only vector database officially supported and maintained by the Open WebUI team. Milvus, Qdrant, and MariaDB Vector are also available but are community-maintained: they may break on upgrades, and fixes depend on community contributions. See the env-configuration reference for setup and the community disclaimers on each provider.
None of these matter for "a single user with a handful of PDFs." All of them start mattering the moment you have 100 documents or 10 concurrent users.
Recommended starting config
If you just want RAG to work well out of the box, these settings are a solid general-purpose starting point. They are not fine-tuned for every use case, but they will produce noticeably better results than the defaults for most document types.
Set these in Admin Panel > Settings > Documents:
| Setting | Recommended value | Default | Why |
|---|---|---|---|
| Text Splitter | token | character | Token-based splitting produces more consistent chunk sizes across document types |
| Markdown Header Splitting | On | On | Respects document structure by splitting at headings, keeping sections coherent |
| Chunk Size | 2000 | 1000 | Larger chunks preserve more surrounding context per retrieval hit |
| Chunk Overlap | 200 | 100 | More overlap means less chance of cutting a key sentence in half |
| Top K | 15 | 3 | Retrieves more candidate chunks, giving the model a wider pool of relevant context. If you are working with local models that have constrained context sizes, lower this to 5 to avoid filling the context window with retrieved chunks |
| Embedding Model | External (OpenAI or Ollama) | all-MiniLM-L6-v2 (local CPU) | The default works for a single user but consumes ~500 MB RAM per worker. For any multi-user setup, use an external embedding API instead |
The default SentenceTransformers model runs locally on CPU and is fine for a single user getting started. For anything beyond that, point at an external embeddings API: set `RAG_EMBEDDING_ENGINE=openai` with an OpenAI API key, or `RAG_EMBEDDING_ENGINE=ollama` with any Ollama embedding model (e.g., `nomic-embed-text`). This offloads the work and frees significant RAM.
More detail:
- RAG overview
- Knowledge workspace
- Performance tuning for RAG
- Scaling: external vector database (required for multi-worker and multi-replica deployments)
- Scaling: content extraction & embeddings (fixing memory leaks at scale)
Open Terminal
If "run Python" is too restrictive and you want the model to actually work on your machine (clone repos, install packages, run test suites, spin up a local preview of a website, iterate on a data report against a real CSV), that is what Open Terminal is for. It connects a real shell (sandboxed in a Docker container by default, or bare-metal if you want) as a tool the model can call the same way it calls any other tool. In-chat file browser, live web previews, and skill definitions are included.
This is the biggest "aha" feature once you get past basic chat. It turns Open WebUI from a chat UI into a place where the model actually builds things for you. If Native Mode is on and you have given the model a capable terminal, ask it to build you a small app or run an analysis on a folder of files and watch it go.
More detail:
- Open Terminal: give your AI a real computer
- Use cases: software development, data reports, app builder, research assistant, and more
What to do next
You do not need all of the above on day one. A reasonable order for a new install:
- Day one: Pick a good default model, have a few conversations, get a feel for the UI.
- First thing after that: Set a Task Model and decide which background chores you actually want enabled. This is the single biggest "feels better" change you can make, and it directly addresses hidden per-chat costs.
- Within the first week: Turn on Native Mode globally and install one or two Tools that match your work.
- When you hit it: Install a context filter the first time you see "prompt is too long."
- When you need it: Set up a Knowledge base the first time you want to ask questions across multiple documents.
- When you are ready to go big: Point the model at Open Terminal and let it actually build things for you.
- When you scale up: Revisit the RAG infrastructure section if you go beyond a single user.
Everything else (enterprise SSO, multi-replica HA, Redis scaling, observability) is in Advanced Topics and Troubleshooting when and if you need it.
Troubleshooting
When something goes wrong, start here:
| Having problems with... | Read this |
|---|---|
| Connection refused, 401 errors, CORS failures, WebSocket disconnects | Connection Errors |
| "Prompt is too long" or context window exceeded | Context Window / Prompt Too Long |
| RAG not returning relevant results, uploads failing, knowledge base issues | RAG Troubleshooting |
| Web search not working or returning poor results | Web Search Troubleshooting |
| Image generation errors or provider setup | Image Generation Troubleshooting |
| Speech-to-text, text-to-speech, or audio playback | Audio Troubleshooting |
| SSO, OAuth, or LDAP login issues | SSO & OAuth Troubleshooting |
| High memory usage, slow responses, or worker crashes | Performance & RAM · Scaling Guide |
| Login loops, config drift, or database locks in multi-replica setups | Scaling & HA Troubleshooting · Scaling Guide |
| Locked out of admin account | Reset Admin Password |
| TLS certificate errors with custom/internal CAs | Custom CA Store |
| Alembic migration errors or manual schema fixes | Database Migration |
Questions?
This page is the condensed version. The full docs go much deeper. If you did not find what you needed:
- Search the docs: use the search box at the top of any page. There is a lot more in here than this overview covers.
- Ask on GitHub Discussions: best for open-ended questions, feature discussions, and "how would I do X?" threads. Searchable and visible to future users who hit the same thing.
- Ask on Discord: the most active community. Try the `#questions` channel; there is also an experimental bot there with full docs and issue context that can answer most questions in a few seconds.
- Ask on Reddit: good for broader discussion, deployment stories, and community showcases.
- Report a bug: only after you have confirmed it is a bug (reproducible, latest version, template filled in). "It doesn't work" issues get closed; "here's the exact repro, here are the logs" issues get fixed.