
Essentials for New Users

So you've installed Open WebUI, connected a provider, and had your first conversation. Now what?

This page walks through the handful of things that make the difference between "a chat UI" and "a setup that actually works well day-to-day." Nothing here is required — you can ignore it and keep chatting — but if you've ever wondered "can Open WebUI do X?" the answer is almost certainly yes, with one of these pieces.

Work through these in order, or jump to the part that matches your question:

  1. Plugins — what they are and when to install one
  2. Task models — the invisible model behind the UI, and the hidden costs of the default
  3. Context management — why long chats eventually error out
  4. Tool calling — letting the model do things, not just talk
  5. Basic RAG — chatting with your own documents
  6. Open Terminal — giving the model a real computer

🧩 Plugins: the extensibility story

Open WebUI is intentionally small at the core. Most of the "wow" things people show in demos — auto-translation, token/cost tracking, image generation buttons, custom post-processing, provider integrations that aren't OpenAI or Ollama — are plugins, not built-in features. Knowing the plugin landscape is the biggest single unlock for a new user.

There are two plugin families, and the name of the family tells you what it does:

| Family | What it does | Where it runs | Examples |
|---|---|---|---|
| Tools | Give the model new abilities ("call this function when you need X") | In Open WebUI's process, invoked by the model during a chat | Langfuse / OpenLit observability, Home Assistant control, arXiv / PubMed lookups, Wolfram Alpha, Jira / Linear ticket creation, SQL queries against your own DB |
| Functions — Pipes | Add a new "model" to the model picker, backed by custom code | Same | Model-routing pipes (cheap vs. expensive based on prompt), multi-step agent loops, proprietary corporate-LLM backends without an OpenAI-compatible endpoint |
| Functions — Filters | Modify every request and/or response as it passes through | Same, on every chat turn | Context trimming, PII scrubbing, token / cost counting, Langfuse tracing, response reformatting |
| Functions — Actions | Add a button under each message that runs custom code | Same, when the user clicks | "Regenerate follow-ups", "Translate reply", "Pin message", "Save to Knowledge" |

How you install them: the Open WebUI Community site hosts the one-click catalog — pick one, click "Get", paste into Admin Panel → Functions (or Tools), flip it on, and configure its valves (the plugin's settings).

When to install one: when you think "it would be nice if Open WebUI did X." It almost certainly already does — via a plugin. Always browse the community site first — there are thousands of Tools, Filters, Actions, and Pipes already written, and the one you need is usually already there. Even if nothing matches exactly, the closest hit is usually ~20 lines off from what you want and you can fork it from the admin panel.
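If you do end up forking a filter, it helps to know the general shape. A minimal sketch is below — the inlet/outlet method names follow the filter contract, but real plugins declare Valves as a pydantic BaseModel; a plain class stands in here so the sketch runs without dependencies, and the redaction behavior is a made-up example:

```python
# Sketch of a filter Function's shape (illustrative, not a real plugin).
# inlet() edits every request before it reaches the model;
# outlet() edits every response before it reaches the user.

class Filter:
    class Valves:
        # the plugin's settings, editable in the admin panel
        # (real plugins use pydantic.BaseModel here)
        redact_word: str = "secret"
        redact_word = "secret"

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict) -> dict:
        # runs on every request; here: scrub a configured word
        for message in body.get("messages", []):
            message["content"] = message["content"].replace(
                self.valves.redact_word, "[REDACTED]"
            )
        return body

    def outlet(self, body: dict) -> dict:
        # runs on every response; pass-through in this sketch
        return body
```

Forking in the admin panel drops you into exactly this kind of file, with the valves already wired to a settings UI.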



🤖 Task models: the invisible model behind the UI

This is the highest-leverage change you can make right after install, because the default is silently costing you money, latency, and patience.

Every time Open WebUI needs a short bit of "thinking" for a UI feature — writing a chat title for the sidebar, generating tags, suggesting follow-up questions, powering the autocomplete in the prompt box — it calls a Task Model. By default that task model is whatever main model you're currently chatting with, which means:

  • Your expensive flagship model gets hit every time you open a new chat just to write "Groceries list."
  • On a slow local model, every keystroke feels laggy because autocomplete is blocking on a 30B-parameter model.
  • A reasoning model (o1, r1, Claude with extended thinking) spends 5 seconds thinking before producing the three-word title.

These costs are easy to miss because they happen in the background. Fix them first.

Fix: in Admin Panel → Settings → Interface, set a dedicated Task Model. Two fields, because the right choice depends on what your main chat model is:

  • Task Model (External) — used when you are chatting with a cloud model (OpenAI, Anthropic, etc.). Set this to a fast, cheap, non-reasoning cloud model like gpt-5-nano, gemini-2.5-flash-lite, or llama-3.1-8b-instant.
  • Task Model (Local) — used when you are chatting with a local model (Ollama, llama.cpp, vLLM). Set this to a tiny local model like qwen3:1b, gemma3:1b, or llama3.2:3b.

The main chat experience doesn't change. The background chores just stop dragging.

While you're in the Interface settings, if you are on a low-spec machine or simply don't want some of these features, you can also disable the chores entirely. Each one has both an admin toggle in the same page and an environment variable — use whichever fits your workflow:

| Chore | Admin toggle (Settings → Interface) | Env var |
|---|---|---|
| Autocomplete (the big one — fires on every keystroke) | Autocomplete Generation | ENABLE_AUTOCOMPLETE_GENERATION=False |
| Follow-up suggestions | Follow-up Generation | ENABLE_FOLLOW_UP_GENERATION=False |
| Chat title generation | Title Generation | ENABLE_TITLE_GENERATION=False |
| Tag generation | Tags Generation | ENABLE_TAGS_GENERATION=False |

Autocomplete is the single biggest "make it snappy" toggle on weak hardware — it fires on every keystroke, so a slow task model turns the whole prompt box into molasses.
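If you deploy with Docker and prefer to bake these toggles in at deploy time rather than click through the admin panel, the env vars from the table can go straight into the container. A sketch, assuming the standard image name:

```shell
# Disable the background chores at deploy time (docker run shown;
# the same variables go in a compose file's `environment:` block).
docker run -d -p 3000:8080 \
  -e ENABLE_AUTOCOMPLETE_GENERATION=False \
  -e ENABLE_FOLLOW_UP_GENERATION=False \
  -e ENABLE_TITLE_GENERATION=False \
  -e ENABLE_TAGS_GENERATION=False \
  ghcr.io/open-webui/open-webui:main
```

Note that admin-panel changes persist in the database, so if you manage settings via env vars, pick one source of truth and stick with it.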

More detail: Performance & RAM → Dedicated Task Models.


🧠 Context management: why long chats break

After enough back-and-forth you will eventually see:

```
The prompt is too long: 207601, model maximum context length: 202751
```

This error does not come from Open WebUI — it comes from your model provider. Every time you send a new message, the entire conversation (system prompt + all previous turns + attached files + tool call results + your new message) is sent as the "prompt." When the sum exceeds the model's context window, the provider rejects the request.

Open WebUI intentionally does not ship a built-in trimmer, because:

  • Every model uses a different tokenizer (GPT ≠ Claude ≠ Gemini ≠ GLM ≠ Llama).
  • Every model has a different context window (8k → 1M).
  • Every deployment wants a different policy (trim by tokens, by turns, by message count, drop attachments first, summarize older messages, etc.).

There is no single correct answer. The supported approach is to install a filter Function that trims the conversation on your terms. Community filters for most of the common policies already exist and can be installed with one click; if none fits, the code is short enough to copy and adapt.
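The core of such a filter really is short. Here is a minimal sketch of a "newest N messages" policy — the function name and cutoff are illustrative, not taken from any specific community filter, and it counts messages rather than tokens, so it is tokenizer-agnostic but only approximates the real limit:

```python
def trim_messages(messages: list[dict], max_messages: int = 16) -> list[dict]:
    """Keep the system prompt (if any) plus the newest messages.

    Illustrative policy only: counting messages instead of tokens
    avoids the per-model tokenizer problem at the cost of precision.
    """
    system = [m for m in messages if m.get("role") == "system"]
    turns = [m for m in messages if m.get("role") != "system"]
    return system + turns[-max_messages:]
```

Inside a filter Function this would run in inlet(), on the request body's messages list, before every model call.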

➡️ Full guide, including a minimal "newest N turns" filter you can paste into your instance: Troubleshooting → Context Window / Prompt Too Long.


🔧 Tool calling: letting the model do things

Tool calling is what turns an LLM from "a very smart text box" into "an assistant that can actually go look things up, run code, and take actions." You attach a Tool to your chat (or your model), and the model decides — mid-response — when to call it, with what arguments. Open WebUI runs it, returns the result, and the model continues.

Two things every new user should configure:

1. Turn on Native tool calling

Open WebUI has two tool-calling modes in the UI: Native and Default. Default is legacy and no longer supported — it is kept in the dropdown only so existing deployments keep running during migration. All models should be configured to use Native Mode. Native is faster, preserves KV cache, supports built-in system tools (Memory, Notes, Knowledge, Web Search, Image Gen, Code Interpreter), and is the only mode that receives feature work going forward.

Every mainstream model supports it — OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and essentially any other current model. Turn it on:

  • Best — once, for every model: in Admin Panel → Settings → Models, click the ⚙️ gear at the top right of the models list. That opens global model parameters — set Function Calling = Native there, save, and every current and future model in your instance inherits it. No per-model click-through required.
  • Per-model override: Admin Panel → Settings → Models → [your model] → Advanced Params → Function Calling = Native
  • Per-chat override: in a chat's Chat Controls (right sidebar)

If a tool "isn't being called" on a capable model, 90% of the time Native Mode just needs flipping on. If a specific small model struggles with Native Mode, the fix is to use a stronger model for tool-using conversations — not to fall back to Default Mode.

2. Install a few Tools

A lot of what people reach for — web search, code execution, image generation, memory, knowledge-base retrieval — is already built in and doesn't need a community plugin. Turn those on in Admin Panel → Settings (Web Search, Code Interpreter, Images, etc.) and attach them to your models. You'll get them injected automatically as built-in system tools in Native Mode.

Tools from the community site are for everything not built in. Good examples to explore:

  • Observability / cost tracking — Langfuse, OpenLit, Portkey tools that log every chat turn, token usage, and latency to your own observability stack. Essential once more than a handful of people use your instance.
  • Smart-home / automation integrations — Home Assistant tools that let the model actually control devices, routines, and scenes from a conversation.
  • Research lookups — arXiv, PubMed, Semantic Scholar, Wolfram Alpha — the model gets structured results it couldn't recall from training data, with real citations.
  • Issue / ticket / messaging integrations — create Jira / Linear / GitHub issues, post to Slack or Discord, send an email — the model stops being a read-only assistant.
  • Database / API tools — expose a read-only SQL query tool against your own database, or a tool that hits your internal API — the model starts answering questions grounded in your real data.
  • Domain tools — weather, stocks, time, crypto prices, shipping-tracking, recipe APIs, whatever matches your work.

Tools show up in the + menu in the chat input. Enable the ones you want for a given chat; the model only sees the tools you've enabled.
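Under the hood, a community Tool is a small Python class whose method signatures and docstrings become the schema the model sees. A toy sketch — the class name follows the Tools convention, but this particular tool is hypothetical:

```python
from datetime import datetime, timezone

class Tools:
    def get_utc_time(self) -> str:
        """Return the current UTC time as an ISO-8601 string.

        The type hints and this docstring are turned into the tool
        description the model uses to decide when, and with what
        arguments, to call the method.
        """
        return datetime.now(timezone.utc).isoformat()
```

This is why well-written docstrings matter so much in Tools: the model never sees your code, only the description of it.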

Seriously — browse the community site

The list above is a tiny sample of what's out there. The Open WebUI Community has thousands of community-built Tools, Filters, Actions, and Pipes covering use cases nobody on the core team would have thought of. Before you write anything yourself, browse the community site — sort by popularity, filter by category, and skim a few pages. You'll almost always find something that does exactly what you need, or is two lines off. One-click install, configure the valves, done.

This is the single biggest reason to treat Open WebUI as a platform rather than an app. The community is the feature set.



📚 Basic RAG: chatting with your own documents

RAG (Retrieval-Augmented Generation) is the feature that lets you say "Here's a 400-page PDF, answer my questions about it" without the model having to (or being able to) read the whole thing every turn. Open WebUI splits your documents into chunks, embeds them as vectors, stores them in a vector DB, and at chat time retrieves just the relevant bits and passes those to the model.
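The retrieval step can be pictured with a toy cosine-similarity search. This is illustrative only — in a real instance the embedding model produces the vectors and the vector DB does the ranking — but it shows why only the relevant chunks, not the whole document, reach the model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # similarity between two embedding vectors, in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    # rank (text, vector) chunks by similarity to the query, keep the top k;
    # only these winning texts are pasted into the model's prompt
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The quality of the whole pipeline therefore hinges on how well the embedding model maps "question" and "relevant chunk" to nearby vectors — which is why the embedding engine is the first knob listed below.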

Two ways to use it, in order of simplicity:

  1. One-off attachments. Drag a file into any chat input and ask questions. The file is chunked and embedded just for that chat.
  2. Knowledge bases. For documents you want to reuse across many chats (company handbook, codebase, research library, user manual), go to Workspace → Knowledge and create a knowledge base. You can then attach the entire knowledge base to a chat (via the # shortcut in the input), or bind it to a model in Workspace → Models so that model always has it available.

The defaults are reasonable for getting started. When you outgrow them, there are three knobs that matter most:

  • Embedding engine. The default (SentenceTransformers all-MiniLM-L6-v2) runs locally on CPU and consumes ~500 MB of RAM per worker. For any multi-user deployment, point at an external embeddings API (OpenAI, or Ollama with nomic-embed-text) via RAG_EMBEDDING_ENGINE.
  • Content extraction engine. The default uses pypdf, which leaks memory during heavy ingestion. For anything beyond casual use, switch to Tika or Docling via CONTENT_EXTRACTION_ENGINE.
  • Vector database. The default ChromaDB (local SQLite-backed) does not tolerate multi-worker deployments. At scale, use Milvus, Qdrant, or PGVector.

None of these matter for "a single user with a handful of PDFs." All of them start mattering the moment you have 100 documents or 10 concurrent users.
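When those knobs do start mattering, all three are environment variables. A deploy-time sketch — the engine values and the extra model/URL variables shown are assumptions, so check the env-var reference for the exact names your version accepts:

```shell
# Point RAG at external services instead of the in-process defaults.
docker run -d -p 3000:8080 \
  -e RAG_EMBEDDING_ENGINE=ollama \
  -e RAG_EMBEDDING_MODEL=nomic-embed-text \
  -e CONTENT_EXTRACTION_ENGINE=tika \
  -e TIKA_SERVER_URL=http://tika:9998 \
  ghcr.io/open-webui/open-webui:main
```

Swapping the vector database is the most invasive of the three, since existing embeddings don't migrate automatically — easiest to decide before you ingest your document library.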



⚡ Open Terminal: giving the model a real computer

If "run Python" is too restrictive and you want the model to actually work on your machine — clone repos, install packages, run test suites, spin up a local preview of a website, iterate on a data report against a real CSV — that's Open Terminal. It connects a real shell (sandboxed in a Docker container by default, or bare-metal if you want) as a tool the model can call the same way it calls any other tool. In-chat file browser, live web previews, and skill definitions are included.

This is the biggest "aha" feature once you get past basic chat. It turns Open WebUI from a chat UI into a place where the model actually builds things for you. If Native Mode is on and you've given the model a capable terminal, ask it to build you a small app or run an analysis on a folder of files and watch it go.



What to do next

You don't need all of the above on day one. A reasonable order for a new install:

  1. Day one: pick a good default model, have a few conversations, get a feel for the UI.
  2. First thing after that: set a Task Model and decide which background chores you actually want enabled. This is the single biggest "feels better" change you can make, and it directly addresses hidden per-chat costs.
  3. Within the first week: turn on Native Mode globally and install one or two Tools that match your work.
  4. When you hit it: install a context filter the first time you see "prompt is too long."
  5. When you need it: set up a Knowledge base the first time you want to ask questions across multiple documents.
  6. When you're ready to go big: point the model at Open Terminal and let it actually build things for you.
  7. When you scale up: revisit the RAG infrastructure section if you go beyond a single user.

Everything else — enterprise SSO, multi-replica HA, Redis scaling, observability — is in Advanced Topics and Troubleshooting when and if you need it.


Any questions? Think something's missing? Got stuck?

This page is the condensed version — the real docs go much deeper. If you didn't find what you needed, try these, roughly in order:

  • 🔎 Search the docs — use the search box at the top of any page. A lot more is in here than the Essentials overview covers.
  • 💬 Ask on GitHub Discussions — best for open-ended questions, feature discussions, and "how would I do X?" threads. Searchable and visible to future users who hit the same thing.
  • 🎮 Ask on Discord — most active community. Try the #questions channel; there's also an experimental bot there with full docs + issue context that can answer most questions in a few seconds.
  • 👽 Ask on Reddit — good for broader discussion, deployment stories, and community showcases.
  • 🐛 Report a bug — only after you've confirmed it's a bug (reproducible, latest version, template filled in). "It doesn't work" issues get closed; "here's the exact repro, here are the logs" issues get fixed.

Welcome aboard. 👋

This content is for informational purposes only and does not constitute a warranty, guarantee, or contractual commitment. Open WebUI is provided "as is." See your license for applicable terms.