
Scalable Enterprise Deployment Options

Open WebUI's stateless, container-first architecture means the same application runs identically whether you deploy it as a Python process on a VM, a container in a managed service, or a pod in a Kubernetes cluster. The difference between deployment patterns is how you orchestrate, scale, and operate the application — not how the application itself behaves.

Model Inference Is Independent

Model serving is independent of how you deploy Open WebUI. Any deployment pattern can be paired with managed APIs (OpenAI, Anthropic, Azure OpenAI, Google Gemini) or self-hosted inference (Ollama, vLLM). See Integration for details on connecting models.


Shared Infrastructure Requirements

Regardless of which deployment pattern you choose, every scaled Open WebUI deployment requires the same set of backing services. Configure these before scaling beyond a single instance.

| Component | Why It's Required | Options |
| --- | --- | --- |
| PostgreSQL | Multi-instance deployments require a real database. SQLite does not support concurrent writes from multiple processes. | Self-managed, Amazon RDS, Azure Database for PostgreSQL, Google Cloud SQL |
| Redis | Session management, WebSocket coordination, and configuration sync across instances. | Self-managed, Amazon ElastiCache, Azure Cache for Redis, Google Memorystore |
| Vector Database | The default ChromaDB uses a local SQLite backend that is not safe for multi-process access. | PGVector (shares PostgreSQL), Milvus, Qdrant, or ChromaDB in HTTP server mode |
| Shared Storage | Uploaded files must be accessible from every instance. | Shared filesystem (NFS, EFS, CephFS) or object storage (S3, GCS, Azure Blob) |
| Content Extraction | The default pypdf extractor leaks memory under sustained load. | Apache Tika or Docling as a sidecar service |
| Embedding Engine | The default SentenceTransformers model loads ~500 MB into RAM per worker process. | OpenAI Embeddings API, or Ollama running an embedding model |
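For evaluation or a small pilot, the backing services can be stood up together with Docker Compose. The sketch below is illustrative, not an official reference setup: image tags, credentials, and service names are placeholders, and production deployments would use the managed equivalents from the table above.

```yaml
version: "3.8"
services:
  db:
    image: pgvector/pgvector:pg16      # PostgreSQL with the pgvector extension preinstalled
    environment:
      POSTGRES_USER: openwebui
      POSTGRES_PASSWORD: change-me     # use a secret manager in production
      POSTGRES_DB: openwebui
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine              # sessions, WebSocket coordination, config sync

  tika:
    image: apache/tika:latest-full     # content extraction sidecar, listens on 9998

volumes:
  pgdata:
```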

Critical Configuration

These environment variables must be set consistently across every instance:

```shell
# Shared secret — MUST be identical on all instances
WEBUI_SECRET_KEY=your-secret-key-here

# Database
DATABASE_URL=postgresql://user:password@db-host:5432/openwebui

# Vector Database
VECTOR_DB=pgvector
PGVECTOR_DB_URL=postgresql://user:password@db-host:5432/openwebui

# Redis
REDIS_URL=redis://redis-host:6379/0
WEBSOCKET_MANAGER=redis
ENABLE_WEBSOCKET_SUPPORT=true

# Content Extraction
CONTENT_EXTRACTION_ENGINE=tika
TIKA_SERVER_URL=http://tika:9998

# Embeddings
RAG_EMBEDDING_ENGINE=openai

# Storage — choose ONE:
# Option A: shared filesystem (mount the same volume to all instances, no env var needed)
# Option B: object storage (see https://docs.openwebui.com/reference/env-configuration#cloud-storage for all required vars)
# STORAGE_PROVIDER=s3

# Workers — let the orchestrator handle scaling
UVICORN_WORKERS=1

# Migrations — only ONE instance should run migrations
ENABLE_DB_MIGRATIONS=false
```

Database Migrations

Set ENABLE_DB_MIGRATIONS=false on all instances except one. During updates, scale down to a single instance, allow migrations to complete, then scale back up. Concurrent migrations can corrupt your database.
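On Kubernetes, that update sequence might look like the following sketch. The deployment name, image tag, and replica counts are assumptions to adapt to your environment; the key point is the order of operations.

```shell
# 1. Scale down to a single instance before updating
kubectl scale deployment/open-webui --replicas=1

# 2. Roll out the new image; the surviving instance
#    (the one permitted to run migrations) applies schema changes on startup
kubectl set image deployment/open-webui open-webui=ghcr.io/open-webui/open-webui:<new-tag>
kubectl rollout status deployment/open-webui

# 3. Scale back up once migrations have completed
kubectl scale deployment/open-webui --replicas=4
```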

For the complete step-by-step scaling walkthrough, see Scaling Open WebUI. For the full environment variable reference, see Environment Variable Configuration.


Choose Your Deployment Pattern

Open WebUI supports three production deployment patterns. Each guide covers architecture, scaling strategy, and key considerations specific to that approach.

Python / Pip on Auto-Scaling VMs

Deploy open-webui serve as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). Best for teams with established VM-based infrastructure and strong Linux administration skills, or when regulatory requirements mandate direct OS-level control.
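A minimal systemd unit for this pattern could look like the sketch below. The install path, user, and port are illustrative assumptions; `EnvironmentFile` points at the shared configuration shown earlier on this page.

```ini
# /etc/systemd/system/open-webui.service
[Unit]
Description=Open WebUI
After=network-online.target
Wants=network-online.target

[Service]
User=openwebui
EnvironmentFile=/etc/open-webui/env
ExecStart=/opt/open-webui/venv/bin/open-webui serve --host 0.0.0.0 --port 8080
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```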

Container Service

Run the official Open WebUI container image on a managed platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. Best for teams wanting container benefits — immutable images, versioned deployments, no OS management — without Kubernetes complexity.

Kubernetes with Helm

Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). Best for large-scale, mission-critical deployments requiring declarative infrastructure-as-code, advanced auto-scaling, and GitOps workflows.
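An install with the chart might start like the sketch below. The repository URL and release layout are what the project publishes at the time of writing, but the contents of the values file are deployment-specific; consult the chart's own values.yaml for exact key names.

```shell
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# values-production.yaml would point at your external PostgreSQL,
# Redis, and storage, and set the desired replica count
helm install open-webui open-webui/open-webui \
  --namespace open-webui --create-namespace \
  --values values-production.yaml
```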


Deployment Comparison

| | Python / Pip (VMs) | Container Service | Kubernetes (Helm) |
| --- | --- | --- | --- |
| Operational complexity | Moderate — OS patching, Python management | Low — platform-managed containers | Higher — requires K8s expertise |
| Auto-scaling | Cloud ASG/VMSS with health checks | Platform-native, minimal configuration | HPA with fine-grained control |
| Container isolation | None — process runs directly on OS | Full container isolation | Full container + namespace isolation |
| Rolling updates | Manual (scale down, update, scale up) | Platform-managed rolling deployments | Declarative rolling updates with rollback |
| Infrastructure-as-code | Terraform/Pulumi for VMs + config mgmt | Task/service definitions (CloudFormation, Bicep, Terraform) | Helm charts + GitOps (Argo CD, Flux) |
| Best suited for | Teams with VM-centric operations, regulatory constraints | Teams wanting container benefits without K8s complexity | Large-scale, mission-critical deployments |
| Minimum team expertise | Linux administration, Python | Container fundamentals, cloud platform | Kubernetes, Helm, cloud-native patterns |

Observability

Production deployments should include monitoring and observability regardless of deployment pattern.

Health Checks

  • /health — Basic liveness check. Returns HTTP 200 when the application is running. Use this for load balancer and auto-scaler health checks.
  • /api/models — Verifies the application can connect to configured model backends. Requires an API key.
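Both endpoints can be probed manually; the hostname and API key below are placeholders:

```shell
# Liveness check: expect HTTP 200, no authentication required
curl -fsS http://open-webui.example.internal:8080/health

# Model-backend connectivity: requires a valid API key
curl -fsS -H "Authorization: Bearer $OPENWEBUI_API_KEY" \
  http://open-webui.example.internal:8080/api/models
```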

OpenTelemetry

Open WebUI supports OpenTelemetry for distributed tracing and HTTP metrics. Enable it with:

```shell
ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
OTEL_SERVICE_NAME=open-webui
```

This auto-instruments FastAPI, SQLAlchemy, Redis, and HTTP clients — giving visibility into request latency, database query performance, and cross-service traces.
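A minimal OpenTelemetry Collector configuration that would accept these exports is sketched below; a real deployment would replace the debug exporter with the exporter for its tracing backend.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # matches OTEL_EXPORTER_OTLP_ENDPOINT above

processors:
  batch: {}

exporters:
  debug: {}                      # stand-in; swap for your backend's exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```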

Structured Logging

Enable JSON-formatted logs for integration with log aggregation platforms (Datadog, Loki, CloudWatch, Splunk):

```shell
LOG_FORMAT=json
GLOBAL_LOG_LEVEL=INFO
```

For full monitoring setup details, see Monitoring and OpenTelemetry.


Next Steps


Need help planning your enterprise deployment? Our team works with organizations worldwide to design and implement production Open WebUI environments.

Contact Enterprise Sales → sales@openwebui.com