Scalable Enterprise Deployment Options
Open WebUI's stateless, container-first architecture means the same application runs identically whether you deploy it as a Python process on a VM, a container in a managed service, or a pod in a Kubernetes cluster. The difference between deployment patterns is how you orchestrate, scale, and operate the application — not how the application itself behaves.
How you serve LLM models is separate from how you deploy Open WebUI. You can use managed APIs (OpenAI, Anthropic, Azure OpenAI, Google Gemini) or self-hosted inference (Ollama, vLLM) with any deployment pattern. See Integration for details on connecting models.
Shared Infrastructure Requirements
Regardless of which deployment pattern you choose, every scaled Open WebUI deployment requires the same set of backing services. Configure these before scaling beyond a single instance.
| Component | Why It's Required | Options |
|---|---|---|
| PostgreSQL | Multi-instance deployments require a real database. SQLite does not support concurrent writes from multiple processes. | Self-managed, Amazon RDS, Azure Database for PostgreSQL, Google Cloud SQL |
| Redis | Session management, WebSocket coordination, and configuration sync across instances. | Self-managed, Amazon ElastiCache, Azure Cache for Redis, Google Memorystore |
| Vector Database | The default ChromaDB uses a local SQLite backend that is not safe for multi-process access. | PGVector (shares PostgreSQL), Milvus, Qdrant, or ChromaDB in HTTP server mode |
| Shared Storage | Uploaded files must be accessible from every instance. | Shared filesystem (NFS, EFS, CephFS) or object storage (S3, GCS, Azure Blob) |
| Content Extraction | The default pypdf extractor leaks memory under sustained load. | Apache Tika or Docling as a sidecar service |
| Embedding Engine | The default SentenceTransformers model loads ~500 MB into RAM per worker process. | OpenAI Embeddings API, or Ollama running an embedding model |
Critical Configuration
These environment variables must be set consistently across every instance:
# Shared secret — MUST be identical on all instances
WEBUI_SECRET_KEY=your-secret-key-here
# Database
DATABASE_URL=postgresql://user:password@db-host:5432/openwebui
# Vector Database
VECTOR_DB=pgvector
PGVECTOR_DB_URL=postgresql://user:password@db-host:5432/openwebui
# Redis
REDIS_URL=redis://redis-host:6379/0
WEBSOCKET_MANAGER=redis
ENABLE_WEBSOCKET_SUPPORT=true
# Content Extraction
CONTENT_EXTRACTION_ENGINE=tika
TIKA_SERVER_URL=http://tika:9998
# Embeddings
RAG_EMBEDDING_ENGINE=openai
# Storage — choose ONE:
# Option A: shared filesystem (mount the same volume to all instances, no env var needed)
# Option B: object storage (see https://docs.openwebui.com/reference/env-configuration#cloud-storage for all required vars)
# STORAGE_PROVIDER=s3
# Workers — let the orchestrator handle scaling
UVICORN_WORKERS=1
# Migrations — only ONE instance should run migrations
ENABLE_DB_MIGRATIONS=false
Set ENABLE_DB_MIGRATIONS=false on all instances except one. During updates, scale down to a single instance, allow migrations to complete, then scale back up. Concurrent migrations can corrupt your database.
For the complete step-by-step scaling walkthrough, see Scaling Open WebUI. For the full environment variable reference, see Environment Variable Configuration.
Choose Your Deployment Pattern
Open WebUI supports three production deployment patterns. Each guide covers architecture, scaling strategy, and key considerations specific to that approach.
Python / Pip on Auto-Scaling VMs
Deploy open-webui serve as a systemd-managed process on virtual machines in a cloud auto-scaling group (AWS ASG, Azure VMSS, GCP MIG). Best for teams with established VM-based infrastructure and strong Linux administration skills, or when regulatory requirements mandate direct OS-level control.
Container Service
Run the official Open WebUI container image on a managed platform such as AWS ECS/Fargate, Azure Container Apps, or Google Cloud Run. Best for teams wanting container benefits — immutable images, versioned deployments, no OS management — without Kubernetes complexity.
Kubernetes with Helm
Deploy using the official Open WebUI Helm chart on any Kubernetes distribution (EKS, AKS, GKE, OpenShift, Rancher, self-managed). Best for large-scale, mission-critical deployments requiring declarative infrastructure-as-code, advanced auto-scaling, and GitOps workflows.
Deployment Comparison
| Python / Pip (VMs) | Container Service | Kubernetes (Helm) | |
|---|---|---|---|
| Operational complexity | Moderate — OS patching, Python management | Low — platform-managed containers | Higher — requires K8s expertise |
| Auto-scaling | Cloud ASG/VMSS with health checks | Platform-native, minimal configuration | HPA with fine-grained control |
| Container isolation | None — process runs directly on OS | Full container isolation | Full container + namespace isolation |
| Rolling updates | Manual (scale down, update, scale up) | Platform-managed rolling deployments | Declarative rolling updates with rollback |
| Infrastructure-as-code | Terraform/Pulumi for VMs + config mgmt | Task/service definitions (CloudFormation, Bicep, Terraform) | Helm charts + GitOps (Argo CD, Flux) |
| Best suited for | Teams with VM-centric operations, regulatory constraints | Teams wanting container benefits without K8s complexity | Large-scale, mission-critical deployments |
| Minimum team expertise | Linux administration, Python | Container fundamentals, cloud platform | Kubernetes, Helm, cloud-native patterns |
Observability
Production deployments should include monitoring and observability regardless of deployment pattern.
Health Checks
/health— Basic liveness check. Returns HTTP 200 when the application is running. Use this for load balancer and auto-scaler health checks./api/models— Verifies the application can connect to configured model backends. Requires an API key.
OpenTelemetry
Open WebUI supports OpenTelemetry for distributed tracing and HTTP metrics. Enable it with:
ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
OTEL_SERVICE_NAME=open-webui
This auto-instruments FastAPI, SQLAlchemy, Redis, and HTTP clients — giving visibility into request latency, database query performance, and cross-service traces.
Structured Logging
Enable JSON-formatted logs for integration with log aggregation platforms (Datadog, Loki, CloudWatch, Splunk):
LOG_FORMAT=json
GLOBAL_LOG_LEVEL=INFO
For full monitoring setup details, see Monitoring and OpenTelemetry.
Next Steps
- Architecture & High Availability — Deeper dive into Open WebUI's stateless design and HA capabilities.
- Security — Compliance frameworks, SSO/LDAP integration, RBAC, and audit logging.
- Integration — Connecting AI models, pipelines, and extending functionality.
- Scaling Open WebUI — The complete step-by-step technical scaling guide.
- Multi-Replica Troubleshooting — Solutions for common issues in scaled deployments.
Need help planning your enterprise deployment? Our team works with organizations worldwide to design and implement production Open WebUI environments.