Docling Document Extraction
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
This documentation provides a step-by-step guide to integrating Docling with Open WebUI. Docling is a document processing library designed to transform a wide range of file formats, including PDFs, Word documents, spreadsheets, HTML, and images, into structured data such as JSON or Markdown. With built-in support for layout detection, table parsing, and language-aware processing, Docling streamlines document preparation for AI applications like search, summarization, and retrieval-augmented generation, all through a unified and extensible interface.
Prerequisites
- Open WebUI instance
- Docker installed on your system
- Docker network set up for Open WebUI
Integration Steps
Step 1: Run the Docker Command for Docling-Serve
docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
With GPU support:
docker run --gpus all -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve-cu124
Step 2: Configure Open WebUI to use Docling
- Log in to your Open WebUI instance.
- Navigate to the Admin Panel settings menu.
- Click on Settings.
- Click on the Documents tab.
- Change the Default content extraction engine dropdown to Docling.
- Update the content extraction engine URL to http://host.docker.internal:5001.
- Save the changes.
(optional) Step 3: Configure Docling's picture description features
- On the Documents tab:
- Activate the Describe Pictures in Documents button.
- Below, choose a description mode: local or API
  - local: the vision model runs in the same context as Docling itself
  - API: Docling calls an external service or container (e.g. Ollama)
- Fill in an object value as described at https://github.com/docling-project/docling-serve/blob/main/docs/usage.md#picture-description-options
- Save the changes.
Make sure the object value is valid JSON. Working examples are shown below, first for local mode and then for API mode:
{
  "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
  "generation_config": {
    "max_new_tokens": 200,
    "do_sample": false
  },
  "prompt": "Describe this image in a few sentences."
}

{
  "url": "http://localhost:11434/v1/chat/completions",
  "params": {
    "model": "qwen2.5vl:7b-q4_K_M"
  },
  "timeout": 60,
  "prompt": "Describe this image in great details. "
}
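Since the field only accepts valid JSON, it can help to check a value locally before pasting it into the settings page. The following is a minimal sketch using Python's standard `json` module. The helper name `validate_picture_description` and the per-mode key check (`repo_id` for local, `url` for API) are assumptions drawn from the two examples above, not from the docling-serve schema.

```python
import json

# Key that each mode's example relies on.
# Assumption: inferred from the two examples in this guide, not an official schema.
REQUIRED_KEY = {"local": "repo_id", "api": "url"}


def validate_picture_description(text, mode):
    """Parse the object value and check the key used by the chosen mode."""
    obj = json.loads(text)  # raises json.JSONDecodeError if the text is not valid JSON
    if not isinstance(obj, dict):
        raise ValueError("the value must be a JSON object")
    key = REQUIRED_KEY[mode]
    if key not in obj:
        raise ValueError(f"{mode} mode example expects a {key!r} key")
    return obj


api_value = """
{
  "url": "http://localhost:11434/v1/chat/completions",
  "params": {"model": "qwen2.5vl:7b-q4_K_M"},
  "timeout": 60,
  "prompt": "Describe this image in great details. "
}
"""
print(sorted(validate_picture_description(api_value, "api")))
# → ['params', 'prompt', 'timeout', 'url']
```

Common mistakes such as single quotes, trailing commas, or Python-style `False` instead of `false` all surface here as a `json.JSONDecodeError`.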
Verifying Docling in Docker
To verify that Docling is working correctly in a Docker environment, follow these steps:
1. Start the Docling Docker Container
First, ensure that the Docling Docker container is running. You can start it using the following command:
docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
This command starts the Docling container and maps port 5001 from the container to port 5001 on your local machine.
2. Verify the Server is Running
- Go to http://127.0.0.1:5001/ui/
- The URL should lead to a UI to use Docling
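The same check can be scripted instead of opening a browser. This is a minimal sketch using only Python's standard library: it requests the UI URL given above and reports whether the server answered with HTTP 200. The function name `docling_ui_reachable` and the 5-second default timeout are illustrative choices, not part of Docling.

```python
import urllib.error
import urllib.request


def docling_ui_reachable(url="http://127.0.0.1:5001/ui/", timeout=5.0):
    """Return True if a GET request to the Docling UI URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    if docling_ui_reachable():
        print("Docling UI is reachable")
    else:
        print("Docling UI is not reachable; check that the container is running")
```

A `False` result right after starting the container may only mean the server is still starting, so retrying after a short wait is reasonable.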
3. Verify the Integration
- Upload a few files through the UI. It should return output in Markdown or whichever output format you selected.