Document Extraction in Open WebUI

Open WebUI provides document extraction capabilities that let you process and analyze various types of documents within your RAG (Retrieval-Augmented Generation) workflows. Document extraction transforms unstructured document content into text and structured data that language models can use effectively.

What is Document Extraction?

Document extraction refers to the process of automatically identifying and extracting text and data from various file formats, including:

  • PDFs (both text-based and scanned)
  • Images containing text
  • Handwritten documents
  • And more
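To make the distinction between these input types concrete, here is a minimal sketch of how a pipeline might decide between reading an embedded text layer and running OCR. The function name `choose_strategy`, the `has_text_layer` flag, and the strategy labels are hypothetical and used for illustration only; they are not part of Open WebUI's API, and a real extraction engine makes this decision internally.

```python
from pathlib import Path

# Hypothetical set of image suffixes for this illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def choose_strategy(filename: str, has_text_layer: bool = True) -> str:
    """Return a label for how a file would be extracted in this sketch.

    has_text_layer is an assumed input: whether a PDF already contains
    selectable text (text-based) or only page images (scanned).
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        # Images carry no text layer, so text must be recognized via OCR.
        return "ocr"
    if suffix == ".pdf":
        # Text-based PDFs can be read directly; scanned PDFs need OCR.
        return "text_layer" if has_text_layer else "ocr"
    return "unsupported"


print(choose_strategy("report.pdf"))
print(choose_strategy("scan.pdf", has_text_layer=False))
print(choose_strategy("receipt.jpg"))
```

In this sketch, scanned PDFs and images both end up on the OCR path, which is why OCR-capable engines matter for those document types.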

With proper document extraction, Open WebUI can help you:

  • Convert image-based documents to searchable text
  • Preserve document structure and layout information
  • Extract data in structured formats for further processing
  • Support multilingual content recognition

Available Extraction Methods

Open WebUI supports multiple document extraction engines to accommodate different needs and document types. Each extraction method has its own strengths and is suitable for different scenarios.

Explore the documentation for each available extraction method to learn how to set it up and use it effectively with your Open WebUI instance.