OpenRAG, a new open-source RAG platform from IBM Research trending on GitHub today (905 stars gained, 2,234 total), goes from zero to production-grade agentic document search in 15 minutes with a single Docker Compose command. It combines three best-in-class tools (Langflow for visual workflow orchestration, Docling for intelligent document parsing, and OpenSearch for enterprise-grade semantic search) into a pre-configured, containerized stack that eliminates the days of DevOps work traditional RAG implementations require. For developers building enterprise document search, customer support knowledge bases, or compliance-friendly AI systems without vendor lock-in, OpenRAG delivers self-hosted RAG at a deployment speed that previously required managed services.
What OpenRAG Is and How It Works
OpenRAG integrates three components into a cohesive platform. Langflow orchestrates agentic workflows and provides a drag-and-drop visual editor for customization without code changes. Docling handles intelligent document parsing, preserving tables, lists, and structural hierarchy that naive parsers destroy—critical for maintaining context accuracy in retrieval. OpenSearch powers enterprise-grade vector search and semantic retrieval at production scale.
The data flow is straightforward: Upload → Docling parses → Chunk (default 8,000 tokens with format-aware splitting) → Embed via configured model → OpenSearch indexes → Query → Agent retrieves and re-ranks passages → LLM generates answer → Display with source attribution. The agentic approach differentiates OpenRAG from basic RAG implementations. When users ask questions, the agent performs semantic search against the OpenSearch index, retrieves relevant passages, re-ranks them using advanced scoring, and passes top-ranked context to the LLM for answer generation. The entire chain is visible and editable through Langflow’s visual editor.
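That chunk-embed-index-retrieve loop can be sketched in a few lines of plain Python. Everything here is a toy stand-in, not OpenRAG's actual API: the function names are hypothetical, the letter-frequency "embedding" substitutes for a real model, and a list replaces the OpenSearch index.

```python
from math import sqrt

def chunk(text: str, size: int = 40) -> list[str]:
    """Naive fixed-size chunking; OpenRAG's default is format-aware."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    """Toy letter-frequency vector standing in for a real embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, list[float]]], top_k: int = 2) -> list[str]:
    """Rank indexed chunks by similarity to the query and return the top k."""
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

docs = [
    "OpenSearch powers vector search.",
    "Docling parses PDF tables.",
    "Langflow edits workflows visually.",
]
index = [(c, embed(c)) for doc in docs for c in chunk(doc)]
print(retrieve("vector search", index, top_k=1))
# -> ['OpenSearch powers vector search.']
```

In OpenRAG each of these stages is a real, swappable component (Docling for parsing, a configured embedding model, OpenSearch for the index), but the shape of the flow is the same.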
This architecture matters because developers gain full transparency and control over every component—how documents are parsed, how they’re chunked and indexed, how search operates, how the agent orchestrates retrieval, and how results are delivered. Everything is inspectable and modifiable, unlike black-box managed services.
Get Started in 15 Minutes
OpenRAG’s installation eliminates multi-day infrastructure projects. Requirements are minimal: 8 GB RAM, 50 GB disk space, and Docker or Podman. Two deployment options exist: docker-compose.yml for GPU-enabled systems (faster embedding generation) and docker-compose-cpu.yml for CPU-only deployments. The “15 minutes from zero to agentic search” promise is real: Docker Compose handles the heavy lifting.
Installation requires four commands:
# Prerequisites: Docker installed and running
git clone https://github.com/langflow-ai/openrag.git
cd openrag
# Start all services (single command)
docker compose up -d
# Access web interface
open http://localhost:3000
After containers start, access the web interface at localhost:3000, upload documents (PDFs, Word files, images, etc.), wait for Docling to parse and OpenSearch to index, then start querying immediately via the chat interface. Answers include source citations showing which documents informed the response. For detailed setup instructions, see the official OpenRAG quickstart guide.
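If you would rather script the wait for startup than refresh the browser, a small readiness poll works. This is a standard-library sketch that assumes only the default localhost:3000 address mentioned above:

```python
import time
import urllib.request

def wait_for(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # containers still starting; retry after a short pause
        time.sleep(interval_s)
    return False
```

Call `wait_for("http://localhost:3000")` right after `docker compose up -d`; if it returns False, check `docker compose logs` before uploading documents.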
Customization happens through Langflow’s visual editor—no code changes needed. Modify the ingestion pipeline (chunking strategy, embedding model), adjust retrieval logic (top-k, re-ranking algorithm), or add custom components (filters, validators, post-processors). Changes are drag-and-drop accessible, making RAG experimentation fast and visual.
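To make the "custom post-processor" idea concrete, here is what a simple re-ranking step might look like in plain Python. This is an illustrative sketch, not Langflow's component API: it takes vector-search candidates and re-orders them by query-term overlap before they reach the LLM.

```python
def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Re-order candidate passages by how many query terms they contain."""
    terms = set(query.lower().split())

    def score(passage: str) -> int:
        return len(terms & set(passage.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]

candidates = [
    "Quarterly revenue grew 12 percent.",
    "The refund policy covers 30 days.",
    "Refund requests require an order number.",
]
print(rerank("refund policy", candidates, top_k=2))
# -> ['The refund policy covers 30 days.', 'Refund requests require an order number.']
```

Production re-rankers use cross-encoder models rather than term overlap, but the plug-in point in the flow (between retrieval and generation) is the same.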
When to Use OpenRAG
OpenRAG targets three primary use cases. Enterprise document search for internal knowledge bases (technical docs, policies, runbooks) where employees need to query across thousands of documents. Customer support knowledge bases using AI-powered Q&A to reduce ticket response time; DoorDash, for example, pairs a RAG-based chatbot with LLM guardrails for delivery support. Compliance and regulatory search where organizations need data sovereignty and full transparency over document processing and AI reasoning.
The platform is particularly valuable for organizations with compliance requirements, data sovereignty concerns, or a need to understand every step of AI decision-making. Self-hosting keeps data on private infrastructure, and the visual workflow editor exposes the complete retrieval chain for auditing and validation.
OpenRAG vs Alternatives
OpenRAG sits between managed services and DIY approaches, offering pre-integrated components with self-hosted control. Pinecone delivers managed serverless scale with minimal ops but ongoing per-query costs and vendor lock-in—ideal for commercial AI SaaS where infrastructure management is unwanted. Weaviate provides just the vector database (bring your own parser and orchestrator) with GraphQL interfaces and hybrid search modules, offering more flexibility but requiring more integration work.
Chroma works for small-scale apps and prototyping but lacks production-grade scalability—excellent for MVPs, not billion-vector deployments. Custom DIY RAG gives total control over every component but requires weeks of integration work connecting document parsers, embedding generators, vector databases, and LLM orchestration.
Choose OpenRAG when you need self-hosted deployment for data sovereignty, rapid prototyping to production without managing individual components, or full transparency without vendor black boxes. Choose Pinecone for zero-ops SaaS with proven multi-region SLAs. Choose Weaviate when you need specific vector database features and have existing infrastructure. Choose Chroma for quick prototypes under 100K vectors. Build custom when you have very specific requirements and engineering resources to integrate components over weeks.
Best Practices and Common Pitfalls
Start with recursive chunking: Chroma research shows it delivers 85-90% recall at 400 tokens, while semantic chunking reaches 91-92% but at higher computational cost. Use 250-token chunks (approximately 1,000 characters) as your starting point for experimentation, adjusting based on document type and query patterns. Apply 10-20% overlap as the industry standard, though a January 2026 analysis using SPLADE retrieval found overlap provided no measurable benefit and only increased indexing cost; test both approaches.
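The idea behind recursive chunking fits in a short sketch: split on the coarsest boundary first (paragraphs), and only recurse to finer boundaries (sentences, then words) for pieces that exceed the token budget. For simplicity this version counts whitespace-separated words as "tokens" and omits overlap.

```python
def recursive_chunk(text: str, max_tokens: int = 250,
                    separators: tuple[str, ...] = ("\n\n", ". ", " ")) -> list[str]:
    """Split on coarse boundaries first, recursing to finer ones as needed."""
    if len(text.split()) <= max_tokens:
        return [text] if text.strip() else []
    sep = separators[0]
    if sep not in text:
        if len(separators) > 1:
            return recursive_chunk(text, max_tokens, separators[1:])
        words = text.split()  # last resort: hard split on the token budget
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]
    chunks: list[str] = []
    buf = ""
    for part in text.split(sep):
        candidate = f"{buf}{sep}{part}" if buf else part
        if len(candidate.split()) <= max_tokens:
            buf = candidate  # still fits: keep packing this chunk
        else:
            chunks.extend(recursive_chunk(buf, max_tokens, separators))
            buf = part
    chunks.extend(recursive_chunk(buf, max_tokens, separators))
    return chunks

doc = ("one two three four five. six seven eight nine ten.\n\n"
       "alpha beta gamma delta epsilon.")
print(recursive_chunk(doc, max_tokens=6))
```

Real tokenizers count subword tokens, not words, so swap in your embedding model's tokenizer before tuning the 250-token budget.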
Choose retrieval-optimized embeddings (Nomic, Intfloat E5 variants) over domain-specialized models. Weaviate’s research confirms embedding model choice matters as much as chunking strategy, and retrieval-optimized models substantially outperform alternatives. Monitor retriever similarity scores—if they’re low, embeddings may be mismatched or chunking strategy needs adjustment.
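Monitoring those similarity scores is easy to automate. A minimal sketch (the helper name and the 0.5 threshold are illustrative choices, not OpenRAG defaults): log each query's retrieval scores, then flag queries whose best match scores low, since those are the ones that point at a mismatched embedding model or a chunking problem.

```python
def flag_weak_retrievals(query_scores: dict[str, list[float]],
                         threshold: float = 0.5) -> list[str]:
    """Return queries whose top similarity score falls below the threshold."""
    return [query for query, scores in query_scores.items()
            if not scores or max(scores) < threshold]

# Example score log: query -> similarity scores of its retrieved passages.
scores = {
    "refund policy": [0.82, 0.74, 0.61],
    "gpu driver error": [0.31, 0.28],  # weak: likely embedding/chunking mismatch
}
print(flag_weak_retrievals(scores))
# -> ['gpu driver error']
```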
Common pitfalls include insufficient RAM (8 GB minimum, with complex workloads needing more), forgetting to start Docling Serve (it must run on port 5001 before Docker deployment), chunking too small or too large (which loses context or reduces precision), selecting the wrong embedding model (domain-specialized models underperform retrieval-optimized ones), and ignoring document structure (naive parsers destroy tables and lists; Docling preserves them). Additionally, Docker VM resource allocation matters: ensure Docker itself has sufficient RAM and CPU, not just the host system. Port conflicts on 5001 (Docling), 9200 (OpenSearch), or 7860 (Langflow) will block startup.
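Port conflicts are worth checking before the first `docker compose up`. A small pre-flight sketch (the helper is hypothetical; the ports are the ones listed above):

```python
import socket

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # 0 means a connection succeeded

# Ports OpenRAG's services expect to bind.
for name, port in {"Docling": 5001, "OpenSearch": 9200, "Langflow": 7860}.items():
    status = "free" if port_free(port) else "IN USE - resolve before starting"
    print(f"{name} (port {port}): {status}")
```

If a port reports in use, find the owning process (e.g. `lsof -i :9200` on macOS/Linux) and stop it, or remap the port in the compose file.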
Key Takeaways
- OpenRAG delivers on the “15 minutes from zero to agentic search” promise through single-command Docker Compose deployment, eliminating the days or weeks traditional RAG implementations require
- The platform combines Langflow (visual workflow orchestration), Docling (structure-preserving document parsing), and OpenSearch (enterprise-scale semantic search) into a pre-configured stack with full transparency and control
- Ideal for self-hosted requirements, compliance needs, or rapid prototyping to production—choose managed services (Pinecone) for zero-ops SaaS or lighter tools (Chroma) for small-scale prototyping instead
- Start with recursive chunking at 250 tokens, use retrieval-optimized embeddings (Nomic, Intfloat E5), and leverage Langflow’s visual editor to customize workflows without code changes
- Avoid common pitfalls: ensure 8 GB RAM minimum, verify Docling Serve is running on port 5001, allocate sufficient Docker VM resources, and monitor retriever similarity scores to optimize accuracy
After installation, upload test documents to validate parsing quality, query via the chat interface to assess retrieval accuracy, use Langflow’s visual editor to experiment with chunking and embedding strategies, and monitor retrieval scores to identify optimization opportunities. OpenRAG provides the production-ready foundation—customization and tuning happen iteratively based on your specific document types and query patterns.

