
Mintlify Ditches RAG for Filesystem: 460x Faster

Mintlify just replaced RAG with a virtual filesystem for their AI documentation assistant. The results: session creation dropped from 46 seconds to 100 milliseconds—460 times faster. Marginal cost per conversation went from $0.0137 to zero. The system now powers 30,000+ daily conversations on infrastructure they already had. The insight driving this: agents don’t need semantic vector search for structured documentation. They need grep, cat, ls, and find.

The RAG Problem No One Wants to Admit

RAG's limitations compound with structured content. Mintlify's assistant could only retrieve text chunks matching queries. When answers spanned multiple documentation pages, the system failed. When users needed exact syntax that didn't land in the top-K vector similarity results, it missed entirely.

The sandbox approach was equally problematic. Spinning up isolated containers with repository cloning took 46 seconds at P90. At 850,000 monthly conversations, minimal infrastructure—1 vCPU, 2 GiB RAM, 5-minute sessions—would cost roughly $70,000 annually. Every conversation carried a $0.0137 marginal cost.

The chunking problem runs deeper than Mintlify. Vector databases split documents into 100-200 character fragments for embedding. Context gets shredded. Imagine reading a book where someone shuffled the pages. Multi-step logic chains break. If documentation explains that feature A connects to component B, and component B integrates with service C, vector similarity search often can’t follow the A→B→C relationship. The retrieval process is a black box—when it misses an answer, debugging why becomes guesswork.

Why Filesystems Win for Agents

LLMs spent their training digesting massive amounts of code from GitHub. They’ve logged countless hours navigating directory structures, grepping through files, and managing state across complex codebases. Filesystem operations aren’t a new interface—they’re a native language.

Agents are converging on filesystems because grep, cat, ls, and find provide exactly what’s needed: exact string matching instead of fuzzy vector proximity, structure traversal instead of blind retrieval, and command composition through pipes instead of single-shot queries. If agents excel at filesystem operations for code, they excel at filesystem operations for anything.

The precision difference matters. Vector search excels at recall—finding conceptually related content even with different terminology. But it struggles with precision. When a developer needs the exact authentication header format from OAuth documentation, grep finds it. Vector search returns five somewhat-related chunks about authentication concepts.

ChromaFs: UNIX Commands Meet Vector Databases

Mintlify built ChromaFs on just-bash, Vercel’s TypeScript bash implementation. The system intercepts UNIX commands and translates them into Chroma database queries. No real files. No containers. No sandboxes. Just a virtual filesystem that speaks grep.

The architecture has four key components. Directory bootstrapping stores file trees as gzipped JSON containing metadata about paths, permissions, and hierarchies. This loads into memory on initialization—a Set for paths, a Map for directory contents. Operations like ls, cd, and find run in memory with zero network overhead.
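A minimal sketch of what that in-memory index could look like. The class and method names here are illustrative, not Mintlify's actual API; the point is that once paths load from the gzipped JSON, listing and existence checks are pure data-structure lookups with no network round trip.

```typescript
// Illustrative in-memory virtual directory index: a Set for full paths,
// a Map from each directory to its direct children.
class VirtualTree {
  private paths = new Set<string>();          // every known file path
  private dirs = new Map<string, string[]>(); // directory -> direct children

  constructor(filePaths: string[]) {
    for (const p of filePaths) {
      this.paths.add(p);
      // Register each ancestor directory and its direct child entry.
      const parts = p.split("/").filter(Boolean);
      let dir = "";
      for (const entry of parts) {
        const key = dir === "" ? "/" : dir;
        const children = this.dirs.get(key) ?? [];
        if (!children.includes(entry)) children.push(entry);
        this.dirs.set(key, children);
        dir = `${dir}/${entry}`;
      }
    }
  }

  // ls and exists never touch the database: everything is resolved
  // against the structures built at bootstrap time.
  ls(dir: string): string[] {
    return this.dirs.get(dir) ?? [];
  }

  exists(path: string): boolean {
    return this.paths.has(path);
  }
}
```

With the tree built once per deployment, every `ls`, `cd`, and `find` an agent issues is an O(1) or O(children) memory operation, which is how session creation can drop to milliseconds.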

File retrieval handles Chroma’s chunking. When an agent runs cat /auth/oauth.mdx, ChromaFs fetches all chunks matching that page slug, sorts by chunk index, and concatenates them into the full page. Results cache to avoid redundant database hits when grep workflows read the same file repeatedly.
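The reassembly step can be sketched like this, assuming a chunk store that returns `{index, text}` records for a page slug (the types and cache here are illustrative stand-ins for the real Chroma query and Redis cache):

```typescript
// Illustrative chunk shape: Chroma stores pages as ordered fragments.
interface Chunk { index: number; text: string; }

const pageCache = new Map<string, string>(); // slug -> full page text

function cat(slug: string, fetchChunks: (slug: string) => Chunk[]): string {
  const cached = pageCache.get(slug);
  if (cached !== undefined) return cached;   // skip redundant DB hits

  const page = fetchChunks(slug)
    .sort((a, b) => a.index - b.index)       // restore original page order
    .map((c) => c.text)
    .join("");
  pageCache.set(slug, page);
  return page;
}
```

The cache matters because grep-driven workflows often read the same file several times in one session; only the first `cat` pays the database cost.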

Search optimization uses two-stage filtering. Chroma performs coarse filtering: “which files might contain this string?” Results bulk-prefetch into Redis. The rewritten grep command then executes fine-grained in-memory regex against cached candidates. Large recursive searches complete in milliseconds.
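A simplified sketch of that two-stage pipeline, with the coarse Chroma filter and Redis prefetch collapsed into a candidate list and a content reader (both names are assumptions for illustration):

```typescript
// Stage 1 (coarse, done upstream): a cheap "which files might match?"
// query yields candidateSlugs, whose contents are bulk-prefetched.
// Stage 2 (fine, here): exact regex matching over the cached content.
function grep(
  pattern: RegExp,
  candidateSlugs: string[],
  readFile: (slug: string) => string
): { slug: string; line: string }[] {
  const hits: { slug: string; line: string }[] = [];
  for (const slug of candidateSlugs) {
    for (const line of readFile(slug).split("\n")) {
      if (pattern.test(line)) hits.push({ slug, line });
    }
  }
  return hits;
}
```

Because stage 2 runs against memory, even a recursive search over many candidate files stays in the millisecond range; the database is only consulted once, coarsely, up front.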

Access control filters path trees before construction. Unauthorized paths get pruned entirely—they’re invisible to the agent. Simpler than managing Linux permissions across container instances. Every write operation returns an EROFS error. Read-only design ensures statelessness. No session cleanup. No corruption risk.
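The two policies above, pruning before construction and rejecting all writes, are simple enough to sketch directly (function names are hypothetical; the EROFS behavior mirrors what the article describes):

```typescript
// Pruning happens before the tree is built: unauthorized paths never
// enter the index, so they are invisible rather than "forbidden".
function buildVisiblePaths(
  allPaths: string[],
  isAuthorized: (path: string) => boolean
): string[] {
  return allPaths.filter(isAuthorized);
}

// Every mutation fails uniformly with EROFS (read-only filesystem),
// which keeps sessions stateless: nothing to clean up, nothing to corrupt.
function write(_path: string, _data: string): never {
  throw new Error("EROFS: read-only file system");
}
```

Pruning at build time also means access checks cost nothing per command: an agent simply cannot `ls` or `cat` what was never indexed.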

The Numbers Tell the Story

Session creation: 46 seconds to 100 milliseconds. That’s not incremental improvement—it’s a category shift. The $70,000 annual sandbox infrastructure cost dropped to zero marginal cost per conversation. Mintlify reused their existing Chroma database. No new infrastructure. No added compute.

The system handles 30,000+ conversations daily. Hacker News users validated the approach with 228 upvotes and 98 comments. This isn’t a proof of concept. It’s production infrastructure serving real users at scale.

When to Use Filesystems vs RAG

Filesystems win for small corpora with clear structure. Documentation, codebases, API references—anything with hierarchical organization and exact lookup requirements. Grep excels when you need that specific error code or function signature.

RAG wins for massive knowledge bases. Millions of documents. Unstructured content. Semantic similarity across languages, synonyms, and concepts. Scales to billions of embeddings when semantic relationships matter more than exact matches.

The real answer is both. Route different query types to the right retrieval method. Use filesystem search for structured documentation. Use vector search for conceptual questions. Mintlify could add RAG back for fuzzy searches while keeping ChromaFs for precise lookups. Hybrid architectures beat purist approaches.
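One way such routing could start is a cheap heuristic in front of both backends. This toy classifier is an assumption, not anything Mintlify describes; a production router would likely use the model itself or richer signals:

```typescript
// Toy routing heuristic: code-like tokens (quoted strings, ERROR_CODES,
// errno names, call syntax) suggest an exact lookup; plain natural
// language suggests a conceptual question for vector search.
function routeQuery(query: string): "filesystem" | "vector" {
  const exactSignals = /["'`]|[A-Z]{2,}_[A-Z_]+|\(\)|::|\bE[A-Z]+\b/;
  return exactSignals.test(query) ? "filesystem" : "vector";
}
```

The design choice worth copying is the fallback shape: precise lookups go to grep, fuzzy questions go to embeddings, and neither path has to pretend it handles both.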

What This Means for Your Systems

Most documentation AI tools default to RAG because that’s what the tutorials teach. Mintlify’s results suggest that assumption needs challenging. If your content has structure, if users need exact matches, if your agents spend time exploring hierarchies—consider filesystems.

The broader trend matters more. Agents are converging on filesystem interfaces because that’s what LLMs already understand. just-bash from Vercel, ChromaFs from Mintlify, and the emerging “Standard RAG is Dead” narrative all point the same direction. The next wave of agent tooling won’t be pure vector search. It’ll be systems that speak grep.

ByteBot