LangSmith Engine and SmithDB: Fix Agent Failures Fast


LangSmith Engine auto-detects production agent failures and opens PRs with fixes. SmithDB loads traces up to 15x faster.

LangChain shipped two infrastructure pieces at Interrupt 2026 that deserve more attention than they got. SmithDB replaces the database underneath LangSmith with purpose-built storage that loads trace trees in 92ms at P50, up to 15x faster than before. LangSmith Engine sits on top and automates what developers have been doing manually for years: reading production traces, spotting patterns, diagnosing root causes, and writing fixes. Both are live now. If you run LangGraph or LangChain agents in production, this changes your debugging workflow.

LangSmith Engine: The Debug Loop, Automated

Here is how agent debugging has worked until now: an agent fails in production, you dig through hundreds of traces, you eventually spot a pattern, you write a fix, you write a regression test, and you hope the same thing does not surface again in two weeks under a slightly different input. It is tedious, it is manual, and it does not scale.

LangSmith Engine replaces that cycle. Connect a tracing project and optionally link your GitHub repository, and Engine begins monitoring your production traces continuously. It clusters failures into named, prioritized issues: tool call failures, timeouts, evaluator failures, latency spikes, token blowouts, unexpected step counts, and negative user feedback. When it spots a recurring issue, it reads your source code, diagnoses the root cause, and opens a PR with a targeted fix. It also proposes a custom online evaluator to catch regressions and pulls the failing traces into your offline eval dataset.
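To make the clustering step concrete, here is a minimal Python sketch. It is not Engine's implementation: the Trace record, the failure labels, the grouping key (failure kind plus tool name), and the "most frequent first" priority rule are all assumptions chosen for illustration, since the announcement does not describe how Engine groups or ranks issues internally.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """Hypothetical, simplified trace record (not the LangSmith schema)."""
    trace_id: str
    failure_kind: Optional[str]  # e.g. "timeout"; None means the run was healthy
    tool: Optional[str] = None


@dataclass
class Issue:
    """A named cluster of failing traces."""
    name: str
    trace_ids: list = field(default_factory=list)


def cluster_failures(traces):
    """Group failing traces by (failure kind, tool) and rank by frequency."""
    groups = defaultdict(list)
    for t in traces:
        if t.failure_kind is not None:
            groups[(t.failure_kind, t.tool)].append(t.trace_id)
    issues = [
        Issue(name=f"{kind}:{tool}" if tool else kind, trace_ids=ids)
        for (kind, tool), ids in groups.items()
    ]
    # Assumed priority rule: the issue with the most failing traces comes first.
    return sorted(issues, key=lambda i: len(i.trace_ids), reverse=True)


traces = [
    Trace("t1", "timeout", "search"),
    Trace("t2", "tool_call_failure", "sql"),
    Trace("t3", "timeout", "search"),
    Trace("t4", None),
]
issues = cluster_failures(traces)
```

With this sample data, the two search timeouts collapse into a single issue that ranks ahead of the one-off SQL tool failure, and the healthy trace is ignored.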

The closed loop matters. If you close an issue and the same failure resurfaces, Engine automatically reopens it. The fix you shipped did not stick, and now you know. This is the part the manual approach gets wrong: you patch something, move on, and never notice when it quietly comes back.
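The reopen behavior can be pictured as a small state machine. The sketch below is an assumption about the general shape of such logic, not Engine's code: TrackedIssue, observe_failure, and the idea of matching on a string "signature" are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class TrackedIssue:
    """Hypothetical issue record keyed by a failure signature."""
    signature: str
    status: str = "open"  # either "open" or "closed"
    reopen_count: int = 0


def observe_failure(issues, signature):
    """Record a new failure; reopen the matching issue if it was closed."""
    issue = issues.get(signature)
    if issue is None:
        # First time this failure is seen: open a new issue.
        issue = TrackedIssue(signature)
        issues[signature] = issue
    elif issue.status == "closed":
        # The failure came back after a fix shipped: reopen and count it.
        issue.status = "open"
        issue.reopen_count += 1
    return issue


issues = {}
observe_failure(issues, "timeout:search")    # opens the issue
issues["timeout:search"].status = "closed"   # a fix ships, issue is closed
reopened = observe_failure(issues, "timeout:search")  # same failure resurfaces
```

Tracking a reopen count alongside the status is a design choice of this sketch; it makes "the fix did not stick" visible as data rather than something a developer has to remember.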

LangSmith Engine is in public beta now. Getting started is two steps: connect your tracing project and optionally connect your repo. Engine surfaces issues automatically from there.

SmithDB: Why Agent Traces Needed a New Database

The reason LangSmith needed a new database is not about scale in the traditional sense. It is about shape. A typical agent trace can contain tens of thousands of intermediate spans with large, unbounded payloads: full conversation histories, multi-modal data, deeply nested JSON. Standard databases were designed for rows and columns, not recursive trees with unpredictable depth and payload size.
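A small sketch helps show what "shape" means here. The Span class below is a hypothetical stand-in for a trace node, not the LangSmith run schema: each span carries an arbitrary JSON payload and any number of child spans, so depth, span count, and payload size are only known after walking the whole tree.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span node: a name, a JSON-like payload, and child spans."""
    name: str
    payload: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def tree_stats(root):
    """Walk the tree iteratively and report span count, depth, payload bytes."""
    count = depth = payload_bytes = 0
    stack = [(root, 1)]  # explicit stack avoids recursion limits on deep traces
    while stack:
        span, level = stack.pop()
        count += 1
        depth = max(depth, level)
        payload_bytes += len(json.dumps(span.payload).encode("utf-8"))
        stack.extend((child, level + 1) for child in span.children)
    return {"spans": count, "depth": depth, "payload_bytes": payload_bytes}


root = Span("agent", {"input": "hi"}, [
    Span("llm", {"messages": ["hi"] * 3}),
    Span("tool", {}, [Span("http")]),
])
stats = tree_stats(root)
```

None of the three numbers in the result is bounded by a schema, which is the property that makes fixed rows and columns an awkward fit.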

LangChain built SmithDB in Rust using Apache DataFusion as the query engine and Vortex for the file layer. The architecture is three stateless services: ingestion, query, and compaction, sitting on top of object storage and a small Postgres metastore. No local disks. Scaling means adding compute, not resharding a cluster.
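The division of labor between the three services can be sketched with in-memory stand-ins. This is a toy model under stated assumptions, not SmithDB: it uses neither DataFusion nor Vortex, and ObjectStore, Metastore, ingest, compact, and query are hypothetical names. The point it illustrates is that the service functions keep no state of their own; everything durable lives in the shared store and metastore, which is why adding compute is enough to scale.

```python
class ObjectStore:
    """Stand-in for object storage such as S3: blobs addressed by key."""
    def __init__(self):
        self.blobs = {}


class Metastore:
    """Stand-in for the small Postgres metastore: tracks which files are live."""
    def __init__(self):
        self.live_files = []
        self.next_id = 0

    def new_key(self):
        self.next_id += 1
        return f"file-{self.next_id}"


def ingest(store, meta, runs):
    """Ingestion: write a batch as a new file, then register it as live."""
    key = meta.new_key()
    store.blobs[key] = list(runs)
    meta.live_files.append(key)
    return key


def compact(store, meta):
    """Compaction: merge all live files into one and swap the live list."""
    merged = [run for key in meta.live_files for run in store.blobs[key]]
    key = meta.new_key()
    store.blobs[key] = merged
    meta.live_files = [key]
    return key


def query(store, meta, trace_id):
    """Query: ask the metastore which files are live, then scan only those."""
    return [
        run
        for key in meta.live_files
        for run in store.blobs[key]
        if run["trace_id"] == trace_id
    ]


store, meta = ObjectStore(), Metastore()
ingest(store, meta, [{"trace_id": "a", "name": "llm"}])
ingest(store, meta, [{"trace_id": "b", "name": "tool"},
                     {"trace_id": "a", "name": "tool"}])
before = query(store, meta, "a")
compact(store, meta)
after = query(store, meta, "a")
```

In this toy, compaction changes how many files a query touches but not what it returns, which is the contract a real compaction service would also need to preserve.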

The performance numbers are meaningful for anyone who has watched LangSmith struggle on a large project:

Trace tree loads: 92ms at P50
Single run loads: 71ms at P50
Run filtering: 82ms at P50
Full-text search: 400ms at P50

That is up to 15x faster than the previous system on core LangSmith workloads. SmithDB is live for all US cloud customers with no action required. Teams that need data residency can self-host inside a VPC using the provided Terraform samples, which deploy the three stateless components against S3 and RDS Postgres. No complex sharding is required.

Fleet Sandbox, Context Hub, and LLM Gateway

Three other Interrupt 2026 releases are worth flagging. Fleet Sandbox, now in public beta, gives Fleet agents a secure execution environment where they can write and run code, not just call tools. Agents can now analyze data, transform files, generate PDFs, run shell commands, and install dependencies. The sandbox supports snapshots, copy-on-write forks, and auto-pause when idle to control costs.

Context Hub versions the instructions and policies your agents follow, so you can see exactly which prompt was active during any given trace. LLM Gateway enforces spend limits and redacts PII before requests leave your environment. It is the compliance layer most enterprise teams have been waiting for.
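As a rough picture of what a gateway-style check does before a request goes out, here is a minimal sketch. It does not use or describe the LLM Gateway API: ToyGateway, prepare, SpendLimitExceeded, the per-request cost estimate, and the email-only redaction pattern are all assumptions made for illustration, and real PII detection covers far more than email addresses.

```python
import re

# Simplified email pattern; an assumption for this sketch, not a full PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class SpendLimitExceeded(Exception):
    """Raised when a request would push spend past the configured budget."""


class ToyGateway:
    """Hypothetical pre-request checks: budget enforcement, then redaction."""

    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def prepare(self, prompt, estimated_cost_usd):
        # Refuse the request before any text leaves the environment.
        if self.spent_usd + estimated_cost_usd > self.budget_usd:
            raise SpendLimitExceeded("budget would be exceeded")
        self.spent_usd += estimated_cost_usd
        # Replace every email address with a placeholder.
        return EMAIL.sub("[REDACTED_EMAIL]", prompt)


gateway = ToyGateway(budget_usd=0.10)
safe_prompt = gateway.prepare("Contact ana@example.com today", 0.06)

try:
    gateway.prepare("Second request", 0.06)
    blocked = False
except SpendLimitExceeded:
    blocked = True
```

Checking the budget before redacting means a rejected request is never processed at all, which is one reasonable ordering among several.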

What Changes Now

The biggest shift is not the performance improvement. It is the gap Engine closes between knowing something is wrong in production and having a fix ready to review. LangChain’s State of Agent Engineering report from April 2026 found that 57 percent of organizations now have agents in production and quality is still the top blocker, cited by 32 percent of teams. Engine is a direct response to that number.

For teams on LangSmith's US cloud, SmithDB is already in place with no migration required. LangSmith Engine is available in public beta: connect your project, optionally connect GitHub, and review the issues it surfaces. Self-hosted teams can find the SmithDB Terraform configs on the LangChain GitHub. The gap between detecting a production failure and reviewing a fix is closing fast.

ByteBot
