SQLite Durable Workflows: Skip Temporal Until You Need It

Yesterday, a blog post from the Obelisk team hit #2 on Hacker News with 472 points and 240 comments. Its argument: for the vast majority of AI agent workloads, you don’t need a dedicated orchestration cluster. Durable workflows built on SQLite and replicated with Litestream are enough. And if you need production proof beyond one team’s blog post, Cloudflare just rebuilt Workflows V2 on a SQLite backend, scaling from 4,500 to 50,000 concurrent workflow instances in the process.

Why Durability Is the 2026 AI Agent Problem

Every team building AI agents in 2026 eventually runs into the same wall. Agents aren’t single function calls — they’re chains of steps: call an LLM, search the web, parse results, call another LLM, write to a database, send a notification. Each step carries its own failure rate.

Do the math: if five steps each succeed 99% of the time and fail independently, the whole chain succeeds only about 95% of the time (0.99^5 ≈ 0.951). Push that to ten steps and you’re at roughly 90% (0.99^10 ≈ 0.904). In a system processing thousands of agent runs per day, that’s hundreds of failures, each one potentially losing state, double-charging a customer, or leaving a workflow stuck in an unknown, half-complete state.
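
The compounding is plain multiplication. The sketch below assumes step failures are independent (in practice they often correlate, for example when a shared upstream API goes down) and uses a hypothetical volume of 1,000 runs per day:

```python
def end_to_end_success(step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return step_reliability ** steps


def expected_failures(runs_per_day: int, step_reliability: float, steps: int) -> float:
    """Expected number of runs per day that hit at least one failed step."""
    return runs_per_day * (1.0 - end_to_end_success(step_reliability, steps))


for steps in (5, 10):
    rate = end_to_end_success(0.99, steps)
    # 1,000 runs/day is a hypothetical volume chosen for illustration.
    failures = expected_failures(1_000, 0.99, steps)
    print(f"{steps} steps: {rate:.1%} success, ~{failures:.0f} failed runs per 1,000")

# 5 steps: 95.1% success, ~49 failed runs per 1,000
# 10 steps: 90.4% success, ~96 failed runs per 1,000
```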

This is the durability problem. Durable execution frameworks solve it by persisting workflow state at every step, so a crash mid-workflow doesn’t lose everything — it resumes from the last checkpoint. AWS, Cloudflare, and Vercel all shipped durable execution products in 2025. The question now isn’t whether you need durability. It’s how much infrastructure you actually need to get it.
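
A minimal sketch of “resume from the last checkpoint,” using Python’s built-in sqlite3 module. This is not any particular framework’s implementation; the steps table and the run_step and agent helpers are hypothetical. Each completed step’s result is stored under (run_id, step); when the workflow runs again after a crash, finished steps return their stored result instead of executing a second time.

```python
import json
import sqlite3
from typing import Any, Callable


def open_log(path: str) -> sqlite3.Connection:
    """Open the workflow log; use a file path in real use so checkpoints survive a restart."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        " run_id TEXT NOT NULL, step TEXT NOT NULL, result TEXT NOT NULL,"
        " PRIMARY KEY (run_id, step))"
    )
    return conn


def run_step(conn: sqlite3.Connection, run_id: str, step: str, fn: Callable[[], Any]) -> Any:
    """Return the checkpointed result if this step already completed; otherwise run it and checkpoint."""
    row = conn.execute(
        "SELECT result FROM steps WHERE run_id = ? AND step = ?", (run_id, step)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # replay: skip re-execution
    result = fn()
    with conn:  # commit the checkpoint in its own transaction
        conn.execute(
            "INSERT INTO steps (run_id, step, result) VALUES (?, ?, ?)",
            (run_id, step, json.dumps(result)),
        )
    return result


calls = {"search": 0, "summarize": 0}  # counts real executions of each step


def agent(conn: sqlite3.Connection, run_id: str, crash_after_search: bool = False) -> str:
    def search() -> list[str]:
        calls["search"] += 1
        return ["doc-1", "doc-2"]

    docs = run_step(conn, run_id, "search", search)
    if crash_after_search:
        raise RuntimeError("simulated worker crash")

    def summarize() -> str:
        calls["summarize"] += 1
        return f"summary of {len(docs)} docs"

    return run_step(conn, run_id, "summarize", summarize)


conn = open_log(":memory:")  # in-memory only to keep the demo self-contained
try:
    agent(conn, "run-1", crash_after_search=True)
except RuntimeError:
    pass
print(agent(conn, "run-1"))  # summary of 2 docs
print(calls)                 # {'search': 1, 'summarize': 1}
```

The checkpoint is written only after a step returns, so a crash between a step finishing and its commit re-runs that step on recovery. That at-least-once behavior is why steps with external side effects, such as charging a customer, generally need to be idempotent.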

SQLite Durable Workflows: The Minimal Stack

The Obelisk team’s core insight is worth stating clearly: the durable part is the workflow state — the compute can stay cheap and disposable. You don’t need a separate database service to store workflow logs. You need transactional local storage backed by continuous replication. That’s exactly what SQLite and Litestream provide.

SQLite runs embedded in your worker process. In WAL mode, readers don’t block writers and writers don’t block readers, which maps cleanly onto a workflow log pattern where you’re constantly appending new execution steps while the scheduler reads current state. SQLite still allows only one writer per database file at a time. Litestream runs as a sidecar, continuously streaming WAL pages to S3-compatible storage. If the worker’s machine or disk is lost, the recovery path is deterministic: restore the latest snapshot, replay the replicated WAL, and you’re back to the last replicated workflow state. Because Litestream replicates asynchronously, writes from the final moments before the failure may not have reached storage yet.
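
The “readers don’t block writers” property is easy to check with Python’s sqlite3 module. The sketch below (the log table and the helper name are made up for illustration) holds a read transaction open, as a scheduler might, and then tries to commit a new log entry from a second connection. In WAL mode the commit goes through; in SQLite’s default rollback-journal mode (DELETE) it fails with “database is locked”, immediately here because the busy timeout is set to zero.

```python
import os
import sqlite3
import tempfile


def writer_commits_during_open_read(journal_mode: str) -> bool:
    """True if a writer can COMMIT while another connection is mid-read-transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "workflow.db")

        setup = sqlite3.connect(path)
        setup.execute(f"PRAGMA journal_mode={journal_mode}")  # "WAL" or the default "DELETE"
        setup.execute("CREATE TABLE log (step TEXT)")
        setup.execute("INSERT INTO log VALUES ('start')")
        setup.commit()
        setup.close()

        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: report "database is locked" at once instead of retrying.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            reader.execute("BEGIN")
            reader.execute("SELECT count(*) FROM log").fetchone()  # scheduler reads state

            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO log VALUES ('step-1')")  # append a new step
            try:
                writer.execute("COMMIT")
                committed = True
            except sqlite3.OperationalError:  # "database is locked"
                writer.execute("ROLLBACK")
                committed = False

            reader.execute("COMMIT")
            return committed
        finally:
            reader.close()
            writer.close()


print("WAL:", writer_commits_during_open_read("WAL"))        # True
print("DELETE:", writer_commits_during_open_read("DELETE"))  # False
```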

The infrastructure overhead is essentially zero. No Cassandra cluster, no history service, no separate Postgres instance. Just your existing compute plus a cheap S3 bucket. For AI agent workloads — which are bursty, isolated, and typically single-tenant per worker — this is the right fit. One SQLite database per agent run means no competing write pressure, no multi-writer concurrency problem, and nothing to tune.
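
A sketch of the one-database-per-run layout, again with Python’s sqlite3. The directory layout, the open_run_db helper, and the events schema are illustrative assumptions, not Obelisk’s format. Because each run gets its own file, two runs can hold write transactions at the same moment without contending for a lock:

```python
import os
import re
import sqlite3
import tempfile

_RUN_ID = re.compile(r"^[A-Za-z0-9_-]+$")  # hypothetical rule: run ids must be safe file names


def open_run_db(base_dir: str, run_id: str) -> sqlite3.Connection:
    """Open (or create) the private SQLite file for one agent run, in WAL mode."""
    if not _RUN_ID.match(run_id):
        raise ValueError(f"bad run id: {run_id!r}")
    path = os.path.join(base_dir, f"{run_id}.db")
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


with tempfile.TemporaryDirectory() as tmp:
    a = open_run_db(tmp, "run-a")
    b = open_run_db(tmp, "run-b")
    # Both runs hold a write transaction at once: separate files, separate write locks.
    a.execute("BEGIN IMMEDIATE")
    b.execute("BEGIN IMMEDIATE")
    a.execute("INSERT INTO events (kind, payload) VALUES ('llm_call', '{}')")
    b.execute("INSERT INTO events (kind, payload) VALUES ('llm_call', '{}')")
    a.execute("COMMIT")
    b.execute("COMMIT")
    print(sorted(f for f in os.listdir(tmp) if f.endswith(".db")))  # ['run-a.db', 'run-b.db']
    a.close()
    b.close()
```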

Cloudflare Didn’t Choose SQLite by Accident

When teams object that SQLite “doesn’t scale,” it’s worth pointing to Cloudflare Workflows V2, which scaled from 4,500 concurrent instances to 50,000 by switching to a SQLite-backed storage model. Cloudflare chose SQLite for per-instance workflow state not as a prototype, but as the production architecture for a globally distributed system.

Cloudflare’s Durable Objects take this further, giving each tenant or AI agent its own isolated SQLite database. This is the same architectural pattern the Obelisk post describes: isolated state per agent, no shared writers, reliable replication. The HN discussion captures the reaction well — developers reporting they’ve replaced multiple SaaS tools with single services backed by SQLite and watched costs collapse. The pattern works at real scale.

When You Actually Need Temporal

None of this means Temporal is wrong. It means Temporal is expensive to operate and earns that cost at a specific scale. Self-hosted Temporal requires a dedicated cluster: frontend, history, and matching services, a persistence backend such as Cassandra, PostgreSQL, or MySQL, and your own worker processes alongside it. Each activity dispatch adds 10–50 ms of latency. The operational surface is real. But if you’re running a multi-tenant platform where workflows coordinate across three or more services with significant fan-out, or if you need multi-region durability guarantees, Temporal’s primitives are worth the overhead. It’s built for complexity SQLite isn’t designed to handle. We covered what that looks like in practice in our post on Temporal Serverless Workers and durable AI on Lambda.
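
The per-dispatch cost adds up with workflow length. Taking the 10–50 ms range above at face value (real figures vary with deployment, network, and load) and assuming one activity per agent step, the added latency for the five- and ten-step agents discussed earlier works out as follows:

```python
# Assumption: one Temporal activity per agent step, 10-50 ms overhead per dispatch.
PER_DISPATCH_MS = (10, 50)


def added_latency_ms(activities: int) -> tuple[int, int]:
    """Best- and worst-case dispatch overhead for one run with this many activities."""
    low, high = PER_DISPATCH_MS
    return activities * low, activities * high


for steps in (5, 10):
    low, high = added_latency_ms(steps)
    print(f"{steps} activities: +{low} to {high} ms per run")

# 5 activities: +50 to 250 ms per run
# 10 activities: +100 to 500 ms per run
```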

Moreover, the spectrum between SQLite and Temporal is broader than most teams realize. DBOS stores workflow state in your existing Postgres with no new infrastructure — its Go SDK shipped April 2026. Restate runs as a lightweight sidecar for HTTP microservices and needs no separate cluster. Managed options like Inngest and Trigger.dev remove infrastructure concerns entirely for teams that don’t want to operate anything themselves. The right tool depends on workflow complexity, your team’s operational capacity, and whether you’ve actually hit the ceiling of simpler options.

The Obelisk engine itself demonstrates the practical approach: SQLite by default, with PostgreSQL as an upgrade path when you outgrow single-instance deployment. Start with the tool that costs nothing extra, requires little operational expertise, and handles the 90% case. Add complexity only when you’ve proven you need it, not before. For most teams building production AI agent workflows in 2026, that means SQLite first, and Temporal only once you’ve genuinely hit SQLite’s ceiling.
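
One way to keep that upgrade path cheap is to have workflow code depend on a narrow storage interface rather than on SQLite directly. The sketch below is hypothetical (WorkflowStore, SqliteStore, and record_step are illustrative names, not Obelisk’s API) and shows only the SQLite side; a PostgreSQL-backed class exposing the same two methods could be swapped in later.

```python
import sqlite3
from typing import Protocol


class WorkflowStore(Protocol):
    """Hypothetical storage interface; workflow code depends only on this."""

    def append_event(self, run_id: str, event: str) -> None: ...

    def events(self, run_id: str) -> list[str]: ...


class SqliteStore:
    """Default backend: a single local SQLite file (or :memory: for tests)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " seq INTEGER PRIMARY KEY, run_id TEXT NOT NULL, event TEXT NOT NULL)"
        )

    def append_event(self, run_id: str, event: str) -> None:
        with self._conn:  # one committed transaction per appended event
            self._conn.execute(
                "INSERT INTO events (run_id, event) VALUES (?, ?)", (run_id, event)
            )

    def events(self, run_id: str) -> list[str]:
        rows = self._conn.execute(
            "SELECT event FROM events WHERE run_id = ? ORDER BY seq", (run_id,)
        )
        return [event for (event,) in rows]


def record_step(store: WorkflowStore, run_id: str, step: str) -> None:
    # Only the protocol is visible here, so changing backends leaves this code untouched.
    store.append_event(run_id, f"completed:{step}")


store = SqliteStore(":memory:")
record_step(store, "run-1", "search")
record_step(store, "run-1", "summarize")
print(store.events("run-1"))  # ['completed:search', 'completed:summarize']
```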

ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.