pg_durable: Microsoft Open-Sources Durable Execution for PostgreSQL

pg_durable brings fault-tolerant SQL workflow execution directly into PostgreSQL

Microsoft open-sourced pg_durable today — a PostgreSQL extension that brings durable execution directly into your database. No Temporal cluster, no Redis, no external orchestration service. You write SQL workflows, PostgreSQL checkpoints each step, and if the server crashes mid-pipeline, execution resumes exactly where it left off. It hit 265 points on Hacker News within hours of release, and the reaction split cleanly: “this is exactly what I’ve been building by hand” versus “why are you putting compute in the database.”

What Durable Execution Actually Means

Most multi-step workflows have a reliability problem. Fetch 500 documents, call an embedding API, upsert results into pgvector. If the API call fails on document 312, what happens? You either re-run everything from scratch — duplicate work, wasted money, possible data corruption — or you build checkpoint tables, retry logic, and state machines by hand. Durable execution solves this at the infrastructure level: each step checkpoints its result, and failure resumes from the last successful checkpoint rather than the beginning.
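
The hand-built version of that pattern can be sketched in a few lines. This is a minimal illustration, assuming SQLite as a stand-in for PostgreSQL and a fake embedding call; the checkpoints table and the helpers make_flaky_embed and run_pipeline are hypothetical names, not part of pg_durable.

```python
import sqlite3


class EmbeddingApiError(Exception):
    """Raised by the fake API to simulate a transient outage."""


def make_flaky_embed(fail_on_id):
    """Return a fake embedding call that fails once, on one document id."""
    state = {"failed": False}

    def embed(doc_id, content):
        if doc_id == fail_on_id and not state["failed"]:
            state["failed"] = True
            raise EmbeddingApiError(f"timeout on document {doc_id}")
        return [float(len(content))]  # stand-in for a real vector

    return embed


def run_pipeline(conn, documents, embed):
    """Embed every document without a checkpoint; return the number of API calls made."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(doc_id INTEGER PRIMARY KEY, embedding TEXT)"
    )
    calls = 0
    for doc_id, content in documents:
        done = conn.execute(
            "SELECT 1 FROM checkpoints WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if done:
            continue  # finished in an earlier run, so skip it
        vector = embed(doc_id, content)  # may raise; nothing is recorded in that case
        calls += 1
        conn.execute(
            "INSERT INTO checkpoints (doc_id, embedding) VALUES (?, ?)",
            (doc_id, repr(vector)),
        )
        conn.commit()  # the checkpoint survives a crash once this returns
    return calls
```

Calling run_pipeline a second time with the same connection skips every document that already has a checkpoint row, so only the failed document and the ones after it reach the API. This bookkeeping is what a durable execution runtime takes over.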

Temporal popularized this pattern for the broader ecosystem, and AWS Step Functions, Azure Durable Functions, and DBOS offer comparable guarantees. pg_durable is Microsoft’s answer for teams that already live in PostgreSQL and want the same guarantee without spinning up a separate cluster.

How pg_durable Works

The extension runs as a PostgreSQL background worker — no external process, no sidecar, nothing beyond the extension itself. It is built on two Rust libraries: duroxide, the orchestration runtime that handles deterministic replay and checkpointing, and duroxide-pg, which persists workflow state in a dedicated duroxide.* schema in your database.
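
Deterministic replay is easier to see in a toy model. The sketch below is a conceptual illustration only, not duroxide's actual implementation: the Replayer class, the embed_pipeline generator, and the step names are all hypothetical. The orchestration yields one step request at a time, and the runtime answers from recorded history where it can and executes the step only when no record exists.

```python
class Replayer:
    """Runs an orchestration generator, replaying recorded results before running new steps."""

    def __init__(self, activities, history=None):
        self.activities = activities         # step name -> callable
        self.history = list(history or [])   # persisted (step name, result) pairs
        self.executed = []                   # steps actually run in this attempt

    def run(self, orchestration):
        gen = orchestration()
        position = 0
        try:
            request = next(gen)
            while True:
                name, arg = request
                if position < len(self.history):
                    # Replay: reuse the recorded result instead of re-running the step.
                    recorded_name, result = self.history[position]
                    if recorded_name != name:
                        raise RuntimeError("non-deterministic orchestration")
                else:
                    result = self.activities[name](arg)
                    self.executed.append(name)
                    self.history.append((name, result))  # checkpoint the result
                position += 1
                request = gen.send(result)
        except StopIteration as stop:
            return stop.value


def embed_pipeline():
    """Orchestration logic: each yield asks the runtime to perform one step."""
    batch = yield ("fetch_batch", None)
    embeddings = yield ("call_embedding_api", batch)
    stored = yield ("store", embeddings)
    return stored
```

If call_embedding_api raises on the first attempt, the history holds only the fetch_batch result. A second Replayer seeded with that history re-runs the orchestration code from the top, but fetch_batch is answered from the record and only the remaining steps execute. The requirement that the orchestration make the same requests in the same order on every run is why the replay is called deterministic.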

Workflows are composed using a SQL DSL with two operators: ~> chains steps sequentially, and |=> binds a step’s output to a named variable. A basic embedding pipeline looks like this:

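SELECT df.start(
    -- Step 1: pick a batch of unembedded documents and bind it to $batch.
    'SELECT id, content FROM documents WHERE embedded = false LIMIT 500'
        |=> 'batch'
    -- Step 2: call_embedding_api is not a built-in; it stands for a function you supply.
    ~> 'SELECT call_embedding_api($batch)'
        |=> 'embeddings'
    -- Step 3: upsert. ON CONFLICT ... DO UPDATE needs a conflict target; the document_id
    -- column and the {id, embedding} JSON shape of $embeddings are assumptions.
    ~> 'INSERT INTO document_embeddings (document_id, embedding)
        SELECT (e->>''id'')::bigint, (e->>''embedding'')::vector
        FROM jsonb_array_elements($embeddings) AS e
        ON CONFLICT (document_id) DO UPDATE SET embedding = EXCLUDED.embedding'
    -- Step 4: mark only the batch as embedded. The id is read from each JSON element so
    -- the subquery cannot resolve "id" to documents.id (assumes $batch is a jsonb array).
    ~> 'UPDATE documents SET embedded = true
        WHERE id IN (SELECT (b->>''id'')::bigint FROM jsonb_array_elements($batch) AS b)'
);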

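If the embedding API call fails, pg_durable resumes the workflow from the last committed checkpoint rather than from step one: the batch selected in the first step is read back from its checkpoint, and only the failed step runs again. Control flow primitives handle branching and fan-out: df.if() for conditionals, df.join() for parallel execution, and df.loop() for iteration.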

The Differentiator That Actually Matters

The unified backup story is pg_durable’s clearest advantage over external orchestrators. When you run pg_basebackup or use your cloud provider’s point-in-time recovery, workflow state and business data are captured in the same backup, so a restore brings both back to the same point in time. With Temporal or Step Functions, you have two separate backup processes to coordinate, and after a restore your workflow engine and your database can end up in different states. pg_durable avoids this problem by keeping workflow state in the same database as the data it operates on.
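
The effect can be shown with a small analogy. This is a sketch under stated assumptions: SQLite stands in for PostgreSQL, the documents and workflow_state tables and all helper names are hypothetical, and the choice to commit a step's effect and its checkpoint in one transaction is made here only to keep the toy consistent, not because this article confirms pg_durable does so.

```python
import sqlite3


def make_database():
    """Create one database holding both business data and workflow state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, embedded INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE workflow_state (workflow_id TEXT PRIMARY KEY, last_step INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO documents (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    return conn


def record_step(conn, workflow_id, step, doc_id):
    """Mark a document embedded and checkpoint the step in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("UPDATE documents SET embedded = 1 WHERE id = ?", (doc_id,))
        conn.execute(
            "INSERT OR REPLACE INTO workflow_state (workflow_id, last_step) VALUES (?, ?)",
            (workflow_id, step),
        )


def snapshot(conn):
    """Copy the whole database: business tables and workflow state travel together."""
    copy = sqlite3.connect(":memory:")
    conn.backup(copy)
    return copy
```

A snapshot taken after two steps contains exactly two embedded documents and a checkpoint at step two, whatever happens to the live database afterwards. With the workflow engine in a separate system, those two facts would come from two backups taken at different moments.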

For AI pipelines that mix SQL reads, external API calls, and vector upserts — all within the same Postgres instance — this is genuinely compelling.

What to Use It For (and What to Skip)

pg_durable is well-suited for workflows that are mostly SQL operations with some external API calls: vector embedding pipelines, batch ETL with deduplication and transformation, maintenance jobs with approval gates, and fan-out aggregations. These map cleanly to its SQL DSL and benefit from the unified state model.

It is not the right tool when your workflow coordinates across multiple services with their own databases. It is also not right when you need language-specific SDKs, sophisticated observability, or workflows better expressed as Python or TypeScript code. The developer-experience gaps are real: no web UI, no built-in metrics dashboard, and an underdeveloped testing story. The Hacker News commenters who flagged “the multi-master scaling story isn’t turnkey” are correct, at least for now.

For Postgres-backed durable execution with code-native workflows, DBOS is the stronger alternative: workflows are regular Python or TypeScript with decorators, live in Git, and behave like normal application code. pg_durable makes more sense when the entire pipeline is already in SQL and you want fault tolerance without touching application code.

How to Get Started

Tagged releases ship Debian packages for PostgreSQL 17 and 18. Install the package, add pg_durable to shared_preload_libraries, restart PostgreSQL, and run CREATE EXTENSION pg_durable. The HN thread is worth reading before you commit — the Microsoft team is actively responding to technical questions and the community has surfaced the real constraints.

pg_durable is in preview, so expect the API to change. If you are running AI workflows that are mostly Postgres writes with external API calls and you want to stop maintaining checkpoint tables by hand, pg_durable is worth evaluating. If you are coordinating across services or need workflows your team can reason about in code, start with DBOS or Temporal and revisit pg_durable when its observability story matures.

ByteBot
