Temporal Serverless Workers: Deploy AI Agents on AWS Lambda

Temporal Serverless Workers: Deploy durable AI agents on AWS Lambda without managing always-on server processes

Temporal just eliminated the biggest excuse for avoiding durable execution. Serverless Workers — announced at Replay 2026 and now in pre-release — let you deploy Temporal Workers to AWS Lambda and hand off every infrastructure concern to Temporal Cloud: invocation, autoscaling, and scale-to-zero. No servers running idle between jobs. No capacity planning. Just crash-proof workflows that only cost compute when work is actually happening.

The Problem With Always-On Workers

Temporal is compelling on paper: deterministic workflow orchestration with automatic crash recovery, retry logic, and full event history. But production adoption has always required always-on Worker processes — long-lived servers or containers that continuously poll task queues, whether or not there is any work to do. For teams running sporadic or bursty workloads, that meant paying for idle compute around the clock. For smaller teams, it meant learning Kubernetes or ECS before they could even ship a workflow.

Serverless Workers change that calculus entirely.

How Serverless Workers Work

Instead of a process that polls continuously, a Serverless Worker is a Lambda function that Temporal Cloud invokes directly when Tasks arrive. When the task queue is empty, nothing runs. When a burst of tasks arrives, Lambda scales to match demand. When each function invocation finishes, it shuts down. Temporal handles all the coordination.

Setup is three steps:

1. Package your Worker code and deploy it to AWS Lambda
2. Create a cross-account IAM role using the Temporal-provided CloudFormation template
3. Register the Lambda function with Temporal Cloud via CLI or the UI
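
To make the IAM step more concrete, here is a rough sketch of what a cross-account role of this kind generally contains on the AWS side: a trust policy that lets a principal in another account assume the role, and a permissions policy that allows invoking one function. The account IDs, external ID, and function ARN are placeholders, and the actual Temporal-provided CloudFormation template may differ, so treat this as illustration only.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy letting a principal in another AWS account assume the role.

    The external ID condition is the usual guard for third-party access.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_invoke_policy(function_arn: str) -> dict:
    """Permissions policy that only allows invoking the given Lambda function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real Temporal or AWS values.
    trust = build_trust_policy("111122223333", "example-external-id")
    invoke = build_invoke_policy(
        "arn:aws:lambda:us-east-1:444455556666:function:my-agent-worker"
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(invoke, indent=2))
```

Scoping the permissions policy to a single function ARN keeps the external party limited to invoking that one Worker function.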

Worker Deployment Versioning is automatically enabled for Serverless Workers — you must include a WorkerDeploymentVersion identifying your deployment and build ID. This is not optional, and it is worth understanding: it is what lets Temporal safely route tasks to the correct version of your code during deployments.
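
Why a deployment name plus build ID makes routing safe is easier to see with a toy model. The sketch below is plain Python, not Temporal's implementation: DeploymentVersion and VersionRouter are hypothetical names, and it models only one possible policy, where a workflow stays on the version that handled its first task.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DeploymentVersion:
    """Hypothetical stand-in for a deployment name plus build ID."""
    deployment_name: str
    build_id: str


class VersionRouter:
    """Toy router: pins each workflow to the version current at its first task."""

    def __init__(self, current: DeploymentVersion) -> None:
        self.current = current
        self._pinned: Dict[str, DeploymentVersion] = {}

    def set_current(self, version: DeploymentVersion) -> None:
        """Roll out a new build; only workflows not yet pinned will use it."""
        self.current = version

    def route(self, workflow_id: str) -> DeploymentVersion:
        """Return the version that should receive this workflow's next task."""
        return self._pinned.setdefault(workflow_id, self.current)


if __name__ == "__main__":
    v1 = DeploymentVersion("my-agent", "1.0.0")
    v2 = DeploymentVersion("my-agent", "1.1.0")
    router = VersionRouter(v1)
    print(router.route("wf-a").build_id)  # first task pins wf-a to the v1 build
    router.set_current(v2)
    print(router.route("wf-a").build_id)  # still the v1 build
    print(router.route("wf-b").build_id)  # new workflow lands on the v2 build
```

In this model, a workflow that is midway through its history never has its remaining tasks handed to code it did not start on, which is the property a deploy needs.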

Python: The Lambda Handler Pattern

The lambda_worker contrib package handles the Lambda lifecycle, Temporal client setup, and graceful shutdown. You register your Workflows and Activities exactly as you would with a standard Worker. Full setup is in the Python SDK documentation; a minimal handler looks like this:

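from temporalio.worker.lambda_worker import run_worker, WorkerDeploymentVersion

# Hypothetical module layout: import your own Workflow and Activity
# definitions from wherever they live in your project.
from my_agent.workflows import AgentWorkflow
from my_agent.activities import call_llm, run_tool, search_database

# Module-level code runs once per Lambda execution environment.
handler = run_worker(
    # Required: identifies this deployment and build for Worker Versioning.
    WorkerDeploymentVersion(
        deployment_name="my-agent",
        build_id="1.0.0",
    ),
    # Register Workflows and Activities as you would on a standard Worker.
    configure=lambda opts: opts.update(
        workflows=[AgentWorkflow],
        activities=[call_llm, run_tool, search_database],
    ),
)

# Entry point that AWS Lambda calls for each invocation.
def lambda_handler(event, context):
    return handler(event, context)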

The Go SDK uses a similar pattern via lambdaworker.RunWorker(). Both apply conservative defaults tuned for Lambda’s short-lived invocation model rather than the always-on Worker defaults.

Why AI Agents Benefit Most

Temporal’s durable execution model maps almost perfectly onto how AI agents work. LLM calls, tool executions, web searches, database writes — these are all non-deterministic operations with unpredictable latency and failure modes. Run them as Temporal Activities and you get automatic retries, configurable timeouts, and crash recovery for free. The workflow orchestrating those steps is deterministic and replayable from any point in its history.
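
To get a feel for what automatic retries mean in practice, the sketch below computes the wait before each retry under exponential backoff. The parameter names mirror the fields of Temporal's retry policy (initial interval, backoff coefficient, maximum interval, maximum attempts), but this is a plain-Python model rather than SDK code, and retry_delays is a hypothetical helper. It assumes a finite attempt limit.

```python
from typing import List


def retry_delays(
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: float = 100.0,
    maximum_attempts: int = 5,
) -> List[float]:
    """Seconds to wait before each retry, with exponential backoff and a cap.

    maximum_attempts counts the first try, so there are
    maximum_attempts - 1 retries at most.
    """
    if maximum_attempts < 1:
        raise ValueError("this sketch needs a finite attempt limit of at least 1")
    delays = []
    delay = initial_interval
    for _ in range(maximum_attempts - 1):
        delays.append(min(delay, maximum_interval))  # never exceed the cap
        delay *= backoff_coefficient
    return delays


if __name__ == "__main__":
    print(retry_delays())                      # [1.0, 2.0, 4.0, 8.0]
    print(retry_delays(maximum_interval=5.0))  # [1.0, 2.0, 4.0, 5.0]
```

For a flaky LLM or tool call, a schedule like this spaces out attempts without the agent code having to implement its own retry loop.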

The objection people raise is Lambda cold starts. Initializing a new Lambda execution environment typically takes 200–500ms, depending on runtime and package size. That sounds meaningful until you remember that a single LLM call takes one to thirty seconds. Against a multi-second call, the cold start is a small fraction of the total. For multi-step agent workflows running for minutes, it is negligible.
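
The cold-start argument is simple arithmetic, and a few lines make it checkable with your own numbers. The helper below is a hypothetical sketch that assumes one cold start per invocation and sequential LLM calls; the inputs are the ranges quoted above, not measurements.

```python
def cold_start_share(cold_start_s: float, llm_call_s: float, calls: int = 1) -> float:
    """Fraction of total wall time spent on a single cold start."""
    total = cold_start_s + llm_call_s * calls
    return cold_start_s / total


if __name__ == "__main__":
    # Worst corner of the quoted ranges: slow cold start, fastest LLM call.
    print(f"{cold_start_share(0.5, 1.0):.1%}")
    # Fast cold start against a thirty-second call.
    print(f"{cold_start_share(0.2, 30.0):.1%}")
    # Ten-step agent with five-second calls.
    print(f"{cold_start_share(0.5, 5.0, calls=10):.1%}")
```

The first case is the one to watch: a 500ms cold start in front of a single one-second call is a third of the wall time, so the argument is strongest for slower or multi-step agents.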

Serverless Workers are a particularly good fit for agentic workloads because agent invocations tend to be bursty and event-driven — a user submits a request, an agent processes it asynchronously, results are returned. Traffic is not constant. With always-on Workers, you were sizing infrastructure for peak load even when those servers were idle between requests.

Workflow Streams: Streaming Tokens Through Temporal

Also released at Replay 2026 and now in Public Preview: Workflow Streams. This feature uses Temporal’s Signal and Update primitives to push token batches and status updates back to the caller while a workflow is running — enabling live streaming AI responses through a durable workflow, rather than forcing a choice between streaming output and durable execution. Previously, developers had to pick one or work around the limitation.
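
This article does not show the Workflow Streams API, so the following is not that API. It is a plain-Python sketch of the batching idea only: grouping tokens so that each message pushed to the caller carries several tokens instead of one. The names batch_tokens and max_batch are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_tokens(tokens: Iterable[str], max_batch: int) -> Iterator[List[str]]:
    """Group a token stream into lists of at most max_batch tokens."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    batch: List[str] = []
    for token in tokens:
        batch.append(token)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left when the stream ends.
        yield batch


if __name__ == "__main__":
    stream = ["Dur", "able", " exec", "ution", "!"]
    for chunk in batch_tokens(stream, max_batch=2):
        print("".join(chunk))
```

Larger batches mean fewer messages per response at the cost of choppier output, so the batch size is a latency versus overhead trade-off.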

When to Stick With Traditional Workers

Serverless Workers are not the right choice for every workload. If your task queues receive high, consistent throughput — think thousands of tasks per minute around the clock — long-lived Workers on dedicated compute will be more cost-effective. Latency-sensitive workflows where even 200ms matters should also stay on traditional Workers. Teams already running Temporal on Kubernetes with mature operational tooling have little reason to switch.
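
The cost-effectiveness claim can be turned into a rough break-even estimate. The sketch below compares compute cost only, using a simple pay-per-use model (duration times memory, plus a per-request fee) against a flat hourly rate. Every function name is hypothetical and the prices in the example are placeholders, so substitute current rates for your region before drawing conclusions.

```python
HOURS_PER_MONTH = 24 * 30  # simple 30-day month for the estimate


def always_on_monthly_cost(instances: int, hourly_rate: float) -> float:
    """Cost of Workers that run all month regardless of load."""
    return instances * hourly_rate * HOURS_PER_MONTH


def serverless_cost_per_task(
    avg_seconds: float,
    memory_gb: float,
    price_per_gb_second: float,
    price_per_request: float,
) -> float:
    """Pay-per-use cost of one task: duration x memory, plus a request fee."""
    return avg_seconds * memory_gb * price_per_gb_second + price_per_request


def break_even_tasks(
    instances: int,
    hourly_rate: float,
    avg_seconds: float,
    memory_gb: float,
    price_per_gb_second: float,
    price_per_request: float,
) -> float:
    """Monthly task count above which always-on compute becomes cheaper."""
    per_task = serverless_cost_per_task(
        avg_seconds, memory_gb, price_per_gb_second, price_per_request
    )
    return always_on_monthly_cost(instances, hourly_rate) / per_task


if __name__ == "__main__":
    # Placeholder prices for illustration only; look up current rates.
    tasks = break_even_tasks(
        instances=2,
        hourly_rate=0.05,
        avg_seconds=10.0,
        memory_gb=0.5,
        price_per_gb_second=0.00002,
        price_per_request=0.0000002,
    )
    print(f"Break-even at roughly {tasks:,.0f} tasks per month")
```

Below the break-even count, paying per task is cheaper. Above it, the flat rate wins, which is why steady high-throughput queues tend to favor long-lived Workers.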

The serverless model shines for: AI agents with sporadic invocations, background processing pipelines with unpredictable load, teams without Kubernetes expertise, and greenfield Temporal projects that want zero infrastructure to start.

The Infrastructure Excuse Is Gone

Temporal’s $300M raise in February 2026 at a $5B valuation was a signal. Mistral building its enterprise Workflows platform on Temporal — running millions of daily executions — was a signal. OpenAI running production model training pipelines on Temporal was a signal. The platform is mature, and the ecosystem is growing around it.

What held back adoption at smaller scales was the operational overhead of always-on Workers. Serverless Workers remove that blocker. For AI agent builders shipping on AWS, the combination of Temporal’s durable execution guarantees and Lambda’s zero-infrastructure operational model is now available in pre-release. The Python and Go SDKs are ready, and the IAM setup takes twenty minutes.

Start building before the GA waitlist opens.

ByteBot
