An AWS engineer reported yesterday that PostgreSQL throughput drops by roughly 50% on Linux kernel 7.0, with production workloads on Graviton4 servers running at about half the speed they achieve on prior kernels. The regression stems from Linux 7.0’s removal of the PREEMPT_NONE scheduler mode on modern CPU architectures, which leaves PostgreSQL spending excessive time busy-waiting in user-space spinlocks. With the Linux 7.0 stable release expected within two weeks, millions of PostgreSQL deployments face a stark choice: doubled infrastructure costs or halved database capacity overnight.
50% Throughput Drop on Graviton4 Servers
Salvatore Dipietro of Amazon/AWS discovered the regression through kernel bisection, isolating the exact commit responsible for PostgreSQL’s performance collapse. The measurements are stark: Linux 7.0 delivers only 0.51x the throughput of prior kernel versions on AWS Graviton4 ARM-based processors. That’s not a minor optimization gap – it’s a catastrophic production crisis.
The timeline makes this particularly urgent. Dipietro reported the findings on April 3-4, 2026, mere days before Linux 7.0’s expected stable release around April 18. For organizations running PostgreSQL in production, this isn’t theoretical – kernel updates could halve database performance within weeks unless they pin versions or delay upgrades.
PREEMPT_NONE Removed, Spinlocks Suffer
The root cause is Linux 7.0’s architectural decision to drop PREEMPT_NONE as an option for modern CPU architectures, including ARM64, x86, PowerPC, RISC-V, s390, and LoongArch. The kernel now offers only PREEMPT_FULL (always preemptible, optimized for low latency) and PREEMPT_LAZY (scheduler-deferred preemption), forcing a shift away from the throughput-first model that server workloads relied on.
PostgreSQL’s internal synchronization relies on user-space spinlocks – lightweight locks that busy-wait in a loop checking for availability. These spinlocks assumed PREEMPT_NONE’s non-preemptive scheduling, where lock holders wouldn’t be interrupted mid-operation. Under the new PREEMPT_LAZY model, threads holding locks get preempted by the scheduler, causing other threads to waste CPU cycles spinning while waiting for locks that won’t be released until the preempted thread resumes. The result is throughput collapse.
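The failure mode is easy to see in a minimal test-and-set spinlock. The sketch below illustrates the general pattern, not PostgreSQL’s actual `s_lock` implementation: if the thread that holds the lock is preempted between `spin_lock` and `spin_unlock`, every waiter burns its entire timeslice in the `while` loop.

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock: a sketch of the busy-wait pattern,
 * not PostgreSQL's actual s_lock code. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Busy-wait: burn CPU until the flag is released. If the lock
     * holder is preempted here under PREEMPT_LAZY, waiters spin
     * uselessly until the holder is scheduled back on a CPU. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ; /* spin */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Under a non-preemptive model the critical section between lock and unlock is short and uninterrupted, so the spin is cheap; once the holder can be descheduled mid-section, the same code wastes whole scheduling quanta.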
This isn’t a bug to be fixed – it’s an intentional kernel simplification. The developers believe “fundamentally preemptible” kernels are the future, even if existing applications break. That puts the burden of adaptation squarely on PostgreSQL and other database systems, not on the kernel.
Kernel Dev: “Use RSEQ Instead”
Kernel developer Peter Zijlstra’s response to the AWS engineer’s report was blunt: “The fix here is to make PostgreSQL make use of rseq slice extension…That should limit the exposure to lock holder preemption.” In other words, PostgreSQL should adopt Restartable Sequences (RSEQ) – a mechanism for user-space atomic operations – rather than the kernel reverting its architectural changes.
RSEQ allows threads to request time slice extensions to avoid being preempted while holding critical locks. It’s technically elegant and was upstreamed alongside Linux 7.0 as the intended solution. However, there’s a problem: PostgreSQL hasn’t integrated RSEQ support yet, and there’s no announced timeline for when it will. Database administrators facing 50% performance drops don’t have the luxury of waiting months for PostgreSQL to adapt to kernel evolution.
This creates a philosophical standoff in open source infrastructure. Should kernels maintain backwards compatibility for widely-used applications, or should applications constantly adapt to kernel changes? Zijlstra’s position is clear: applications must modernize. But architectural purity is cold comfort when production databases run at half speed.
Doubled Costs or Halved Capacity
For organizations running PostgreSQL on Linux 7.0, the 50% performance drop translates directly into infrastructure economics. Either double your server fleet to maintain the same query throughput, or accept half the capacity and slower response times. A company running 10 PostgreSQL database servers suddenly needs 20 to maintain current performance – doubling EC2 or compute costs overnight.
Cloud providers likely won’t deploy Linux 7.0 to managed database services like AWS RDS PostgreSQL until this issue resolves. AWS’s own testing caught the regression before customer impact, demonstrating why rigorous kernel validation matters for production infrastructure. Nevertheless, self-managed PostgreSQL users who auto-upgrade kernels – or who test with toy workloads instead of realistic production patterns – will hit this regression immediately.
What DBAs Should Do Now
Database administrators should delay upgrading production PostgreSQL systems to Linux 7.0 until PostgreSQL releases RSEQ support or alternative mitigations emerge. This isn’t paranoia – it’s prudent infrastructure management validated by a real-world 50% performance regression discovered by AWS engineers.
Practical steps for production environments:
- Pin kernel versions in production – don’t auto-upgrade
- Add PostgreSQL performance benchmarks (pgbench, sysbench) to kernel upgrade testing procedures
- Monitor kernel version and spinlock wait time metrics on dashboards
- Wait for PostgreSQL community announcement of RSEQ support before considering Linux 7.0
- If using managed databases (AWS RDS, Google Cloud SQL), expect the provider to delay Linux 7.0 adoption until the regression is resolved
This incident exposes a dangerous gap in kernel testing methodology. PostgreSQL is one of the world’s most popular databases with millions of production deployments. How did removing PREEMPT_NONE not trigger database workload regression tests before release candidates shipped? The answer: kernel testing focuses on synthetic benchmarks, not real-world application stacks. Consequently, database administrators are left to discover performance cliffs through production incidents or, if they’re lucky, through thorough pre-upgrade testing.
The lesson is clear: never trust “stable” kernel releases for performance-sensitive workloads without extensive validation. Even widely-used open source infrastructure can have massive regressions for specific use cases. Test thoroughly, pin versions conservatively, and don’t assume kernel developers prioritize your workload in their testing matrix.




