PostgreSQL Performance Halved on Linux 7.0—The Fix

An AWS engineer discovered this week that Linux 7.0 cuts PostgreSQL database throughput in half on production servers. Testing on a 96-core Graviton4 system revealed PostgreSQL performance dropped to 0.51x baseline, with 55% of CPU time burning in a single spinlock. The culprit: Linux 7.0 removed the PREEMPT_NONE kernel mode, breaking PostgreSQL’s 20-year assumption that lock holders won’t be preempted mid-operation.

This isn’t just a performance regression. It’s a production crisis for database teams considering Linux 7.0 upgrades—and a case study in what happens when kernel modernization breaks application assumptions without providing a migration path.

The Technical Root Cause

Linux 7.0 removed PREEMPT_NONE (non-preemptive kernel mode) and forced modern architectures—arm64, x86, powerpc, riscv, s390—to use PREEMPT_LAZY instead. PostgreSQL’s user-space spinlocks were designed assuming lock holders would run to completion without interruption. That assumption no longer holds.

Under PREEMPT_LAZY, the kernel can preempt a process while it holds a lock. PostgreSQL backends are separate processes rather than threads, so when Backend A is preempted while holding a spinlock, every other backend that needs the lock (up to 95 of them on a 96-vCPU system) spins in a loop, burning CPU cycles while it waits. The AWS report showed 55% of total CPU time spent on the spinlock taken in StrategyGetBuffer alone, a pathological contention scenario caused by aggressive preemption.
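The cost of lock holder preemption can be sketched with a back-of-the-envelope model. The Python below is an illustration only: the function names and the example numbers (critical section length, preemption probability, time spent descheduled) are hypothetical and do not come from the AWS report. It assumes every waiter spins for the entire time the lock is held.

```python
def expected_hold_us(critical_us: float, preempt_prob: float, preempt_us: float) -> float:
    """Average time the lock stays held, in microseconds.

    critical_us  -- length of the critical section when it runs uninterrupted
    preempt_prob -- chance that the holder is descheduled while holding the lock
    preempt_us   -- how long the holder stays off-CPU when that happens
    """
    return critical_us + preempt_prob * preempt_us


def spin_waste_us(waiters: int, critical_us: float,
                  preempt_prob: float, preempt_us: float) -> float:
    """CPU time burned by spinning waiters per lock acquisition (toy model)."""
    return waiters * expected_hold_us(critical_us, preempt_prob, preempt_us)


if __name__ == "__main__":
    # Hypothetical numbers: 95 waiters, a 1 us critical section, and a
    # 1% chance of a 1000 us preemption while the lock is held.
    baseline = spin_waste_us(95, 1.0, 0.0, 1000.0)
    preempted = spin_waste_us(95, 1.0, 0.01, 1000.0)
    print(f"no preemption:   {baseline:.0f} core-us per acquisition")
    print(f"with preemption: {preempted:.0f} core-us per acquisition")
```

Under these assumptions even a rare preemption dominates the result, because the time spent descheduled is orders of magnitude longer than the critical section itself.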

The Immediate Fix: Enable Huge Pages

Here’s the good news: enabling huge pages restores most of PostgreSQL’s performance on Linux 7.0. Multiple sources report that the regression largely disappears with huge pages configured. Huge pages provide a double benefit: they largely eliminate the Linux 7.0 regression and, in benchmarks, reduce CPU usage from 51% to 15% by cutting TLB misses and page table walks.

Configuration is straightforward:

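# /etc/postgresql/XX/main/postgresql.conf
huge_pages = on
shared_buffers = 32GB

# Shell: reserve explicit huge pages before starting PostgreSQL (huge_pages = on fails at startup without them).
# 32GB / 2MB = 16384 pages; 17204 adds illustrative headroom, so size it for your own system.
sudo sysctl -w vm.nr_hugepages=17204

# Shell: disable Transparent Huge Pages (causes fragmentation)
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag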

Critical note: use explicit huge pages, not Transparent Huge Pages (THP). THP causes memory fragmentation and unpredictable performance for databases.
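How many explicit huge pages to reserve follows from simple arithmetic on the shared memory size. The Python below is a minimal sketch of that calculation: the helper name and the 5% headroom are assumptions for illustration, not documented constants. On PostgreSQL 15 and later, running `postgres -C shared_memory_size_in_huge_pages` against a stopped server reports the server's own figure, which is the better source when available.

```python
import math


def huge_pages_needed(shared_buffers_gb: float,
                      huge_page_kb: int = 2048,
                      headroom: float = 0.05) -> int:
    """Estimate how many huge pages to reserve for a given shared_buffers size.

    headroom is a hypothetical safety margin for the rest of PostgreSQL's
    shared memory segment; it is an assumption, not a documented value.
    """
    shared_kb = shared_buffers_gb * 1024 * 1024
    # Round up: a partially used huge page still has to be reserved whole.
    return math.ceil(shared_kb * (1 + headroom) / huge_page_kb)


if __name__ == "__main__":
    pages = huge_pages_needed(32)
    print(f"vm.nr_hugepages = {pages}")
```

The resulting number is what goes into `vm.nr_hugepages`. The default of 2048 kB matches the common huge page size on x86; check `Hugepagesize` in /proc/meminfo before relying on it, since other platforms and configurations differ.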

The Kernel Developers’ Response: Use RSEQ

Intel kernel engineer Peter Zijlstra, who authored the PREEMPT_NONE removal, says the “proper fix” is for PostgreSQL to adopt RSEQ (Restartable Sequences). RSEQ is a kernel facility for restartable user-space critical sections; its time slice extension lets a thread ask the scheduler for a brief extension while it is inside a critical section, which reduces the chance of lock holder preemption.

RSEQ itself has been in the mainline kernel since Linux 4.18; the time slice extension, long in the making, is the piece that was merged in Linux 7.0. Ironically, the kernel that broke PostgreSQL also ships the fix, but using it requires application code changes. The PostgreSQL community’s response to rewriting code for performance they had for free? Not enthusiastic.

Who Should Adapt When Kernels Break Assumptions?

This regression exposes a fundamental tension. Kernel developers argue applications shouldn’t rely on specific preemption behavior—they should use kernel-aware primitives like RSEQ. Application developers counter that breaking 20-year-old assumptions without a migration path is poor stewardship.

The pragmatic reality: huge pages solve this today. RSEQ might be the “correct” long-term solution, but production databases can’t wait months or years for PostgreSQL to adopt it. DBAs need fixes this week, not architectural debates about who’s responsible.

What Database Teams Should Do Now

If you’re running PostgreSQL and considering Linux 7.0, enable huge pages before upgrading. Test performance on staging first; don’t discover a 50% throughput drop in production.
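A staging comparison boils down to one number: throughput on the new kernel divided by throughput on the old one. The Python below is a sketch that assumes the benchmark is pgbench, whose summary output includes a line beginning with `tps = `. The function names and the 0.9 threshold are hypothetical choices for illustration.

```python
import re

# Matches the numeric figure on pgbench's "tps = ..." summary line.
_TPS_LINE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_tps(pgbench_output: str) -> float:
    """Return the last 'tps = ...' figure found in pgbench's summary output."""
    matches = _TPS_LINE.findall(pgbench_output)
    if not matches:
        raise ValueError("no 'tps = ' line found in pgbench output")
    return float(matches[-1])


def throughput_ratio(baseline_output: str, candidate_output: str) -> float:
    """Candidate throughput as a fraction of baseline (1.0 means no change)."""
    return parse_tps(candidate_output) / parse_tps(baseline_output)


def is_regression(ratio: float, threshold: float = 0.9) -> bool:
    """Flag the run when the candidate falls below the chosen fraction of baseline."""
    return ratio < threshold
```

Run the same pgbench workload on the old and new kernels, capture each run's output, and pass both strings to `throughput_ratio`. A result near the 0.51x described in the AWS report would be flagged by `is_regression`.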

If you’re already on Linux 7.0 and seeing performance degradation, huge pages should restore most of the lost throughput. The configuration takes minutes. If you can’t use huge pages (containerized environments, dynamic scaling, small databases), stay on Linux 6.x until PostgreSQL implements RSEQ support.
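Whether explicit huge pages are actually reserved and in use can be read from /proc/meminfo. The Python below is a minimal sketch: the function names are hypothetical, while `HugePages_Total`, `HugePages_Free`, `HugePages_Rsvd`, and `Hugepagesize` are the standard field names Linux reports in that file.

```python
def read_hugepage_stats(meminfo_text: str) -> dict:
    """Extract the huge page counters from the contents of /proc/meminfo."""
    wanted = {"HugePages_Total", "HugePages_Free", "HugePages_Rsvd", "Hugepagesize"}
    stats = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Counters are bare integers; Hugepagesize carries a trailing "kB".
            stats[key] = int(rest.split()[0])
    return stats


def huge_pages_in_use(stats: dict) -> int:
    """Pages currently backing memory: total minus free."""
    return stats.get("HugePages_Total", 0) - stats.get("HugePages_Free", 0)
```

On a database host, pass in the file contents, for example `read_hugepage_stats(open("/proc/meminfo").read())`. If `HugePages_Total` is zero, or the in-use count stays at zero after PostgreSQL starts, the server is probably not running on explicit huge pages.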

Key Takeaways

  • Linux 7.0 cuts PostgreSQL throughput in half by removing PREEMPT_NONE and breaking spinlock assumptions
  • Huge pages largely restore performance and provide additional CPU usage benefits (51% → 15%)
  • Kernel developers recommend RSEQ as the long-term fix, but PostgreSQL hasn’t adopted it yet
  • Production databases need solutions today—enable huge pages now, don’t wait for “proper” fixes
ByteBot