
Rust’s hyper Has a Silent Bug Dropping Your Data

Cloudflare engineers spent six weeks hunting a bug that left no fingerprints. Requests returned HTTP 200. Headers were correct. No errors appeared anywhere in the logs. But large responses kept arriving truncated at the client. The culprit turned out to be hyper, the HTTP library that underpins axum, warp, and much of the Rust web ecosystem. The bug is a race condition in hyper’s HTTP/1 dispatch loop that has existed across versions 0.14 through 1.8: under backpressure, the server can shut down a connection before flushing its buffered output. It took syscall-level tracing with strace to find it. The fix is four lines.

What Broke, and Why Nobody Noticed

The failure surfaced after Cloudflare rearchitected its Images binding in late 2025, replacing a higher-latency intermediary with a Unix socket for performance. Shortly after rollout, a customer reported that image transformation requests were silently returning incomplete data — correct HTTP 200 status, correct Content-Length header, wrong number of bytes in the body. A 14.9 MB response was arriving as roughly 219 KB. No error. No timeout. Just missing data.

The reason nobody caught it sooner is that the bug only triggers under specific conditions: the response body must be large enough to fill the socket’s outbound buffer, and the consumer must be reading slowly enough to let that buffer fill. Cloudflare’s old intermediary consumed data fast enough that the buffer rarely filled. The consumer on the new Unix socket path was just slow enough to trigger it consistently. When engineers tested with curl, nothing went wrong, because curl reads data as fast as it arrives. Attempts to reproduce the failure on macOS and in Debian VMs also came back clean. The bug showed up only in the production configuration.

Six Weeks to Find a Four-Character Discard

The Cloudflare team worked through every layer: application tracing, distributed tracing, HTTP client tracing, version rollbacks. Nothing surfaced the root cause. What finally cracked it was dropping down to the syscall level with strace. The trace was immediately revealing:

sendto(42, "HTTP/1.1 200 OK
Content-Length: 14991808
...", ...) = 219264
shutdown(42, SHUT_WR) = 0

219 KB transmitted. Immediate shutdown. Roughly 14.8 MB of the 14.9 MB response still buffered. The kernel doesn’t lie.

The root cause in hyper’s dispatch loop was a single discarded return value:

let _ = self.poll_flush(cx)?;

The let _ = idiom explicitly discards the result of poll_flush(). In async Rust, a flush operation returns Poll::Pending when the outbound buffer is full and can’t accept more data yet. By discarding that result, hyper’s dispatch loop can proceed to connection shutdown while megabytes of response data remain buffered. The ? at the end propagates errors, but Poll::Pending is not an error. It means “not done yet,” and the code was throwing that signal away. The pattern was present in hyper 0.14, 1.7, and 1.8, which adds up to years of production use.
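The difference between discarding and honoring Poll::Pending can be sketched with the standard library alone. This is an illustration, not hyper’s code: MockConn, close_discarding, close_gated, and close_once are hypothetical names, and the mock simply reports Pending whenever bytes remain unflushed.

```rust
use std::io;
use std::sync::Arc;
use std::task::{ready, Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a connection with unflushed response bytes.
struct MockConn {
    unflushed: usize,
    /// Set at shutdown time: how many bytes were still unflushed.
    lost_at_shutdown: Option<usize>,
}

impl MockConn {
    /// Pending while anything is still buffered, like a full socket buffer.
    fn poll_flush(&mut self, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        if self.unflushed > 0 {
            Poll::Pending
        } else {
            Poll::Ready(Ok(()))
        }
    }

    fn shutdown(&mut self) {
        self.lost_at_shutdown = Some(self.unflushed);
    }

    /// Buggy shape: `?` returns early on Err only; `let _` drops Pending.
    fn close_discarding(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        let _ = self.poll_flush(cx)?;
        self.shutdown();
        Poll::Ready(Ok(()))
    }

    /// Gated shape: `ready!` hands Pending back to the caller until flush completes.
    fn close_gated(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        ready!(self.poll_flush(cx)?);
        self.shutdown();
        Poll::Ready(Ok(()))
    }
}

/// Waker that does nothing; enough to build a Context for a single poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls once and reports (returned Pending?, bytes lost at shutdown).
fn close_once(gated: bool, unflushed: usize) -> (bool, Option<usize>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut conn = MockConn { unflushed, lost_at_shutdown: None };
    let poll = if gated {
        conn.close_gated(&mut cx)
    } else {
        conn.close_discarding(&mut cx)
    };
    (poll.is_pending(), conn.lost_at_shutdown)
}

fn main() {
    // 14_991_808 - 219_264 bytes, matching the strace numbers above.
    println!("discarding: {:?}", close_once(false, 14_772_544));
    println!("gated:      {:?}", close_once(true, 14_772_544));
}
```

In this sketch the discarding variant reports Ready and shuts down with every byte still unflushed, while the gated variant returns Pending and never reaches shutdown.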

The Fix and What to Do Now

The patch gates shutdown on a completed flush:

pub(crate) fn poll_shutdown(
    &mut self, cx: &mut Context<'_>,
) -> Poll<io::Result<()>> {
    // Return early (Pending or Err) until every buffered byte has been flushed.
    ready!(self.poll_flush(cx)?);
    Pin::new(&mut self.io).poll_shutdown(cx)
}

Cloudflare also built a deterministic test case using a custom TCP stream wrapper that simulates a constrained socket — accepting 8 KB on the first write, then returning Poll::Pending. The fix was merged as PR #4018 to hyperium/hyper master. It is not yet in an official release. Cloudflare is currently running an internal fork with the patch applied.
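The idea behind such a test can be sketched without any async runtime. What follows is not Cloudflare’s test case: ConstrainedStream, Buffered, drain, and deliver are hypothetical names, and plain methods stand in for a real AsyncWrite implementation so the example stays dependency-free. The mock accepts a fixed window of bytes (8 KB in the usage below), then reports Pending until the test grants more room.

```rust
use std::io;
use std::sync::Arc;
use std::task::{ready, Context, Poll, Wake, Waker};

/// Hypothetical constrained socket: accepts `budget` bytes, then Pending.
struct ConstrainedStream {
    budget: usize,
    accepted: Vec<u8>, // bytes the peer would actually receive
    shut_down: bool,
}

impl ConstrainedStream {
    fn poll_write(&mut self, _cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
        if self.shut_down {
            return Poll::Ready(Err(io::Error::new(io::ErrorKind::BrokenPipe, "write after shutdown")));
        }
        if self.budget == 0 {
            return Poll::Pending; // socket buffer is "full"
        }
        let n = buf.len().min(self.budget);
        self.accepted.extend_from_slice(&buf[..n]);
        self.budget -= n;
        Poll::Ready(Ok(n))
    }

    fn poll_shutdown(&mut self, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        self.shut_down = true;
        Poll::Ready(Ok(()))
    }

    /// Test hook: the slow reader caught up and freed `n` bytes of room.
    fn drain(&mut self, n: usize) {
        self.budget = n;
    }
}

/// Write buffer layered over the stream, with shutdown gated on flush.
struct Buffered {
    io: ConstrainedStream,
    buf: Vec<u8>, // response bytes not yet handed to the stream
}

impl Buffered {
    fn poll_flush(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        while !self.buf.is_empty() {
            let n = ready!(self.io.poll_write(cx, &self.buf)?);
            self.buf.drain(..n);
        }
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        ready!(self.poll_flush(cx)?);
        self.io.poll_shutdown(cx)
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drives shutdown to completion; returns (bytes delivered, polls needed).
fn deliver(body_len: usize, window: usize) -> (usize, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let io = ConstrainedStream { budget: window, accepted: Vec::new(), shut_down: false };
    let mut conn = Buffered { io, buf: vec![0xAB; body_len] };
    let mut polls = 0;
    loop {
        polls += 1;
        match conn.poll_shutdown(&mut cx) {
            Poll::Ready(result) => {
                result.expect("shutdown failed");
                break;
            }
            Poll::Pending => {
                // Shutdown must not have happened while data is still buffered.
                assert!(!conn.io.shut_down);
                conn.io.drain(window);
            }
        }
    }
    (conn.io.accepted.len(), polls)
}

fn main() {
    let (delivered, polls) = deliver(64 * 1024, 8 * 1024);
    println!("delivered {} bytes in {} polls", delivered, polls);
}
```

Because the mock is deterministic, the same check runs identically on every machine, which is the property the production bug lacked.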

Both axum and warp sit on top of hyper. (actix-web ships its own HTTP implementation, actix-http, so it does not go through hyper’s dispatch loop.) If you are running a hyper-based Rust web service that sends large response bodies, such as file downloads, image processing, or bulk API responses, and your consumer reads from the socket more slowly than you write, you may be silently truncating data right now. The client gets HTTP 200. Your logs show nothing. The data is simply gone. If you are also building on the Rust ecosystem, ByteIota recently covered Toasty, the new async Rust ORM for Tokio, which has similar async state machine considerations.

Key Takeaways

  • Affected versions: hyper 0.14 through 1.8 — upgrade to a version containing PR #4018 once released
  • Test your stack: Send large responses through your service with a throttled consumer and verify the body arrives complete
  • Application tracing has limits: When it fails, drop to the kernel — strace, perf, and eBPF exist for this
  • Rust safety ≠ async correctness: Memory safety guarantees don’t cover Poll::Pending being discarded in a state machine
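One way to run the throttled-consumer check from the list above is a small standard-library client that reads slowly and compares the declared Content-Length with the bytes it actually received. This is a rough sketch under several assumptions: fetch_slowly, serve_one, and spawn_toy_server are hypothetical helpers, the toy server only stands in for your real service, the chunk size and delay are arbitrary, and the client handles plain HTTP responses with a Content-Length header only (no TLS, no chunked encoding).

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Reads one HTTP/1.1 response deliberately slowly and returns
/// (declared Content-Length, body bytes actually received).
fn fetch_slowly(addr: SocketAddr, path: &str, chunk: usize, delay: Duration) -> io::Result<(usize, usize)> {
    let mut stream = TcpStream::connect(addr)?;
    write!(stream, "GET {} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n", path)?;

    let mut data = Vec::new();
    let mut buf = vec![0u8; chunk];
    loop {
        thread::sleep(delay); // throttle so the server's send buffer can fill
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break; // peer closed the connection
        }
        data.extend_from_slice(&buf[..n]);
    }

    let header_end = data
        .windows(4)
        .position(|w| w == b"\r\n\r\n")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no header terminator"))?;
    let head = String::from_utf8_lossy(&data[..header_end]);
    let declared = head
        .lines()
        .filter_map(|line| line.split_once(':'))
        .find(|(name, _)| name.trim().eq_ignore_ascii_case("content-length"))
        .and_then(|(_, value)| value.trim().parse::<usize>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no Content-Length"))?;
    Ok((declared, data.len() - header_end - 4))
}

/// Toy server: answers one request with `body_len` bytes, then closes.
fn serve_one(listener: TcpListener, body_len: usize) -> io::Result<()> {
    let (mut sock, _) = listener.accept()?;
    let mut request = Vec::new();
    let mut buf = [0u8; 1024];
    // Consume the whole request head before replying.
    while !request.windows(4).any(|w| w == b"\r\n\r\n") {
        let n = sock.read(&mut buf)?;
        if n == 0 {
            return Ok(());
        }
        request.extend_from_slice(&buf[..n]);
    }
    write!(sock, "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n", body_len)?;
    sock.write_all(&vec![7u8; body_len])
}

/// Binds an ephemeral local port and serves a single response in the background.
fn spawn_toy_server(body_len: usize) -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || serve_one(listener, body_len).expect("toy server failed"));
    Ok(addr)
}

fn main() -> io::Result<()> {
    // Replace the toy server address with your own service to test it.
    let addr = spawn_toy_server(1 << 20)?;
    let (declared, received) = fetch_slowly(addr, "/", 16 * 1024, Duration::from_millis(1))?;
    println!("declared {} bytes, received {} bytes", declared, received);
    assert_eq!(declared, received, "response body was truncated");
    Ok(())
}
```

Against a real service, a larger body and a longer delay make it more likely that the server-side socket buffer actually fills, which is the condition the bug depends on.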

As the Cloudflare Engineering Blog puts it: “Application-level observability can have a blind spot for bugs that live below its awareness.” The Hacker News discussion has the usual debate about whether this reflects a design flaw in async Rust’s Poll model, but that misses the point. The engineering work here is what matters: six weeks of disciplined debugging, a deterministic test case, a clean upstream fix. That is how you fix a library that ships inside half the Rust web ecosystem.

ByteBot