On May 19, 2026, Google Cloud’s automated systems incorrectly suspended Railway’s production account — and for nearly ten hours, thousands of developer applications went dark. The strange part: workloads running on AWS and Railway’s own bare-metal servers were physically healthy the entire time. They failed anyway. Multi-cloud didn’t save them.
Railway serves nearly 2 million developers and processes 10 million deployments per month. When a PaaS that markets itself as multi-cloud resilient gets taken down by a single cloud provider’s automated error, the real question isn’t “why did GCP do this?” It’s “how did a GCP error kill workloads on a completely different cloud?”
The Railway GCP Outage: An Architecture Problem Nobody Advertises
Railway’s edge proxies use a GCP-hosted control plane API to populate their routing tables — that is, to discover where workloads are running and direct traffic accordingly. When GCP suspended the account at 22:19 UTC on May 19, the control plane API became unreachable. For about 35 minutes, cached routing data kept the proxies working. Then the cache expired. From that point, AWS-hosted workloads and Railway Metal servers that were completely operational started returning 404s, because the proxies had no valid routing information.
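The failure mode described above can be reproduced in a few lines. The sketch below is an illustration only, not Railway's implementation: `FakeControlPlane`, `EdgeProxy`, the hostname, and the backend address are all hypothetical, and the TTL is set to the roughly 35 minutes mentioned in the text. It shows a proxy that keeps serving from its cached routing table while the control plane is unreachable, then starts returning 404s once the cache expires, even though nothing happened to the backend.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CACHE_TTL_SECONDS = 35 * 60  # assumption: the ~35 minutes described in the article


class ControlPlaneUnavailable(Exception):
    """Raised when the hypothetical control plane API cannot be reached."""


class FakeControlPlane:
    """Stand-in for a routing API hosted in a single provider account."""

    def __init__(self) -> None:
        self.reachable = True

    def fetch_routes(self) -> Dict[str, str]:
        if not self.reachable:
            raise ControlPlaneUnavailable("account suspended")
        # Hypothetical route: the workload itself lives on another provider.
        return {"app.example.com": "aws/10.0.0.12"}


@dataclass
class EdgeProxy:
    control_plane: FakeControlPlane
    ttl: float = CACHE_TTL_SECONDS
    routes: Dict[str, str] = field(default_factory=dict)
    fetched_at: Optional[float] = None

    def refresh(self, now: float) -> None:
        try:
            self.routes = self.control_plane.fetch_routes()
            self.fetched_at = now
        except ControlPlaneUnavailable:
            pass  # keep the cached table; it stays usable until it expires

    def handle(self, host: str, now: float) -> int:
        """Return an HTTP status for a request to `host` at time `now` (seconds)."""
        self.refresh(now)
        expired = self.fetched_at is None or now - self.fetched_at > self.ttl
        if expired or host not in self.routes:
            return 404  # the backend may be healthy, but the proxy cannot find it
        return 200


def simulate() -> List[int]:
    plane = FakeControlPlane()
    proxy = EdgeProxy(plane)
    statuses = [proxy.handle("app.example.com", now=0)]  # control plane reachable
    plane.reachable = False  # the suspension begins
    statuses.append(proxy.handle("app.example.com", now=10 * 60))  # cache still valid
    statuses.append(proxy.handle("app.example.com", now=36 * 60))  # cache expired
    return statuses


print(simulate())  # [200, 200, 404]
```

The hard expiry is the design choice worth noticing: a proxy that discards its last known routes when it cannot refresh them turns a control plane outage into a data plane outage.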
The account suspension itself was resolved quickly. GCP restored access at 22:29 UTC, ten minutes after the suspension began and within minutes of Railway filing an emergency ticket. Full recovery, however, took until 07:58 UTC the next morning, nearly ten hours after the suspension, because persistent disks, compute instances, and network routing all required separate restoration sequences. GitHub also rate-limited Railway’s OAuth and webhook integrations during recovery, adding another layer of failure on top of an already cascading incident.
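The durations in this timeline follow directly from the three timestamps reported in the article. A quick check, using only those timestamps plus the approximate 35-minute cache lifetime (the variable names are ours):

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Timestamps as reported in the article.
suspended = datetime(2026, 5, 19, 22, 19, tzinfo=UTC)
access_restored = datetime(2026, 5, 19, 22, 29, tzinfo=UTC)
full_recovery = datetime(2026, 5, 20, 7, 58, tzinfo=UTC)

# "About 35 minutes" in the article, so treat the result as approximate.
cache_lifetime = timedelta(minutes=35)

suspension_length = access_restored - suspended
cache_expiry = suspended + cache_lifetime
total_outage = full_recovery - suspended

print(suspension_length)               # 0:10:00
print(cache_expiry.strftime("%H:%M"))  # 22:54
print(total_outage)                    # 9:39:00
```

By this arithmetic the cached routes ran out around 22:54 UTC, some 25 minutes after account access came back, which fits the account of a slow, staged restoration rather than an instant recovery.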
Railway Knew This Cloud Vendor Lock-In Risk Was Sitting There
This is not Railway’s first collision with GCP. In 2024, the company explicitly shifted infrastructure away from Google Cloud after GCP “caused a multitude of problems that have posed an existential risk” to their business. Similar issues resurfaced in 2025. Despite that history, Railway maintained an eight-figure annual commitment to Google Cloud and left the control plane dependency in place. Their own February 2026 postmortem had already flagged “tightly coupled systems with a large blast radius” as a recurring risk pattern.
Angelo Saraceno, Railway’s solutions engineer, put it bluntly: “Our customers don’t care if it is Google. We have to own our uptime.” That’s the right framing — and it explains why Railway is accepting full responsibility rather than pointing fingers at GCP’s automated systems. Architectural debt in critical-path infrastructure is different from other tech debt. It doesn’t accumulate quietly. It detonates.
What Railway Is Changing
Railway’s incident report commits to four specific architectural changes: removing the GCP control plane dependency from the routing mesh, extending high-availability database shards across AWS and Metal, isolating GCP services to secondary and failover roles only, and redesigning the control and data plane for vendor independence. The company stated plainly: “We take full responsibility for the architectural decisions that allowed a single upstream provider action to cascade into a platform-wide outage.”
That’s the right outcome from a wrong situation. The interesting part is what this reveals about the gap between “multi-cloud” as a marketing label and multi-cloud as an actual architectural guarantee. Spreading workloads across multiple providers means nothing if the system that routes traffic to those workloads lives in a single provider’s account.
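One way to picture a routing layer that does not depend on a single provider's account is a loader that tries several independent route sources in order and falls back to the last known table if all of them fail. This is a minimal sketch under our own assumptions, not the design Railway has committed to: `load_routes`, the source names, and the sample route are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Routes = Dict[str, str]
Source = Tuple[str, Callable[[], Routes]]


def load_routes(
    sources: List[Source], last_known: Optional[Routes] = None
) -> Tuple[str, Routes]:
    """Try each route source in order; serve stale routes if every source fails."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    if last_known is not None:
        # Stale routes beat no routes when the backends are still healthy.
        return "stale-cache", last_known
    raise RuntimeError("no route source available: " + "; ".join(errors))


# Hypothetical sources: one in a suspended account, one replica elsewhere.
def gcp_control_plane() -> Routes:
    raise ConnectionError("account suspended")


def aws_replica() -> Routes:
    return {"app.example.com": "aws/10.0.0.12"}


print(load_routes([("gcp", gcp_control_plane), ("aws", aws_replica)]))
# ('aws', {'app.example.com': 'aws/10.0.0.12'})
```

The ordering of sources is a policy decision. What matters for resilience is that the replicas live in different accounts and that the stale fallback exists at all.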
What Every Developer Should Check After This
If you use any PaaS, the question this incident asks is simple: where does your routing or service discovery live? If the answer is “one cloud provider’s account,” a suspension, billing error, or regional outage there can kill workloads running everywhere else. According to Flexera’s 2026 State of the Cloud report, 89% of enterprise organizations use multi-cloud, but running multi-cloud workloads without a multi-cloud control plane is theater. Tools like Kubernetes and Crossplane can help decouple a control plane from any single provider, as long as the control plane itself is not hosted in just one account. Railway is learning that lesson the hard way. You don’t have to.
Key Takeaways
- Google Cloud incorrectly suspended Railway’s account on May 19, 2026, triggering a nearly ten-hour outage, even though account access was restored within minutes of the emergency ticket
- Workloads on AWS and Railway Metal failed not because those servers went down, but because the routing mesh depended on a GCP-hosted control plane and cached routes expired after 35 minutes
- “Multi-cloud” does not mean “resilient” if the control plane — routing, service discovery, configuration — sits in a single provider’s account
- Railway had flagged this architectural risk in its February 2026 postmortem; the incident shows that known critical-path debt needs urgent treatment
- Railway’s fix is the right one: decoupling the control plane from GCP, extending HA shards across providers, and demoting GCP to failover-only













