
Nvidia’s $81.6B Quarter: What Blackwell Costs Developers

Nvidia Q1 FY2027: Blackwell Ultra changes the inference cost equation for developers

Nvidia just reported $81.6 billion in quarterly revenue. The financial press is busy with the stock price. You should be looking at a different number: $75.2 billion in data center revenue, 92% of the total, driven almost entirely by Blackwell Ultra deployments. That shift has a direct consequence for every team running inference workloads: the cost-per-token math on H100s has quietly become hard to defend, and B300 cloud instances are available right now at prices that make migration worth modeling.

The Data Center Numbers That Actually Matter

Of Nvidia’s $81.6B Q1 FY2027 revenue, $75.2B came from data centers — 92% of total, up 21% sequentially. Networking alone hit $14.8 billion, up 199% year-over-year, which tells you these are full-cluster buildouts, not single-node experiments. Nvidia also introduced a new reporting segment this quarter: ACIE (AI Clouds, Industrial & Enterprise) came in at $37 billion, up 31% quarter-over-quarter. Hyperscale hit $38 billion.

Q2 FY2027 guidance landed at $91 billion ± 2%, which is $4.2 billion above Wall Street consensus. Guidance that far above consensus suggests management expects supply to keep ramping, so a repeat of the 2023 H100 shortage looks unlikely and prices should be stable-to-declining through Q3.
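The figures in this section can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (in billions of USD); the variable and function names are illustrative, and the segment figures are rounded, so their sum will not match the data center total exactly.

```python
# Figures quoted in the article, in billions of USD.
total_revenue = 81.6
data_center = 75.2
acie = 37.0          # rounded segment figure
hyperscale = 38.0    # rounded segment figure
guidance_mid = 91.0
guidance_tolerance = 0.02   # the "± 2%" band
beat_vs_consensus = 4.2


def guidance_band(mid, tolerance):
    """Return the (low, high) range implied by a midpoint and a +/- tolerance."""
    return mid * (1 - tolerance), mid * (1 + tolerance)


dc_share = data_center / total_revenue
low, high = guidance_band(guidance_mid, guidance_tolerance)
implied_consensus = guidance_mid - beat_vs_consensus

print(f"Data center share of revenue: {dc_share:.1%}")
print(f"ACIE + Hyperscale (rounded): ${acie + hyperscale:.1f}B")
print(f"Q2 guidance band: ${low:.2f}B to ${high:.2f}B")
print(f"Implied Wall Street consensus: ${implied_consensus:.1f}B")
```

Even the low end of the guidance band sits above the implied consensus figure.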

The Performance Case for Blackwell Ultra

Here is the core number: the InferenceMAX benchmarks (run by SemiAnalysis and cited by Nvidia) show Blackwell Ultra (B300) delivering 35x lower cost per token than Hopper (H100) for agentic AI workloads. On Llama 3.3 70B, the B300 hits over 10,000 tokens per second per GPU. On DeepSeek-R1, it runs roughly 5x the throughput of a Hopper system; that is a system-level comparison, so it is not directly comparable to the per-GPU numbers. The overall range across LLM workloads is 11 to 15x throughput improvement per GPU over H100.

The energy story is just as stark: 25x lower energy per inference versus H100. If you are paying for datacenter power directly, that matters. If you are on cloud, it translates to lower per-token pricing as providers pass efficiency gains through competition.

What B300 Actually Costs to Run Today

Cloud pricing as of this week, per GPU per hour:

  • Spheron (spot): $2.45/hr
  • CoreWeave (reserved): ~$3.40/hr
  • CoreWeave (on-demand): $4.50–$5.80/hr
  • Lambda Labs: $6.69/hr (8-GPU configs)

H100 spot runs $2.50–$3.50/hr. The hourly rates look comparable at first glance — but that is the wrong comparison. At equal hourly cost, B300 gives 11–15x the inference throughput. For teams serving LLM requests at scale, the cost-per-token on B300 is dramatically lower even before accounting for Dynamo optimizations. See the live B300 pricing comparison for a current multi-provider breakdown.
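To make the comparison concrete, here is a minimal sketch of the cost-per-token arithmetic using the prices and throughput quoted in this article. Two things are assumptions rather than facts: the H100 throughput is derived by dividing the B300 figure by the quoted 11–15x range (the article gives no direct H100 number for this model), and the B300 figure is treated as exactly 10,000 tokens/s even though the quoted number is "over 10,000". Treat the output as a template for your own measurements, not as a benchmark result.

```python
def cost_per_million_tokens(price_per_gpu_hour, tokens_per_second):
    """USD to generate one million tokens on one GPU at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


# Numbers quoted in the article.
B300_SPOT_PRICE = 2.45            # USD per GPU-hour (Spheron spot)
B300_TOKENS_PER_SEC = 10_000      # Llama 3.3 70B, per GPU (vendor-cited floor)
H100_SPOT_PRICES = (2.50, 3.50)   # USD per GPU-hour
SPEEDUPS = (11, 15)               # quoted per-GPU throughput gain over H100

b300_cost = cost_per_million_tokens(B300_SPOT_PRICE, B300_TOKENS_PER_SEC)

# Assumption: H100 throughput = B300 throughput / speedup.
# Smallest gap: cheapest H100 price with the smallest speedup.
h100_low = cost_per_million_tokens(
    H100_SPOT_PRICES[0], B300_TOKENS_PER_SEC / SPEEDUPS[0]
)
# Largest gap: priciest H100 price with the largest speedup.
h100_high = cost_per_million_tokens(
    H100_SPOT_PRICES[1], B300_TOKENS_PER_SEC / SPEEDUPS[1]
)

print(f"B300: ${b300_cost:.3f} per million tokens")
print(f"H100: ${h100_low:.3f} to ${h100_high:.3f} per million tokens")
print(f"Ratio: {h100_low / b300_cost:.1f}x to {h100_high / b300_cost:.1f}x")
```

Swap in your own measured tokens-per-second and the hourly rate you are actually quoted; the function itself does not depend on any vendor's numbers.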

Dynamo 1.0: The Free 7x Multiplier

Nvidia shipped Dynamo 1.0 in March 2026 as open source. It is an inference orchestration layer (Nvidia calls it an inference operating system) that boosts performance by up to 7x on top of what the hardware already delivers. It works by routing requests to GPUs that already hold the relevant KV cache, managing KV cache across GPU memory and lower-cost storage tiers (KVBM), moving data quickly between GPUs and storage (NIXL), and simplifying multi-node scaling (Grove).
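The idea behind cache-aware routing is easy to see in a toy model. The sketch below is not Dynamo's API or its algorithm; the `Worker` class, the `route` function, and the `load_penalty` weight are all hypothetical, invented here for illustration. It sends each request to the worker whose cached token prefixes overlap most with the incoming prompt, discounted by how busy that worker is.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A hypothetical GPU worker that remembers token prefixes it has served."""
    name: str
    cached_prefixes: list = field(default_factory=list)
    active_requests: int = 0


def shared_prefix_len(a, b):
    """Number of leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def best_cache_hit(worker, tokens):
    """Longest reusable prefix this worker holds for the given prompt."""
    return max(
        (shared_prefix_len(p, tokens) for p in worker.cached_prefixes),
        default=0,
    )


def route(workers, tokens, load_penalty=4):
    """Pick the worker with the best (cache overlap - load) score."""
    def score(w):
        return best_cache_hit(w, tokens) - load_penalty * w.active_requests

    chosen = max(workers, key=score)
    chosen.active_requests += 1
    chosen.cached_prefixes.append(tuple(tokens))
    return chosen


# Two requests sharing a 100-token system prompt land on the same worker.
workers = [Worker("gpu-0"), Worker("gpu-1")]
system_prompt = list(range(100))
first = route(workers, system_prompt + [1001, 1002])
first.active_requests -= 1  # pretend the first request has finished
second = route(workers, system_prompt + [2001])
print(first.name, second.name)
```

A real router also has to account for cache eviction, memory pressure, and prefill/decode placement, none of which this toy covers.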

Dynamo integrates with vLLM, SGLang, LangChain, TRT-LLM, and LMCache. If you are already running vLLM, adding Dynamo is not a rewrite: it drops in as an orchestration layer. The code is at github.com/ai-dynamo/dynamo. If you run Blackwell without Dynamo, you are leaving the largest free performance gain currently available on the table.

One Caveat: Watch the Contract Length

Nvidia confirmed Vera Rubin, the next-generation architecture, is on track for production shipments in H2 FY2027. Nvidia's fiscal year ends in late January, so that window runs from roughly August 2026 through January 2027. Signing a 12-month committed B300 contract today locks you into that hardware through the Vera Rubin launch window and well past it. The prudent approach is spot pricing or short reserved contracts of 3–6 months while the architecture transition plays out. Vera Rubin will almost certainly deliver another cost-per-token step change, and you want optionality when it arrives.
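One way to reason about contract length is to ask how large a price drop you would need at renewal for a shorter contract to pay off. The sketch below compares a 12-month commitment with a 6-month contract that is renewed at a lower rate. The $3.40/hr rate is the CoreWeave reserved price quoted above; the 10% premium for the shorter term and the 730 hours-per-month figure are assumptions chosen for illustration, not quoted prices.

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours per year / 12


def committed_cost(rate, months, gpus=1):
    """Total cost of one long commitment at a fixed hourly rate."""
    return rate * HOURS_PER_MONTH * months * gpus


def split_cost(rate, short_term_premium, price_drop,
               first_months, total_months, gpus=1):
    """Cost of a short contract followed by renewal at a reduced rate."""
    first_rate = rate * (1 + short_term_premium)
    second_rate = first_rate * (1 - price_drop)
    remaining = total_months - first_months
    hours = HOURS_PER_MONTH * gpus
    return first_rate * hours * first_months + second_rate * hours * remaining


def break_even_drop(short_term_premium, first_months, total_months):
    """Renewal price drop at which split_cost equals committed_cost."""
    remaining = total_months - first_months
    target = total_months / (1 + short_term_premium) - first_months
    return 1 - target / remaining


RATE = 3.40      # USD per GPU-hour, reserved price quoted in the article
PREMIUM = 0.10   # hypothetical surcharge for a 6-month term

drop = break_even_drop(PREMIUM, first_months=6, total_months=12)
print(f"12-month commit: ${committed_cost(RATE, 12):,.0f} per GPU")
print(f"Break-even renewal price drop: {drop:.1%}")
```

Under these assumptions, the shorter contract wins only if renewal prices fall by more than the break-even figure; with no short-term premium, any drop at all favors the shorter term.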

What to Do

Three concrete steps worth taking this week:

  1. Benchmark your workload on B300. Per the pricing above, Spheron lists B300 spot capacity and CoreWeave lists on-demand capacity. Run your actual inference workload, not a synthetic benchmark. The throughput difference is real, but your cost model depends on your specific prompt/completion ratio and batch size.
  2. Evaluate Dynamo 1.0 if you are already on Blackwell hardware. The vLLM integration is the easiest entry point. Check the Nvidia Dynamo developer page for supported configurations.
  3. Cap B300 reserved contract length at 3–6 months until Vera Rubin timelines firm up. Keep maximum optionality heading into H2 2026.
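For the benchmarking step, a small helper like the one below turns a benchmark run into the numbers that matter for a cost model. It is a sketch under stated assumptions: `RequestRecord` and `summarize` are hypothetical names, the token counts are expected to come from whatever your serving stack logs, and the sample run at the bottom is made-up data used only to show the output shape.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Token counts for one completed request, taken from your serving logs."""
    prompt_tokens: int
    completion_tokens: int


def summarize(records, wall_seconds, gpus, price_per_gpu_hour):
    """Derive throughput and cost figures from one benchmark run."""
    prompt = sum(r.prompt_tokens for r in records)
    completion = sum(r.completion_tokens for r in records)
    if completion == 0 or wall_seconds <= 0 or gpus <= 0:
        raise ValueError("need completed requests and a positive duration")
    run_cost = price_per_gpu_hour * gpus * wall_seconds / 3600
    return {
        "prompt_completion_ratio": prompt / completion,
        "output_tokens_per_sec_per_gpu": completion / wall_seconds / gpus,
        "usd_per_million_output_tokens": run_cost / completion * 1_000_000,
        "usd_per_million_total_tokens":
            run_cost / (prompt + completion) * 1_000_000,
    }


# Made-up run: 1,000 requests over 10 minutes on 8 GPUs at $2.45/GPU-hour.
sample = [RequestRecord(prompt_tokens=1200, completion_tokens=300)] * 1000
stats = summarize(sample, wall_seconds=600, gpus=8, price_per_gpu_hour=2.45)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Running the same script against an H100 run and a B300 run of the identical workload gives a like-for-like cost comparison that reflects your own prompt/completion mix.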

The Q1 results confirm what the supply chain data has been suggesting since March: Blackwell Ultra is deployed at scale, available to developers today, and the cost-per-token advantage over H100 is not marginal. For teams not locked into existing H100 contracts, the migration math is compelling. The full Q1 FY2027 earnings release is worth reading if you want the complete revenue breakdown before your next infrastructure planning cycle.

ByteBot
