
Microsoft announced a compact developer workstation at Build 2026 that can run 120B-parameter language models on your desk — the same models that currently cost $15 to $60 per million tokens via cloud APIs. The Surface RTX Spark Dev Box uses NVIDIA’s new RTX Spark superchip, pairing a 20-core Grace CPU with a Blackwell GPU and 128 GB of unified memory to hit 1 petaflop of AI compute. It ships this fall for an estimated $3,000 to $3,500. After a year of escalating API bills, it is the most concrete hardware answer Microsoft has offered developers looking to bring inference in-house.
What the Hardware Actually Delivers
The RTX Spark superchip is manufactured on TSMC’s 3nm process and combines a 20-core NVIDIA Grace CPU with a Blackwell GPU containing 6,144 CUDA cores and 128 GB of LPDDR5X unified memory at 300 GB/s bandwidth. NVIDIA rates the platform at 1 petaflop of FP4 AI compute — enough to run a 120B-parameter model with a 1-million-token context window at interactive speeds.
The practical math: a 120B model at Q4 quantization uses roughly 60 to 70 GB, leaving headroom within 128 GB for the OS and running tools. That means Llama-class 120B models, which previously required a cloud GPU rental, can run locally. Fine-tuning workflows that historically required cloud GPU allocations are also within reach.
Critically, CUDA is fully supported. WSL 2 ships pre-configured with GPU passthrough, which means CUDA-only tooling — vLLM, custom PyTorch kernels, NVIDIA-specific inference optimizations — works out of the box. That is a genuine differentiator that Apple Silicon cannot match.
A Developer Box That Ships Ready
Microsoft did not stop at the hardware. The Dev Box ships with an opinionated developer image: VS Code, GitHub Copilot, Git, Python, and Node.js are pre-installed. Developer Mode is enabled by default. PowerShell 7 is the default shell. Widgets are off, the taskbar is stripped, and Do Not Disturb is on. WSL 2 is pre-configured with GPU passthrough so you do not have to touch a settings panel before running your first local model.
Getting Ollama running on the Dev Box requires no extra configuration. Migrating existing OpenAI SDK code to local inference is a single-line change:
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
Microsoft also shipped Microsoft Execution Containers (MXC) alongside the Dev Box — a Windows-native sandboxing layer for isolating AI agents. Combined with the hardware, the message is unambiguous: Windows is positioning itself as an OS for running agentic workloads locally, not just a platform that forwards your prompts to a cloud endpoint.
The Mac Studio Question
Anyone shopping for local AI hardware in the past two years has already arrived at the same answer: Mac Studio. Apple Silicon’s unified memory architecture has been the default for developers who want to run large models without a cloud bill. So how does the Dev Box stack up?
| Surface RTX Spark Dev Box | Mac Studio M4 Ultra | |
|---|---|---|
| Memory | 128 GB unified | Up to 192 GB unified |
| CUDA | Yes | No (Metal/MLX only) |
| AI Compute | 1 petaflop (FP4) | ~39 TOPS Neural Engine |
| Available | Fall 2026 | Now |
| Price (est.) | $3,000–$3,500 | $3,999+ (192 GB) |
The honest answer: it depends on your tooling. If your workflow involves vLLM in production, custom CUDA kernels, or NVIDIA-specific optimizations, RTX Spark is the only compact desktop option with CUDA. The Mac Studio M4 Ultra tops out at 192 GB — 50% more than RTX Spark — and benefits from higher memory bandwidth efficiency per gigabyte for pure inference workloads. Mac Studio is also available right now. If you need local AI hardware before fall, that matters.
The Billing Math
The case for owning your inference hardware strengthens the more you use it. Access to 120B-class models via cloud APIs runs $15 to $60 per million tokens at current rates. A developer or small team running agentic pipelines, RAG, code review, or fine-tuning evaluations can easily spend $200 to $400 a month. At that pace, a $3,500 device breaks even within 12 to 18 months — and the privacy and latency benefits start on day one. No network round-trips, no data leaving your machine, no billing surprises.
What to Watch For
Microsoft has not confirmed official pricing — the $3,000 to $3,500 estimate comes from analysts, not a product page. The initial launch is US-only. And if your deployment targets Linux servers, Windows-specific tooling like MXC raises portability questions that Microsoft has not yet answered. The hardware is compelling, but it is still months away from your desk.
The Surface RTX Spark Dev Box ships this fall exclusively via Microsoft.com. No pre-order is open yet — register your interest on the product page.













