AMD just made a credible case for running a 300-billion-parameter language model on a single machine, with no GPU cluster, no cloud subscription, and no waiting for inference capacity to free up. On May 21, the company announced the Ryzen AI Max PRO 400 series: three new x86 chips supporting up to 192GB of unified memory, of which up to 160GB can be allocated as VRAM. The companion developer platform, the Ryzen AI Halo mini PC, opens for pre-orders in June at $3,999. The higher-memory PRO 400-based systems arrive from ASUS, HP, and Lenovo in Q3 2026.
Before you place a pre-order, note one thing AMD is not advertising prominently: fitting a 300B model in 160GB requires aggressive quantization. At INT4 (half a byte per parameter), the weights of a 300B model take roughly 150GB, which is just under the limit. At FP16 (two bytes per parameter), the same model would need around 600GB. AMD’s claim is technically accurate, but the realistic sweet spot for most developers is models of around 128B parameters at 8-bit precision (about 128GB of weights), which is still a significant milestone for x86 hardware.
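The sizing figures above come from simple arithmetic: parameter count times bits per parameter, divided by eight. The sketch below reproduces them for a 300B model. It counts weights only and ignores the KV cache and runtime overhead, so treat the results as a lower bound; the function and constant names are illustrative, not part of any AMD tooling.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_param / 8


VRAM_GB = 160  # maximum VRAM allocation quoted in the announcement

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = weights_gb(300, bits)
    verdict = "fits" if gb <= VRAM_GB else "does not fit"
    print(f"300B @ {name}: {gb:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Only the INT4 row lands under the 160GB limit, which is why the 300B claim depends on quantization.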
What AMD Actually Announced
The PRO 400 series comes in three SKUs, all built on the same Zen 5 CPU + RDNA 3.5 GPU + XDNA 2 NPU architecture as the previous generation, with incremental clock speed improvements and the memory ceiling pushed up:
- Ryzen AI Max+ PRO 495: 16 cores, 40 GPU compute units, 55 TOPS NPU, up to 192GB unified memory
- Ryzen AI Max PRO 490: 12 cores, 32 GPU compute units, 50 TOPS NPU, up to 192GB unified memory
- Ryzen AI Max PRO 485: 8 cores, 32 GPU compute units, 50 TOPS NPU, up to 192GB unified memory
All three can allocate up to 160GB of that memory pool as VRAM. The TDP is configurable from 45W to 120W, making these viable for thin-and-light laptops and mobile workstations alike. The Ryzen AI Halo mini PC, which opens for pre-orders in June, uses the previous-generation Ryzen AI Max+ 395 with a 128GB memory maximum and is available exclusively through Micro Center at launch. The PRO 400-based Halo refresh follows in Q3 2026.
The Comparison Developers Actually Need
The market for personal AI workstations has gotten crowded fast. Here is where the Ryzen AI Max PRO 400 actually sits:
| Platform | Max Memory | Bandwidth | Largest Model | Price | OS |
|---|---|---|---|---|---|
| AMD Ryzen AI Max PRO 495 | 192GB | ~270 GB/s | 300B+ (INT4) | TBD (Q3 2026) | Win / Linux |
| AMD Ryzen AI Halo (current) | 128GB | 273 GB/s | ~200B (INT4) | $3,999 | Win / Linux |
| Nvidia DGX Spark | 128GB | 273 GB/s | ~130B full | ~$4,999 | Linux only |
| Apple Mac Studio M4 Max | 128GB | 546 GB/s | ~130B full | $1,999 | macOS |
| Apple Mac Studio M3 Ultra | Up to 512GB | 819 GB/s | 400B+ | $3,999+ | macOS |
The number AMD is not leading with is memory bandwidth: the Ryzen AI Max platform runs at approximately 273 GB/s, compared to Apple’s M4 Max at 546 GB/s. Single-user LLM token generation is typically memory-bandwidth-bound, not compute-bound, because every generated token requires reading the model weights from memory. That means an Apple M4 Max will generate tokens roughly 1.5 to 2 times faster on the same model, provided the model fits in its memory. AMD’s advantage is not speed; it is the memory ceiling. The 192GB configuration fits models that exceed the 128GB ceiling of Apple’s M4 Max, and it does so on a platform that runs Windows and Linux natively.
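A back-of-the-envelope way to see the bandwidth effect: if each generated token requires one full pass over the weights, tokens per second cannot exceed bandwidth divided by weight size. The sketch below applies that to a 70B model quantized to INT4 (about 35GB of weights). It assumes a dense model at batch size 1 and ignores compute, KV cache reads, and software overhead, so it gives a ceiling rather than a benchmark; the function name is illustrative.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


WEIGHTS_70B_INT4_GB = 70 * 4 / 8  # 70B parameters at 4 bits each = 35 GB

# Bandwidth figures as quoted in the comparison table
for platform, bandwidth in [("Ryzen AI Max", 273), ("Apple M4 Max", 546)]:
    ceiling = max_tokens_per_second(bandwidth, WEIGHTS_70B_INT4_GB)
    print(f"{platform}: at most {ceiling:.1f} tokens/s")
```

The ratio between the two ceilings equals the ratio of the bandwidths, whatever the model size. Real throughput depends on the runtime and workload, so measure on your own stack before relying on it.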
If you need maximum inference speed and macOS works for your workflow, the Mac Studio M4 Max, which starts at $1,999, remains the more sensible option. If you need to run 128B+ parameter models on an x86 machine with full Windows or Linux access, AMD is currently the only option in this form factor.
Software Stack: ROCm Is the Key Variable
The hardware specs are compelling, but for most developers the real question is whether the software stack cooperates. AMD ships the Ryzen AI Halo with ROCm pre-installed and validated against Ollama, llama.cpp, vLLM, PyTorch, LM Studio, and ComfyUI. The current generation (Max+ 395) already has community-tested ROCm builds for the gfx1151 architecture.
ROCm has a reputation for lagging behind CUDA in third-party compatibility, and that reputation was earned. The situation in 2026 is substantially better — but developers with specialized inference tooling should verify compatibility before assuming everything just works. You can start with AMD’s official Ollama + ROCm setup guide as the baseline, then test your specific stack.
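One quick compatibility check, assuming PyTorch is part of your stack, is to ask the installed build whether it was compiled against ROCm and whether it can see the GPU. The sketch below is a minimal probe, not AMD’s validation procedure. ROCm builds of PyTorch expose the GPU through the `torch.cuda` namespace and report a HIP version in `torch.version.hip`; the helper name `rocm_report` is made up for this example.

```python
def rocm_report() -> dict:
    """Collect basic facts about the local PyTorch install and GPU visibility."""
    report = {
        "torch_installed": False,
        "hip_version": None,   # set only on ROCm builds of PyTorch
        "gpu_visible": False,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed; nothing more to check

    report["torch_installed"] = True
    report["hip_version"] = getattr(torch.version, "hip", None)
    report["gpu_visible"] = torch.cuda.is_available()
    if report["gpu_visible"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in rocm_report().items():
    print(f"{key}: {value}")
```

If `hip_version` is empty on a machine with ROCm installed, the PyTorch wheel is probably a CPU or CUDA build rather than a ROCm one. A passing check here says nothing about other tools such as vLLM or ComfyUI, which still need their own tests.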
The Cloud Cost Math
AMD’s ROI claim deserves scrutiny, not dismissal. A developer running 6 million daily AI tokens through a cloud API spends approximately $773 per month. The Ryzen AI Halo at $3,999 costs roughly $16 per month in electricity under a sustained 150W system load. At that usage level, net savings are about $757 per month, so the hardware pays for itself in a little over five months and saves roughly $23,000 over three years.
That math holds if you are already a heavy API user. It does not hold for occasional experiments or shared team workloads with existing cloud credits. The ROI case is strongest for solo developers and small teams with high-volume, predictable inference workloads they are currently paying for per token.
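To rerun the break-even math with your own numbers, a small calculator is enough. The sketch below uses the article’s figures as defaults; the electricity rate of $0.15 per kWh is an assumption chosen because it reproduces the roughly $16 monthly power cost, and all function names are illustrative.

```python
def monthly_power_cost(watts: float, usd_per_kwh: float, hours: float = 24 * 30) -> float:
    """Electricity cost for one month of sustained load."""
    return watts / 1000 * hours * usd_per_kwh


def payback_months(hardware_usd: float, cloud_usd_month: float, power_usd_month: float) -> float:
    """Months until cumulative cloud savings cover the hardware price."""
    net_saving = cloud_usd_month - power_usd_month
    if net_saving <= 0:
        return float("inf")  # local hardware never pays for itself
    return hardware_usd / net_saving


def net_savings(hardware_usd: float, cloud_usd_month: float,
                power_usd_month: float, months: int) -> float:
    """Total saved over the period after subtracting the hardware price."""
    return (cloud_usd_month - power_usd_month) * months - hardware_usd


power = monthly_power_cost(150, 0.15)  # assumed rate of $0.15 per kWh
print(f"Power: ${power:.2f}/month")
print(f"Payback: {payback_months(3999, 773, 16):.1f} months")
print(f"Three-year savings: ${net_savings(3999, 773, 16, 36):,.0f}")
```

Lowering the cloud spend changes the picture quickly: at $100 per month in API costs, the same machine takes nearly four years to pay for itself.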
What to Do Right Now
If you want to get your hands on AMD’s developer platform in the near term, the Ryzen AI Halo (128GB, Ryzen AI Max+ 395) opens for pre-orders at Micro Center in June at $3,999. It handles models up to roughly 200B parameters with INT4 quantization and ships with the full ROCm software stack pre-configured.
If the 192GB ceiling is what you need — running a quantized 300B model or fitting a 128B model with a large context window — wait for the PRO 400-based systems from ASUS, HP, and Lenovo in Q3 2026. Pricing has not been announced, but expect systems to land above $4,500 given the memory upgrade.
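Whether a large context window fits alongside the weights depends on the KV cache, which grows linearly with context length. The sketch below uses the standard estimate (two tensors per layer, keys and values, each sized by KV heads times head dimension times context length). The layer count, head count, and head dimension are hypothetical values chosen for illustration, not the specification of any real 128B model, and the weights are assumed to be stored at 8 bits per parameter.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB for one sequence (keys plus values, all layers)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1e9


# Hypothetical architecture for illustration only
LAYERS, KV_HEADS, HEAD_DIM = 96, 8, 128
WEIGHTS_GB = 128   # assumed: 128B parameters at 8 bits each
VRAM_GB = 160      # maximum VRAM allocation quoted in the announcement

for context in (32_768, 65_536, 131_072):
    cache = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, context)
    total = WEIGHTS_GB + cache
    verdict = "fits" if total <= VRAM_GB else "does not fit"
    print(f"{context:>7} tokens: cache {cache:.1f} GB, total {total:.1f} GB, {verdict}")
```

Under these assumptions a 64K-token context fits in 160GB but a 128K-token context does not, so check the real architecture of the model you plan to run before sizing the machine.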
AMD has made local inference on Windows and Linux genuinely viable for model sizes that required a data center rack two years ago. The bandwidth gap with Apple is real. The memory ceiling advantage is also real. Which one matters more depends entirely on what you are building.