Apple Foundation Models: Free On-Device LLM for iOS Developers

Apple Foundation Models framework enables free on-device LLM access for Swift developers

Apple’s WWDC26 keynote lands today, and between the Siri redesign and the mandatory Liquid Glass deadlines sits something quieter and more immediately useful: the Foundation Models framework gives every Swift app on supported hardware direct access to a roughly 3-billion-parameter on-device LLM. No API key. No monthly invoice. No prompts sent to a server. The model runs locally on Apple silicon, the same hardware that powers Apple Intelligence, and it costs developers exactly nothing to call.

What the Framework Actually Is

Foundation Models exposes Apple’s on-device language model through a native Swift API. The ~3B parameter model is the same one underlying Apple Intelligence features — summarization, writing assistance, smart replies. Apple opened it to third-party developers alongside iOS 26 last September. With WWDC26, it gets updated capabilities and renewed spotlight as iOS 27 expands Apple Intelligence across the platform.

Device support is tied to Apple Intelligence hardware: iPhone 15 Pro or newer, any iPad or Mac with an M1 chip or later, and the A17 Pro iPad mini. Users also need Apple Intelligence enabled in Settings. That narrows the addressable market, but for apps targeting the premium segment, coverage is already substantial.
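Because not every device qualifies, an app would typically check availability before showing an AI feature. A minimal sketch, assuming the framework's SystemLanguageModel.default.availability property; the function name and the messages are illustrative, not Apple's wording:

```swift
import FoundationModels

/// Returns a user-facing explanation of whether the on-device model can be used.
/// The strings are placeholders; adapt them to your app.
func describeModelAvailability() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "On-device model is ready."
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is still downloading or preparing. Try again later."
    case .unavailable(let reason):
        // Catch-all for any reason not listed above.
        return "On-device model unavailable: \(reason)"
    }
}
```

Calling this once at launch, and hiding or disabling the feature when the model is unavailable, keeps unsupported devices from hitting a dead end.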

The API Is Simpler Than You Expect

The entry point is LanguageModelSession. A basic call looks like this:

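import FoundationModels

// reviewText is a String your app already holds, such as a user-submitted review.
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Summarize this user review in two sentences: \(reviewText)"
)
let summary = response.content  // respond(to:) returns a Response; the text is in .content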

That is it. One async call. No API key in your environment, no rate limit handling, no billing dashboard to watch. For streaming, swap in streamResponse, which returns an AsyncSequence of partial results:

let stream = session.streamResponse(to: prompt)
for try await partial in stream {
    // Each element is a snapshot of the text generated so far, so assign it; do not append.
    label.text = partial.content
}
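Sessions can also carry standing instructions and per-request generation options. The following is a sketch under assumptions: summarizeReview is a hypothetical helper, and the instruction text, temperature, and token cap are arbitrary values chosen for illustration.

```swift
import FoundationModels

// Hypothetical helper: summarizes a review with fixed instructions and sampling settings.
func summarizeReview(_ reviewText: String) async throws -> String {
    // Instructions apply to every prompt sent through this session.
    let session = LanguageModelSession(
        instructions: "You summarize app reviews in a neutral tone. Reply in at most two sentences."
    )
    // A lower temperature makes output more repeatable; the token cap bounds response length.
    // Both numbers are illustrative, not recommended defaults.
    let options = GenerationOptions(temperature: 0.3, maximumResponseTokens: 120)
    let response = try await session.respond(
        to: "Summarize this review: \(reviewText)",
        options: options
    )
    return response.content
}
```

Keeping the instructions in the session rather than in each prompt means user-supplied text never has to be mixed into the part that defines the model's behavior.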

Structured Output Is the Real Differentiator

Most LLM APIs return a string. You then parse it, validate it, and hope the model followed your format instructions. Foundation Models takes a different approach with the @Generable macro:

@Generable
struct FeedbackAnalysis {
    @Guide(description: "Overall sentiment: positive, negative, or neutral")
    var sentiment: String
    @Guide(description: "Top three actionable suggestions extracted from the feedback")
    var suggestions: [String]
}

let result = try await session.respond(
    to: userFeedback,
    generating: FeedbackAnalysis.self
)
let analysis = result.content  // the typed value is wrapped in a Response
print(analysis.sentiment)      // e.g. "positive"
print(analysis.suggestions)    // e.g. ["Add dark mode", "Improve onboarding", ...]

At compile time, @Generable derives a generation schema from your Swift types. Decoding is constrained to that schema, so the response always parses into your type, with no format-coaxing prompts and no hand-written parser. The guarantee is structural: the values themselves can still be wrong. The @Guide annotations describe each field in plain English. Compared with parsing free-form text from a cloud API, this removes a separate validation layer.
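The schema can be tightened further so that fewer constraints live only in prose. A sketch of one way to do that, assuming a @Generable enum for the sentiment field and the .count guide for the array; the type names Sentiment and ConstrainedFeedbackAnalysis and the analyze function are made up for this example:

```swift
import FoundationModels

// A Generable enum restricts the field to exactly these cases,
// instead of a free-form String described in English.
@Generable
enum Sentiment {
    case positive
    case negative
    case neutral
}

@Generable
struct ConstrainedFeedbackAnalysis {
    @Guide(description: "Overall sentiment of the feedback")
    var sentiment: Sentiment

    // .count(3) constrains the array to exactly three elements.
    @Guide(description: "Actionable suggestions extracted from the feedback", .count(3))
    var suggestions: [String]
}

// Hypothetical helper that runs one guided-generation request.
func analyze(_ userFeedback: String) async throws -> ConstrainedFeedbackAnalysis {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: userFeedback,
        generating: ConstrainedFeedbackAnalysis.self
    )
    return response.content
}
```

With the enum in place, downstream code can switch over the sentiment exhaustively rather than comparing strings.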

Who This Actually Benefits

The obvious use cases are summarization, classification, and extraction — tasks the 3B model handles well. Think: smart journaling apps that generate personalized insights from daily notes, fitness apps that turn workout logs into human-readable summaries, or customer-facing tools that analyze feedback without sending it anywhere.

The less obvious angle is compliance. Apps in health, finance, and enterprise frequently cannot send user data to third-party servers without substantial legal overhead. Foundation Models removes that transfer: prompts and responses stay on the device. That does not make an app HIPAA-compliant by itself, but it takes the third-party processor out of the picture by architecture, not by contract.

Real apps are already shipping. CellWalk uses Foundation Models to explain scientific terms conversationally. Grammo’s AI grammar tutor gives contextual feedback on exercises. Both ship features that would cost meaningful API spend if built on cloud LLMs. According to Apple’s September 2025 announcement, developers are already using the framework to generate personalized quizzes, deliver workout summaries, and build in-app chat experiences that never touch a server.

What It Cannot Do

The 3B parameter constraint is real. Apple explicitly flags code generation, math calculations, and factual Q&A as outside this model’s design scope. The context window is small (4,096 tokens per session as of iOS 26, shared between prompt and response), and long prompts hurt both latency and output quality. This is not a replacement for Claude or GPT-4o when reasoning depth matters.
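In practice the limited context shows up as a thrown error once a session's history grows too long. One possible way to handle it, sketched with a hypothetical ResilientSession actor that discards the history and retries once:

```swift
import FoundationModels

// Hypothetical wrapper: retries once in a fresh session when the context fills up.
actor ResilientSession {
    private var session = LanguageModelSession()

    func respond(to prompt: String) async throws -> String {
        do {
            return try await session.respond(to: prompt).content
        } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
            // The accumulated transcript no longer fits. Start over with an empty
            // session and try once more; a second failure propagates to the caller.
            session = LanguageModelSession()
            return try await session.respond(to: prompt).content
        }
    }
}
```

Dropping the whole history is the bluntest option. An app that needs continuity could instead seed the new session with a short summary of the old conversation.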

Think of it as a purpose-built tool for specific tasks, not a general-purpose intelligence layer. When you need world-knowledge retrieval, complex reasoning, or code review, you still need a cloud API. Foundation Models handles the high-frequency, privacy-sensitive, cost-sensitive calls that make up the bulk of what most apps actually do.
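That split can be made explicit in code. The sketch below is one hypothetical routing policy, not anything the framework provides: TaskKind, Backend, chooseBackend, and the 3,000-token budget are all assumptions made for illustration.

```swift
// Hypothetical task categories; adjust to what your app actually does.
enum TaskKind {
    case summarization, classification, extraction
    case codeGeneration, math, factualQA
}

enum Backend {
    case onDevice
    case cloud
}

/// Picks a backend from the task kind, model availability, and a rough prompt size.
/// contextBudget is an assumed safety margin below the on-device context limit.
func chooseBackend(
    for task: TaskKind,
    onDeviceAvailable: Bool,
    estimatedPromptTokens: Int,
    contextBudget: Int = 3_000
) -> Backend {
    // Fall back to the cloud when the model is missing or the prompt is too large.
    guard onDeviceAvailable, estimatedPromptTokens <= contextBudget else {
        return .cloud
    }
    switch task {
    case .summarization, .classification, .extraction:
        return .onDevice
    case .codeGeneration, .math, .factualQA:
        return .cloud
    }
}
```

For example, chooseBackend(for: .summarization, onDeviceAvailable: true, estimatedPromptTokens: 500) picks the on-device model, while the same call with .math routes to the cloud.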

Getting Started

The Foundation Models framework documentation covers the full API surface including sessions, guided generation, streaming, and tool calling. Apple’s WWDC25 video “Meet the Foundation Models framework” is the best 30-minute investment for understanding the design decisions behind the API. A practical hands-on exploration guide at createwithswift.com walks through each capability with runnable examples you can drop into a project today.

With WWDC26 expanding Apple Intelligence and adding new model adapters, the capability floor is moving up. If you have been evaluating on-device LLM inference for your app, run the numbers on what you are currently paying cloud providers — and what you could move onto the device for free.
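Running the numbers is plain arithmetic. The sketch below uses entirely made-up volumes and prices as placeholders; substitute your own traffic and your provider's actual rates.

```swift
/// Rough monthly spend for a metered cloud LLM. Every input is a placeholder.
func monthlyCloudCost(
    requestsPerDay: Double,
    inputTokensPerRequest: Double,
    outputTokensPerRequest: Double,
    inputPricePerMillionTokens: Double,
    outputPricePerMillionTokens: Double,
    daysPerMonth: Double = 30
) -> Double {
    // Cost of a single request, split into prompt and completion tokens.
    let inputCost = inputTokensPerRequest / 1_000_000 * inputPricePerMillionTokens
    let outputCost = outputTokensPerRequest / 1_000_000 * outputPricePerMillionTokens
    return requestsPerDay * daysPerMonth * (inputCost + outputCost)
}

// Hypothetical workload: 50,000 requests a day, 800 input and 150 output tokens each,
// priced at $1.00 and $4.00 per million tokens (invented figures, not a real price list).
let estimate = monthlyCloudCost(
    requestsPerDay: 50_000,
    inputTokensPerRequest: 800,
    outputTokensPerRequest: 150,
    inputPricePerMillionTokens: 1.0,
    outputPricePerMillionTokens: 4.0
)
print("Estimated monthly cloud spend (USD):", estimate.rounded())
```

With those placeholder inputs the estimate comes to about $2,100 a month: 1.5 million requests at $0.0014 each. Whatever share of that traffic fits the on-device model's strengths is the part you could move off the bill.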

ByteBot