Apple Foundation Models: On-Device LLM in Swift for iOS 26

Apple Foundation Models: Run a 3B on-device LLM in Swift with zero API costs

Apple’s Worldwide Developers Conference kicks off in one week. Before the keynote ends, there will be new sessions to watch, new APIs to absorb, and a fresh wave of “you need to update your app” energy. But one of the most significant iOS 26 developer tools is already available in Xcode 26 beta right now: the Foundation Models framework. It gives Swift developers direct access to the 3-billion-parameter on-device language model powering Apple Intelligence. No API key. No server bill. No data leaving the device. Just code and device. Here is what you need to know to get started before the rest of the ecosystem catches up on June 8.

What Foundation Models Actually Is

The Foundation Models framework is not Apple’s answer to ChatGPT. It does not compete with GPT-4o or Claude 3.5 on reasoning depth or breadth of knowledge. It is a narrowly scoped, privacy-first SDK built for five specific tasks: summarization, extraction, classification, tagging, and content composition or revision. The model is a ~3 billion parameter LLM that runs entirely on Apple silicon — the same Neural Engine, CPU, and GPU stack that powers Apple Intelligence. Nothing is sent to a server. There are no rate limits. There is no cost per request.

The trade-offs are real. Knowledge cutoff is approximately October 2023, which means the model knows nothing about events after that date and will hallucinate confidently if you ask. The context window is 4096 tokens — combined input and output — which is roughly 3,000 words. For a large class of app features, none of that matters. For personal-context tasks — summarizing a user’s own notes, tagging their documents, composing replies based on their data — this model is the right tool, and it ships for free with every eligible device.

Check Availability Before Anything Else

The first thing to understand about shipping Foundation Models in production is that the model is not available on all devices. Apple Intelligence requires specific hardware (iPhone 15 Pro and later, M-series iPads and Macs), must be enabled in Settings, and takes time to download after being enabled. Skip the availability check and every AI feature will fail at runtime for users in any of those states.

import FoundationModels

let model = SystemLanguageModel.default
switch model.availability {
case .available:
    break // Proceed with AI features
case .unavailable(.deviceNotEligible):
    break // Fall back to the non-AI experience
case .unavailable(.appleIntelligenceNotEnabled):
    break // Guide the user: Settings > Apple Intelligence
case .unavailable(.modelNotReady):
    break // Model still downloading; show the fallback
case .unavailable:
    break // Any other or future reason: show the fallback
}

Always code the non-AI path. A meaningful percentage of your users will hit one of those unavailable cases at launch.

Sessions: The Core API

LanguageModelSession is the primary interface. Create one, call respond(to:), get back text. For any UI that should feel responsive, use streamResponse(to:) instead — it delivers partial output as the model generates, exactly like the streaming pattern you know from cloud APIs. Sessions automatically maintain conversation history, so you do not need to replay the transcript with each call. For role-specific behavior, pass instructions on initialization.

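import FoundationModels

// Role-specific session
let session = LanguageModelSession(
    instructions: "You summarize engineering notes. Be concise and use bullet points."
)

// Basic response
let response = try await session.respond(to: userNote)
print(response.content)

// Streaming (recommended for UI). Each element is a snapshot of the
// response so far, not a delta, so assign it rather than append.
// Note: early betas yielded the partial value directly instead of a
// snapshot with a `content` property; check against your SDK.
let stream = session.streamResponse(to: userNote)
for try await snapshot in stream {
    displayedSummary = snapshot.content
}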

Check session.isResponding to manage UI state — disable the submit button while the model is generating to avoid stacked requests.
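
To make the UI-state advice concrete, here is a minimal SwiftUI sketch. The view name, its state properties, and the layout are hypothetical, and it assumes the stream yields snapshots exposing a `content` property (early betas differed). Treat it as one possible shape, not the canonical pattern.

```swift
import SwiftUI
import FoundationModels

// Hypothetical view: all names here are illustrative.
struct NoteSummaryView: View {
    // The session is observable, so SwiftUI re-renders when isResponding changes.
    @State private var session = LanguageModelSession(
        instructions: "You summarize engineering notes. Be concise and use bullet points."
    )
    @State private var note = ""
    @State private var summary = ""
    @State private var errorMessage: String?

    var body: some View {
        VStack(alignment: .leading) {
            TextEditor(text: $note)
            Button("Summarize") {
                Task { await summarize() }
            }
            // Block stacked requests while the model is generating.
            .disabled(session.isResponding || note.isEmpty)
            Text(summary)
            if let errorMessage {
                Text(errorMessage).foregroundStyle(.red)
            }
        }
        .padding()
    }

    private func summarize() async {
        errorMessage = nil
        do {
            // Each snapshot holds the full text generated so far.
            for try await snapshot in session.streamResponse(to: note) {
                summary = snapshot.content
            }
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```

Because the session keeps its own transcript, reusing one session across many unrelated notes will eat into the context window; creating a fresh session per note is a reasonable alternative.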

Guided Generation: The Feature Worth Knowing Now

The @Generable macro is where Foundation Models pulls away from generic LLM wrappers. It turns a Swift struct into a structured output schema. The model generates responses that map directly to your types — no JSON parsing, no Codable boilerplate, no fragile prompt engineering telling the model to “respond only in JSON.” The framework constrains generation to your schema at inference time.

@Generable
struct TripSuggestion {
    let destination: String
    let topActivity: String
    let estimatedDays: Int
    let budgetLevel: String
}

// The prompt comes first (to:), then the type to generate.
let tripResponse = try await session.respond(
    to: "Suggest a weekend trip for an outdoor enthusiast near Seattle",
    generating: TripSuggestion.self
)
let suggestion = tripResponse.content // The typed TripSuggestion value

print(suggestion.destination)   // Typed String. No parsing needed.
print(suggestion.estimatedDays) // Typed Int.

This eliminates an entire class of production bugs. Malformed LLM output breaking JSON parsers is a real problem in applications using cloud APIs: the model drifts from the expected format, the parser throws, and the app crashes or silently drops the result. @Generable rules this out at the structural level. The output is either valid and typed, or the call throws an error you can catch. The values themselves can still be wrong, so validate anything that matters. For extraction and classification tasks, this is the correct pattern.
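
For classification specifically, a sketch like the following shows how the same pattern might look with an enum and @Guide hints, plus a catch for the failure path. The `Priority` and `TicketTriage` types and the `triage` function are hypothetical names invented for illustration; only the framework calls are real, and the error cases shown are the ones I would expect to matter most, not an exhaustive list.

```swift
import FoundationModels

// Hypothetical types for illustration.
@Generable
enum Priority {
    case low, medium, high
}

@Generable
struct TicketTriage {
    @Guide(description: "One-sentence summary of the reported problem")
    var summary: String

    @Guide(description: "Urgency of the ticket")
    var priority: Priority

    @Guide(description: "Short topical tags", .count(3))
    var tags: [String]
}

// Returns nil when generation fails, so the caller can fall back.
func triage(_ ticketText: String) async -> TicketTriage? {
    let session = LanguageModelSession(
        instructions: "You classify customer support tickets."
    )
    do {
        let response = try await session.respond(
            to: ticketText,
            generating: TicketTriage.self
        )
        return response.content
    } catch let error as LanguageModelSession.GenerationError {
        switch error {
        case .exceededContextWindowSize:
            print("Input too long for the context window; chunk it first.")
        case .guardrailViolation:
            print("Blocked by safety filters; use the non-AI path.")
        default:
            print("Generation failed: \(error)")
        }
        return nil
    } catch {
        print("Unexpected error: \(error)")
        return nil
    }
}
```

Constraining `priority` to an enum means the model cannot invent a fourth category, which is usually what you want from a classifier.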

Know These Limits Before You Ship

There are five hard limits that will cause production issues if you do not account for them upfront:

4096 token context window. Input plus output combined. For document summarization, design for chunking from the start.
Knowledge cutoff of October 2023. Never use the model for current events, version numbers, or recent world knowledge. Treat it as a text processor, not a knowledge base.
Nine languages only. English, French, German, Italian, Spanish, Portuguese, Japanese, Korean, and Simplified Chinese. Plan accordingly for broader audiences.
Aggressive safety filters. These can block legitimate requests in medical, legal, and security contexts. Test your exact prompts against the actual model early, not at QA time.
Apple Intelligence must be enabled by the user. This is user-controlled, with a hardware eligibility gate. Always implement the fallback path.
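
The chunking advice in the first limit can be sketched in plain Swift. This is a minimal sketch under stated assumptions: `chunkByWords` is a hypothetical helper, and it budgets by word count only because the estimate above is roughly 3,000 words per 4096 tokens. Real token counts vary by language and content, so leave generous headroom for instructions and output.

```swift
/// Splits text into whitespace-delimited chunks of at most `maxWordsPerChunk` words.
/// Word count is only a rough proxy for tokens (assumption, not a framework API).
func chunkByWords(_ text: String, maxWordsPerChunk: Int) -> [String] {
    precondition(maxWordsPerChunk > 0, "maxWordsPerChunk must be positive")

    // Split on any whitespace; empty pieces are dropped by default.
    let words = text.split(whereSeparator: { $0.isWhitespace })

    var chunks: [String] = []
    var start = 0
    while start < words.count {
        let end = min(start + maxWordsPerChunk, words.count)
        chunks.append(words[start..<end].joined(separator: " "))
        start = end
    }
    return chunks
}
```

A call such as `chunkByWords(longNote, maxWordsPerChunk: 1_500)` would leave about half of the estimated window free. One workable approach is to summarize each chunk in a fresh session, since a reused session's history also counts against the window, and then summarize the combined summaries.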

Foundation Models vs. a Cloud API

Use Foundation Models when your feature involves personal user data (notes, documents, calendar entries, health data), needs to work offline, or processes short inputs within the 4096 token window. Privacy-sensitive apps in health, finance, and legal categories are the strongest fit. The on-device guarantee is a genuine differentiator that privacy-aware users will value and that keeps your privacy policy simple.

Use a cloud API when the task requires complex multi-step reasoning, up-to-date world knowledge, inputs longer than roughly 3,000 words, multimodal inputs (images, audio), or language support beyond the nine available. These are real constraints, not hypothetical edge cases. The practical rule: if you are summarizing, extracting, classifying, or revising content that already lives on the user’s device, Foundation Models is likely the right tool. If you are answering open-ended questions that require reasoning across general knowledge, it is not.
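
The practical rule can be written down as a small routing function. Everything here is hypothetical: the `FeatureRequest` fields, the `Backend` enum, and the 3,000-word default simply encode the criteria listed above, and a real app would tune them.

```swift
// Hypothetical description of what a feature needs.
struct FeatureRequest {
    var wordCount: Int
    var needsCurrentKnowledge: Bool
    var needsMultimodalInput: Bool
    var needsMultiStepReasoning: Bool
    var languageSupportedOnDevice: Bool
}

enum Backend {
    case onDevice
    case cloud
}

/// Picks a backend using the criteria from the text above.
/// The default word limit mirrors the rough 3,000-word estimate (assumption).
func chooseBackend(for request: FeatureRequest, onDeviceWordLimit: Int = 3_000) -> Backend {
    if request.needsCurrentKnowledge
        || request.needsMultimodalInput
        || request.needsMultiStepReasoning {
        return .cloud
    }
    if !request.languageSupportedOnDevice {
        return .cloud
    }
    if request.wordCount > onDeviceWordLimit {
        return .cloud
    }
    return .onDevice
}
```

Keeping the decision in one function makes it easy to add the availability check as another input, so an ineligible device routes to the cloud or to the non-AI path.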

Start Now, Not After WWDC

Most iOS developers will pick this up in the week after WWDC when the session videos drop and every tech newsletter runs a summary. That is fine. But the developers who will ship Foundation Models features first are the ones experimenting with Xcode 26 beta this week. The Foundation Models documentation on developer.apple.com is comprehensive and already reflects the current beta. The WWDC25 “Meet Foundation Models” session is worth watching before June 8 — most of what shipped then is what you are building against now. For deeper topics — adapter training for domain-specific behavior, tool calling for calendar and reminders integration — the Foundation Models adapter training documentation is the right next step. And if you want a practical getting-started walkthrough, AppCoda’s Foundation Models guide covers the full setup flow with working examples.

The window where you can be ahead of the ecosystem on this is one week. Use it.

ByteBot