
For two months, a model called “Owl Alpha” ranked in the top three on OpenRouter by call volume. Developers tried it, rated it, plugged it into agentic coding workflows, and moved on with their day — with no idea they were running a 1.6-trillion-parameter model trained entirely on Chinese chips by a food delivery company. On June 30, Meituan pulled back the curtain. The model is called LongCat-2.0, it is now open source under the MIT License, and the benchmarks are harder to ignore than the backstory.
What LongCat-2.0 Is
LongCat-2.0 is a Mixture-of-Experts model with 1.6 trillion total parameters, but only about 48 billion are active per token on average (the count varies dynamically between 33B and 56B). That MoE sparsity is what makes the economics work at this scale. The model has a native 1-million-token context window, was trained on 30 trillion tokens spanning code, English, Chinese, and multilingual data, and is built for agentic coding workflows rather than general-purpose chat.
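The sparsity argument is easiest to see as arithmetic. This sketch uses only the parameter counts quoted above plus one assumption: the common rule of thumb that a transformer forward pass costs about 2 FLOPs per active parameter per token. It ignores attention cost, which grows with context length, so treat the outputs as rough orders of magnitude rather than measured figures.

```python
# Parameter counts quoted for LongCat-2.0. The 2-FLOPs-per-parameter rule is a
# generic approximation for a forward pass, not a figure published by Meituan.
TOTAL_PARAMS = 1.6e12
ACTIVE_PARAMS = {"min": 33e9, "avg": 48e9, "max": 56e9}


def active_fraction(active: float, total: float = TOTAL_PARAMS) -> float:
    """Share of all weights that participate in a single token's forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost: one multiply and one add per active parameter."""
    return 2 * active


for label, active in ACTIVE_PARAMS.items():
    print(
        f"{label}: {active_fraction(active):.2%} of weights active, "
        f"~{forward_flops_per_token(active) / 1e9:.0f} GFLOPs per token"
    )

# How much more per-token compute a dense model of the same total size would need
dense_ratio = TOTAL_PARAMS / ACTIVE_PARAMS["avg"]
print(f"dense 1.6T equivalent: ~{dense_ratio:.0f}x the per-token compute")
```

With these inputs an average token touches about 3% of the weights (roughly 96 GFLOPs), while a dense model of the same size would need about 33 times that per token.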
Three architectural features stand out. LongCat Sparse Attention (LSA) handles the million-token context with streaming-aware, cross-layer, and hierarchical indexing that keeps memory tractable. A 135-billion-parameter N-gram Embedding layer sits orthogonal to the MoE experts, a design choice most comparable models do not use. And Multi-Token Prediction (MTP) drives speculative decoding through a single dense-layer head (not MoE), with a draft acceptance rate above 90% and API output speeds above 100 tokens per second.
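What a 90%-plus acceptance rate buys can be estimated with the standard simplified model of speculative decoding: each drafted token is accepted independently with probability `a`, acceptance stops at the first rejection, and the verifying pass always contributes one token of its own. How many tokens the MTP head drafts per step (`draft_len` below) is an assumption here, since the figures quoted above do not state it; the function name is likewise illustrative.

```python
def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Expected tokens produced per full-model forward pass.

    Simplified model: each drafted token is accepted independently with
    probability `acceptance`, acceptance stops at the first rejection, and the
    verifying pass always yields one token (a correction or a bonus token).
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance == 1.0:
        # Every draft accepted, plus the verifier's own token
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


for draft_len in (1, 2, 3):
    tokens = expected_tokens_per_step(0.9, draft_len)
    print(f"draft_len={draft_len}: ~{tokens:.2f} tokens per full forward pass")
```

Under this model, even one drafted token per step at 90% acceptance yields about 1.9 tokens per full forward pass. In standard speculative sampling the verification step preserves the full model's output distribution, so a cheap draft head raises speed without changing what the model would generate.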
The Chinese Chips Claim Worth Paying Attention To
LongCat-2.0 was trained end-to-end on 50,000 Huawei Atlas-950 accelerators using HCCL (Huawei Collective Communication Library), Huawei's counterpart to NVIDIA's NCCL. Meituan claims this makes LongCat-2.0 the first frontier-scale model to complete both pre-training and inference on domestic Chinese ASICs. The distinction matters: DeepSeek V4-Pro used domestic chips for inference but not for its full training run.
The straightforward read is that US export controls failed. The more precise read is that China reached frontier-scale domestic training roughly 12 to 18 months earlier than the export control playbook anticipated. LongCat-2.0 does not prove the controls are useless — they still raise costs, slow timelines, and impose harder engineering trade-offs. What it shows is that domestic hardware adaptation moved faster than most analysts projected. That is a meaningful signal regardless of your position on the underlying policy.
Benchmarks: Where It Actually Stands
On SWE-bench Pro — the standard measure for real-world software engineering tasks — LongCat-2.0 scores 59.5. That puts it ahead of GPT-5.5 (58.6) and Gemini 3.1 Pro (54.2). It trails Claude Opus 4.7, which leads the field at 64.3. It also scores 70.8 on Terminal-Bench 2.1 and 77.3 on SWE-Bench Multilingual.
One caveat worth stating clearly: as of June 30, independent third-party verification from evaluators such as Artificial Analysis had not yet been completed. The numbers above come from Meituan’s release materials and early community testing on OpenRouter. For production decisions, wait for independent validation before treating these as settled.
How to Use It Now
LongCat-2.0 is available on OpenRouter today as meituan/longcat-2.0. For direct API access, Meituan’s platform at longcat.chat supports both OpenAI-compatible and Anthropic-compatible endpoints:
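```python
from openai import OpenAI

# Meituan's OpenAI-compatible endpoint; replace the placeholder with your key
client = OpenAI(
    base_url="https://api.longcat.chat/openai/v1",
    api_key="YOUR_LONGCAT_API_KEY",
)

response = client.chat.completions.create(
    model="longcat-2.0",
    messages=[{"role": "user", "content": "Review this PR and suggest fixes"}],
)

# The generated reply is in the first choice's message
print(response.choices[0].message.content)
```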
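The same OpenAI client can point at OpenRouter instead, using the `meituan/longcat-2.0` model ID mentioned above and OpenRouter's OpenAI-compatible base URL. In this sketch the provider table, the helper names, and the environment-variable names (`OPENROUTER_API_KEY`, `LONGCAT_API_KEY`) are illustrative choices, not part of either provider's documentation. The `openai` package is imported only when a request is actually sent, so the request-building logic can be checked without it.

```python
import os

# Hypothetical provider table. The LongCat URL and both model IDs come from this
# article; the OpenRouter URL is its standard OpenAI-compatible endpoint.
# The environment-variable names are illustrative conventions.
PROVIDERS = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "meituan/longcat-2.0",
        "key_env": "OPENROUTER_API_KEY",
    },
    "longcat": {
        "base_url": "https://api.longcat.chat/openai/v1",
        "model": "longcat-2.0",
        "key_env": "LONGCAT_API_KEY",
    },
}


def build_request(provider: str, prompt: str) -> dict:
    """Return keyword arguments for chat.completions.create on one provider."""
    cfg = PROVIDERS[provider]
    return {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(provider: str, prompt: str) -> str:
    """Send one chat request; needs the `openai` package and an API key."""
    from openai import OpenAI  # lazy import so build_request works without it

    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["key_env"]])
    response = client.chat.completions.create(**build_request(provider, prompt))
    return response.choices[0].message.content


# Only hit the network when a key is actually configured
if os.environ.get("OPENROUTER_API_KEY"):
    print(ask("openrouter", "Review this PR and suggest fixes"))
```

Keeping the provider details in one table makes it easy to A/B the same prompt against OpenRouter and Meituan's direct endpoint.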
For self-hosted deployment, weights are being published to Hugging Face under the MIT License, which permits commercial use, modification, and embedding in proprietary products; the only condition is keeping the copyright and license notice, and there is no obligation to open-source derivative work. Weights were listed as “coming soon” at launch; availability is expected within days of this writing.
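Open weights do not mean light hardware: MoE sparsity cuts compute per token, but all 1.6 trillion parameters still have to sit in accelerator memory. This back-of-the-envelope sketch estimates weight storage only (no KV cache for long contexts, no activations) at a few common precisions. The precisions, the 80 GB device size, and the 80% usable-memory headroom are illustrative assumptions, not Meituan's published deployment requirements.

```python
import math

TOTAL_PARAMS = 1.6e12  # total parameter count quoted for LongCat-2.0
# Storage cost per parameter at common precisions (illustrative choices)
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def min_devices(params: float, dtype: str, device_gb: float, usable: float = 0.8) -> int:
    """Smallest device count whose usable memory can hold the weights.

    `usable` reserves headroom for KV cache, activations, and runtime buffers.
    """
    per_device = device_gb * usable
    return math.ceil(weight_memory_gb(params, dtype) / per_device)


for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(TOTAL_PARAMS, dtype)
    n = min_devices(TOTAL_PARAMS, dtype, device_gb=80)  # hypothetical 80 GB accelerator
    print(f"{dtype}: ~{gb / 1000:.1f} TB of weights, at least {n} x 80 GB devices")
```

Even at 4-bit precision the weights alone come to roughly 0.8 TB, so self-hosting is a multi-node job, and the 1-million-token context adds KV-cache memory on top of what this estimate covers.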
Launch pricing runs $0.30 per million input tokens and $1.20 per million output tokens, with context-cache hits billed free. Standard pricing after the promotion is $0.75 input and $2.95 output. By comparison, Claude Opus 4.7 costs roughly $15 input and $75 output per million tokens. LongCat-2.0 sits within five points of Claude Opus 4.7 on SWE-bench Pro (59.5 versus 64.3) at roughly 20 to 25 times lower per-token prices on standard rates, and about 50 to 62 times lower during the launch promotion. That delta matters for teams running high-volume agentic coding pipelines.
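To see what those per-token prices mean for a real pipeline, this sketch prices a hypothetical monthly workload against the three rate cards quoted above. The workload sizes and the 50% cache-hit rate are made-up inputs. Cache hits are treated as free only for LongCat, as stated above; Claude's own prompt-caching discounts are deliberately not modeled, which overstates its cost for cache-heavy workloads.

```python
# USD per million tokens: (input price, output price, cache hits billed free?)
RATE_CARDS = {
    "LongCat-2.0 (launch)": (0.30, 1.20, True),
    "LongCat-2.0 (standard)": (0.75, 2.95, True),
    "Claude Opus 4.7": (15.00, 75.00, False),  # caching discounts not modeled
}


def workload_cost(
    card: str, input_tokens: int, output_tokens: int, cache_hit_rate: float = 0.0
) -> float:
    """Dollar cost of a workload under one rate card."""
    in_price, out_price, free_cache = RATE_CARDS[card]
    # Cached input tokens cost nothing where the rate card says hits are free
    billable_input = input_tokens * (1 - cache_hit_rate) if free_cache else input_tokens
    return (billable_input * in_price + output_tokens * out_price) / 1e6


# Hypothetical month of agentic coding: 2B input tokens, 100M output, 50% cache hits
for card in RATE_CARDS:
    cost = workload_cost(card, 2_000_000_000, 100_000_000, cache_hit_rate=0.5)
    print(f"{card}: ${cost:,.2f}")
```

The output is only as good as the inputs: rerun it with your own token volumes and cache-hit rate, and remember that the Claude line is an upper bound here because its caching discounts are left out.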
The Bigger Picture
The “Owl Alpha” strategy was calculated. Running anonymously on OpenRouter let developers evaluate the model on its own merits before the announcement triggered reflexive skepticism about Chinese AI. By June 30, LongCat-2.0 had earned top-three placement by actual call volume — competing directly against Claude, GPT-5.5, and Gemini 3.1 in the same catalog, without brand recognition working in its favor.
Whether LongCat-2.0 belongs in your stack depends on your use case. For high-frequency agentic coding tasks where cost per token matters and you can tolerate some benchmark uncertainty while independent evaluation catches up, it belongs on your shortlist. If you need the best verified SWE-bench performance today and cost is secondary, Claude Opus 4.7 still leads. But “near-frontier performance, MIT licensed, open source, 20x cheaper” tends to drive adoption quickly — and LongCat-2.0 already has two months of real-world OpenRouter usage behind it before most developers even knew it existed. That head start is not nothing. Follow the VentureBeat coverage or the official LongCat-2.0 model page for updates as third-party benchmarks land.