
LMDeploy SSRF Exploited in 12 Hours: Patch Now

CVE-2026-33626, a Server-Side Request Forgery (SSRF) vulnerability in LMDeploy’s vision-language model inference engine, was exploited in the wild just 12 hours after public disclosure on April 21, 2026. Attackers used the flaw to steal AWS IAM credentials, scan internal networks, and enumerate administrative endpoints—all within an eight-minute reconnaissance campaign. This represents one of the fastest known exploitation timelines for a major AI infrastructure vulnerability, exposing a critical attack surface that most teams deploying vision-language models have completely overlooked.

The Attack: 12 Hours from Disclosure to Compromise

Within 12 hours and 31 minutes of the GitHub security advisory going live, Sysdig researchers detected the first exploitation attempt hitting their honeypot systems. The attacker's IP address (103.116.72.119) then executed a reconnaissance campaign that lasted exactly eight minutes and unfolded in three distinct phases.

Phase 1 (03:35-03:37 UTC on April 22): Credential discovery. The attacker probed the AWS Instance Metadata Service at 169.254.169.254/latest/meta-data/iam/security-credentials/ to extract IAM role credentials. They simultaneously tested for Redis access on 127.0.0.1:6379.

Phase 2 (03:41 UTC): Egress testing. Using an out-of-band DNS callback to requestrepo.com, the attacker confirmed external connectivity and verified that data could be exfiltrated. They also enumerated API endpoints via /openapi.json to discover administrative functions.

Phase 3 (03:42-03:43 UTC): Service enumeration. The attacker systematically port-scanned localhost, probing 8080 (alternate HTTP), 3306 (MySQL), and 80 (HTTP). They also attempted to invoke /distserve/p2p_drop_connect, an unauthenticated administrative endpoint that can disrupt cluster operations.

Eight minutes. That’s all it took to map the infrastructure, steal cloud credentials, and identify administrative attack vectors. In contrast, typical CVE exploitation timelines range from 30 to 90 days. This 12-hour window demonstrates that attackers are actively monitoring AI infrastructure security advisories with automated exploitation tooling ready to deploy.

How the SSRF Vulnerability Works

The flaw exists in LMDeploy’s load_image() function (lmdeploy/vl/utils.py), which processes image URLs submitted to the OpenAI-compatible /v1/chat/completions API endpoint. When users include an image_url parameter in chat completion requests, the server automatically fetches that URL without validating whether it points to internal IP addresses, cloud metadata services, or localhost.
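
Conceptually, the bug is a URL fetch with no destination check. The following is a simplified illustration of the vulnerable pattern, not LMDeploy's actual source:

import requests
from io import BytesIO
from PIL import Image

def load_image(image_url: str) -> Image.Image:
    # Vulnerable pattern: the URL is fetched as-is. Nothing rejects
    # localhost, internal RFC 1918 addresses, or the cloud metadata
    # endpoint at 169.254.169.254 before the request is sent.
    resp = requests.get(image_url, timeout=10)
    return Image.open(BytesIO(resp.content))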

Here’s how attackers exploit it:

{
  "model": "cogvlm2",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "image_url",
      "image_url": {
        "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
      }
    }]
  }]
}

The LMDeploy server makes an HTTP GET request to 169.254.169.254 from its own context: a GPU instance typically running with broad IAM permissions for accessing S3 model artifacts, training datasets, and cross-account assume-role capabilities. The AWS Instance Metadata Service returns temporary credentials (access key, secret key, session token) that the attacker can then use to compromise the entire cloud account.
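
A patched fetcher has to resolve the URL and refuse internal destinations before making the request. Here is a minimal sketch of that kind of guard (illustrative only, not the actual fix shipped in v0.12.3):

import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(image_url: str) -> bool:
    """Reject URLs that resolve to loopback, private, or link-local addresses."""
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address the hostname maps to; an attacker-controlled
        # DNS name can point at 169.254.169.254 or 127.0.0.1.
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 80)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True

Note that a check-then-fetch guard like this can still be bypassed via DNS rebinding, which is one reason the network-level controls discussed below matter as defense in depth.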

As Sysdig’s security research team explains: “What distinguishes CVE-2026-33626 from a textbook SSRF is what the primitive unlocks on an AI-serving node: IAM credentials and cloud metadata. Vision-LLM nodes typically run on GPU instances with broad IAM roles. One successful IMDS fetch can compromise the cloud account.”

The vulnerability affects all LMDeploy versions prior to 0.12.0 that include vision-language support and carries a CVSS severity score of 7.5.

Why Vision-Language Models Are High-Value Targets

Most developers focus AI security efforts on prompt injection attacks and model poisoning. However, they’re missing traditional web vulnerabilities hiding in their inference infrastructure.

Vision-language model (VLM) endpoints create an SSRF attack surface that doesn't exist in text-only LLMs. VLM endpoints must fetch and process image URLs for legitimate use cases like image captioning, visual question answering, and document understanding. Developers treat these URLs as benign since they're “just images for AI analysis.” In reality, this creates a powerful HTTP SSRF primitive that can reach cloud metadata services, internal databases, administrative APIs, and container runtime endpoints.

Moreover, the deployment context amplifies the damage. Vision-LLM nodes typically run on GPU instances with privileged access: S3 buckets containing model weights and training data, database credentials for inference logs, and cross-account IAM roles for multi-tenant deployments. A traditional SSRF in a web application has limited scope. In contrast, an SSRF in AI infrastructure can compromise production data across multiple AWS accounts.

As of 2022, an estimated 93% of EC2 instances did not enforce IMDSv2 (which requires session tokens, blocking simple GET requests). Attackers can therefore exploit this vulnerability on the vast majority of cloud deployments without encountering any additional security barrier beyond whether LMDeploy itself has been patched.
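
To see why IMDSv2 blocks this class of attack, compare the two request flows. A minimal sketch using Python's requests library, assuming it runs on an EC2 instance:

import requests

IMDS = "http://169.254.169.254"

# IMDSv1: a single unauthenticated GET -- exactly what an SSRF primitive can issue.
creds_v1 = requests.get(f"{IMDS}/latest/meta-data/iam/security-credentials/", timeout=2)

# IMDSv2: a session token must first be obtained with a PUT request and a
# custom header, then presented on every metadata read. A typical SSRF bug
# can only send plain GETs with no custom headers, so this flow is out of reach.
token = requests.put(
    f"{IMDS}/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text
creds_v2 = requests.get(
    f"{IMDS}/latest/meta-data/iam/security-credentials/",
    headers={"X-aws-ec2-metadata-token": token},
    timeout=2,
)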

Companies race to deploy vision-language models for customer support, document analysis, and accessibility features. Meanwhile, attackers use those same image endpoints to steal their AWS keys. The irony would be funny if the security implications weren’t so severe.

What to Do Now

LMDeploy maintainers released a patch (v0.12.3) on April 22, 2026, within 24 hours of the initial disclosure. If you're running any version prior to 0.12.0 with vision-language support, upgrade immediately. The 12-hour exploitation window means attackers are already scanning for vulnerable instances.

Beyond patching, implement these defense layers:

  • Enforce IMDSv2 on all EC2 instances: This requires session tokens for metadata access, blocking the simple GET requests used in SSRF attacks. Via the AWS CLI: aws ec2 modify-instance-metadata-options --instance-id i-xxx --http-tokens required. A fleet-wide boto3 sketch follows this list.
  • Block access to 169.254.169.254: Security groups and NACLs don't govern traffic to the link-local metadata address, so use host-level firewall rules (e.g., iptables) on GPU instances, or disable the metadata endpoint entirely (--http-endpoint disabled) where it isn't needed. Note that this doesn't stop attackers from scanning internal Redis, MySQL, or administrative APIs; you still need to patch.
  • Review IAM roles on GPU instances: Apply least privilege. Remove S3 full access, database admin permissions, and cross-account assume-role capabilities unless absolutely required. Many teams grant overly broad permissions during development and never tighten them for production.
  • Deploy runtime detection: Sysdig Falco provides open-source rules that trigger on outbound connections from AI processes to cloud metadata endpoints. Runtime security tools can detect exploitation attempts even before you patch.
  • Restrict API access: Don’t expose /v1/chat/completions publicly without authentication. Many teams deploy LMDeploy instances for “internal testing” with no authentication, then forget to add it before moving to production.
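
For the IMDSv2 item above, enforcing one instance at a time doesn't scale. A boto3 sketch (the region name is a placeholder) that audits a fleet and requires IMDSv2 wherever it isn't already enforced:

import boto3

# Audit every instance in a region and require IMDSv2 where it isn't enforced.
ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            tokens = instance.get("MetadataOptions", {}).get("HttpTokens")
            if tokens != "required":
                print(f"Enforcing IMDSv2 on {instance_id} (was: {tokens})")
                ec2.modify_instance_metadata_options(
                    InstanceId=instance_id, HttpTokens="required"
                )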

The standard 90-day responsible disclosure timeline assumes vendors and users have time to test patches before deployment. However, this 12-hour exploitation window suggests AI vulnerabilities need faster response protocols. Attackers aren’t waiting.

The Broader AI Security Gap

CVE-2026-33626 is part of a larger pattern. Industry analysts project 2,800 to 3,600 AI CVEs in 2026, a 31-69% increase from 2,130 in 2025. SSRF attacks overall increased 452% from 2023 to 2024, driven by AI-powered scanning tools that automate reconnaissance. Pwn2Own 2026 added an AI infrastructure category for the first time, targeting tools like Nvidia Triton Inference Server alongside LMDeploy.

AI infrastructure security lags behind AI innovation. Companies prioritize model deployment speed and capabilities over security hardening. Vision-language models get rushed into production for customer-facing applications without adequate security review. Consequently, security teams trained to look for prompt injection and data leakage miss traditional web vulnerabilities like SSRF in inference endpoints.

The result is systemic risk. As more organizations adopt VLMs for production applications—medical image analysis, autonomous vehicle perception, document intelligence—the attack surface expands faster than security practices can adapt. We’re deploying powerful AI capabilities on infrastructure that hasn’t been hardened against decade-old web attack vectors.

Key Takeaways

  • Patch immediately: Update to LMDeploy v0.12.3 or later if you’re running vision-language models in production.
  • Assume breach velocity: 12-hour exploitation windows mean automated scanning and immediate attack attempts. Patch AI infrastructure vulnerabilities as fast as you would critical web server bugs.
  • Audit GPU IAM roles: Vision-LLM instances often have privileged access to S3, databases, and cross-account resources. Apply least privilege before attackers exploit the next SSRF.
  • Enforce IMDSv2: Block simple GET requests to cloud metadata services. As of 2022, roughly 93% of EC2 instances did not enforce it, leaving them open to basic SSRF credential theft.
  • Expand security focus beyond AI-specific threats: Prompt injection and model poisoning matter, but traditional web vulnerabilities (SSRF, authentication bypass, insecure deserialization) are hitting AI infrastructure right now.
