Security researchers discovered 175,000 publicly exposed Ollama AI server instances across 130 countries this week, with 48% advertising dangerous tool-calling capabilities and zero authentication. Between October 2025 and January 2026, attackers launched 91,403 documented sessions targeting these exposed endpoints. The culmination: Operation Bizarre Bazaar, the first documented criminal campaign commercializing stolen AI access through underground marketplaces.
This isn’t a vulnerability disclosure. This is systemic negligence across the entire open-source AI deployment ecosystem, and it’s costing victims $46,000 to $100,000 per day in unauthorized inference charges.
175,000 Exposed Instances Across 130 Countries
SentinelOne and Censys published findings on January 29, 2026 revealing the staggering scale: 175,000 unique Ollama hosts publicly accessible on the internet. Nearly half (48%) advertise tool-calling capabilities that enable LLM interaction with external systems and APIs. Another 201 hosts run completely uncensored prompt templates with safety guardrails removed.
The geographic spread compounds the problem. China hosts roughly 30% of exposed instances, followed by the United States, Germany, France, and South Korea. Cisco Talos identified 1,139 vulnerable Ollama instances in September 2025, finding that 88.89% use OpenAI-compatible API schemas. That compatibility isn’t a convenience; it’s catastrophic, because exploits developed for one instance work against thousands of others.
Furthermore, of the exposed servers Cisco found, 18.8% actively host responsive models (primarily Mistral and LLaMA). The remaining 81.2% may appear dormant, but they’re exploitable through unauthorized model uploads or configuration manipulation. Every single one represents a potential entry point.
91,000+ Attack Sessions and Criminal Marketplaces
GreyNoise’s honeypot infrastructure captured 91,403 attack sessions between October 2025 and January 2026. The pace accelerated dramatically: 80,469 of those sessions occurred in just 11 days between December 28, 2025 and January 8, 2026. Christmas Day 2025 saw a targeted spike – 1,688 attack sessions in 48 hours exploiting Ollama’s model pull functionality for server-side request forgery.
The attackers aren’t amateurs. The IPs GreyNoise identified have exploitation histories spanning 200+ CVEs and generated over 4 million sensor hits across their detection network. Additionally, two IP addresses systematically probed 73+ model endpoints using both OpenAI-compatible and Google Gemini API formats. GreyNoise researchers were blunt: “Eighty thousand enumeration requests represent investment. Threat actors don’t map infrastructure at this scale without plans to use that map.”
Those plans materialized in Operation Bizarre Bazaar. Pillar Security documented this campaign between December 2025 and January 2026, capturing 35,000 attack sessions. The threat actor, operating under the aliases “Hecker,” “Sakuya,” and “LiveGamer101,” built a commercial marketplace called silver.inc, marketed as “The Unified LLM API Gateway.” The service resells discounted access to 30+ compromised LLM providers without legitimate authorization.
The campaign remains active as of this writing. Criminals are commercializing stolen AI access while victims pay the cloud bills.
The $46,000-Per-Day Cost of Negligence
LLMjacking attacks drain budgets fast. Sysdig’s research calculated the worst-case scenario for Claude 2.x abuse: (500,000 tokens per minute ÷ 1,000) × $0.016 per 1K tokens × 60 minutes × 24 hours × 4 regions = $46,080 per day. By late 2025, costs escalated further as attackers targeted Claude 3 Opus and similar advanced models, with some victims losing over $100,000 daily.
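Sysdig’s arithmetic is easy to verify. A minimal sketch in code, where the token rate, per-1K price, and four-region spread are their published worst-case assumptions for Claude 2.x:

```python
# Worst-case LLMjacking cost model using Sysdig's Claude 2.x figures:
# 500K tokens/minute at $0.016 per 1K tokens, sustained 24 hours a day
# across 4 cloud regions in parallel.
TOKENS_PER_MINUTE = 500_000
PRICE_PER_1K_TOKENS = 0.016
REGIONS = 4

cost_per_minute = (TOKENS_PER_MINUTE / 1000) * PRICE_PER_1K_TOKENS  # $8.00
cost_per_day = cost_per_minute * 60 * 24 * REGIONS
print(f"${cost_per_day:,.0f} per day")  # → $46,080 per day
```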
The attack economics are straightforward. Attackers discover exposed instances, validate access, then resell that access at discounted rates through underground marketplaces like silver.inc. In other words, buyers get cheap inference, attackers profit from arbitrage, and the legitimate account owner gets a catastrophic cloud bill.
These aren’t theoretical projections. Real organizations have sustained these losses for days or weeks before discovering the breach. For startups or research teams running exposed Ollama instances, a single successful attack could be financially devastating.
Root Cause: No Authentication + Developer Negligence
Ollama ships without authentication or access control mechanisms by default. On Linux installations, it binds to localhost (127.0.0.1:11434), which is secure. However, developers routinely reconfigure it to 0.0.0.0 for remote access, then expose port 11434 to the public internet without firewalls, reverse proxies, or any security layer whatsoever.
Cisco Talos didn’t mince words: “The findings highlight a widespread neglect of fundamental security practices such as access control, authentication and network isolation in the deployment of AI systems, often stemming from organizations rushing to adopt emerging technologies without informing IT or security teams.”
Critical vulnerabilities compound the baseline negligence. CVE-2024-37032, nicknamed “Probllama,” enables remote code execution via path traversal in Ollama’s /api/pull endpoint. Wiz Research discovered this in May 2024, and Ollama patched it within four hours – but any instance running versions prior to 0.1.34 remains vulnerable. Similarly, CVE-2024-28224 allows DNS rebinding attacks enabling unauthenticated API calls in versions below 0.1.29. UpGuard’s research identified six critical flaws total, enabling denial-of-service attacks, model theft, and model poisoning.
This is a dual failure. Tool creators prioritized ease-of-use over security. Deployers treated AI infrastructure like spinning up a development server – fast, casual, without security considerations. The result is a massive attack surface that professional cybercriminals are actively industrializing.
What Developers Must Do NOW
Check whether your Ollama instance is publicly accessible. Use Shodan or Censys to search for your IP address on port 11434. If you find exposure, implement a multi-layered defense immediately.
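A quick self-check is to probe the port directly from a machine outside your network. This is a sketch, not a replacement for a Shodan or Censys lookup, and the target IP shown is a placeholder:

```python
# Test whether a host answers on Ollama's default port (11434).
import socket

def ollama_port_open(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace the placeholder with your server's public IP and run from outside:
# print(ollama_port_open("203.0.113.10"))  # True means you're exposed
```

If this returns True from the open internet, assume the instance is already on someone’s target list.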
Essential security measures:
- Close port 11434 via firewall configuration and whitelist trusted IP addresses only.
- Deploy a reverse proxy such as nginx or Caddy with OAuth 2.0 authentication in front of Ollama.
- Implement network segmentation using VPCs or Docker network isolation.
- For remote access, use a VPN solution like Tailscale or WireGuard instead of public exposure.
- Upgrade to Ollama version 0.1.34 or newer to patch CVE-2024-37032 and related vulnerabilities.
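After hardening, it’s worth confirming the patch level programmatically. Ollama exposes a version string at its /api/version endpoint; the comparison floor below reflects the 0.1.34 fix for CVE-2024-37032. A sketch, assuming the default loopback binding:

```python
# Query a local Ollama instance for its version and compare against the
# CVE-2024-37032 patch floor (0.1.34).
import json
import urllib.request

def ollama_version(base: str = "http://127.0.0.1:11434") -> str:
    """Fetch the running version from Ollama's /api/version endpoint."""
    with urllib.request.urlopen(f"{base}/api/version", timeout=3) as resp:
        return json.load(resp)["version"]

def is_patched(version: str, floor: tuple = (0, 1, 34)) -> bool:
    """True if a version string (e.g. '0.1.34' or '0.5.7-rc1') meets the floor."""
    parts = tuple(int(p) for p in version.split("-")[0].split("."))
    return parts >= floor

# Example: is_patched(ollama_version()) should be True on a patched install.
```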
Monitor access logs for unauthorized inference requests. Watch for unusual patterns: high token consumption, unfamiliar source IPs, or requests to models you haven’t loaded. The threat actors have already mapped the landscape – they’re building target lists for monetization right now.
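Log review can be partially automated. A minimal triage sketch, assuming an nginx-style access log in front of Ollama and a hypothetical allowlist of trusted client IPs:

```python
# Flag inference requests from clients outside a trusted allowlist.
ALLOWLIST = {"10.0.0.5", "10.0.0.6"}  # hypothetical trusted clients

def suspicious_requests(log_lines):
    """Yield (ip, line) for requests whose client IP is not allowlisted.

    Assumes the client IP is the first whitespace-separated field, as in
    nginx's default 'combined' log format.
    """
    for line in log_lines:
        fields = line.split()
        if fields and fields[0] not in ALLOWLIST:
            yield fields[0], line

log = [
    '10.0.0.5 - - [29/Jan/2026] "POST /api/generate HTTP/1.1" 200',
    '198.51.100.7 - - [29/Jan/2026] "POST /api/chat HTTP/1.1" 200',
]
for ip, line in suspicious_requests(log):
    print(ip)  # → 198.51.100.7
```

The same loop extends naturally to the other signals mentioned above, such as per-IP token counts or requests naming models you never loaded.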
If you’re running Ollama in production without these protections, you’re not experimenting with bleeding-edge AI. You’re volunteering to pay someone else’s cloud bill.
The Broader Problem Won’t Stop Here
This exposure pattern will repeat with every new open-source AI tool that prioritizes ease-of-use over security defaults. LocalAI, vLLM servers, and other self-hosted LLM solutions face similar challenges. Operation Bizarre Bazaar targeted multiple platforms, not just Ollama.
SentinelOne researchers noted the governance complexity: “The residential nature of much of the infrastructure complicates traditional governance and requires new approaches that distinguish between managed cloud deployments and distributed edge infrastructure.”
The AI boom is built on fundamentally insecure foundations. Developers are deploying powerful infrastructure with shocking casualness, and professional threat actors are industrializing exploitation through commercial marketplaces. The 91,403 attack sessions captured are reconnaissance – the real exploitation is just beginning.
Secure your deployments now. The next $46,000 cloud bill could be yours.