Anthropic Reveals 16M Distillation Attacks on Claude
Technology8 min Read

Anthropic Reveals 16M Distillation Attacks on Claude

F

Francesco

Published on Feb 23, 2026

Anthropic Reveals 16M Distillation Attacks on Claude

In a stark and detailed disclosure on February 23, 2026, Anthropic announced it had detected large-scale distillation efforts aimed at its Claude models — campaigns it attributes to three labs named DeepSeek, Moonshot AI, and MiniMax. According to Anthropic, more than 24,000 fraudulent accounts conducted roughly 16 million exchanges with Claude, targeting reasoning, coding, tool use, and agentic behaviors that a malicious actor could absorb into smaller competitor models. citeturn0search0turn0search3

fraudulent accounts API abuse

fraudulent accounts API abuse

The revelation reads like a primer on how modern model theft can scale: many thousands of accounts, proxy networks that mask origin, and prompt patterns designed to coax a powerful model into revealing the internal traces and behaviors that make it valuable. This is not a theoretical vulnerability anymore — Anthropic describes it as industrial-scale distillation. citeturn0search2turn0search4

Anthropic Claude API

Anthropic Claude API

Why this matters

Distillation — training a smaller model on the outputs of a larger one — is a legitimate machine-learning technique. But at scale, and when combined with deceptive access methods, it becomes a vector for intellectual property loss, safety degradation, and geopolitical risk. The stakes are high because a distilled model can inherit capabilities without the same safety guardrails, and those capabilities can be used in contexts ranging from benign consumer products to surveillance or military applications.

What Anthropic says it saw

Anthropic’s report breaks down the campaigns by actor and scale: DeepSeek reportedly ran on the order of 150,000+ targeted interactions aimed at extracting chain-of-thought reasoning and reinforcement-learning reward model signals; Moonshot AI conducted approximately 3.4 million exchanges focusing on agentic reasoning, tool usage, coding, and analysis; and MiniMax carried out the largest campaign at roughly 13 million interactions, aggressively pivoting to new Claude releases to capture newly shipped capabilities. Overall, Anthropic says these operations created the footprint of more than 16 million queries across 24,000 fraudulent accounts. citeturn0search3turn0search0

DeepSeek AI lab

DeepSeek AI lab

Moonshot AI company

Moonshot AI company

MiniMax AI models

MiniMax AI models

The pattern Anthropic describes is familiar: a coordinated "hydra cluster" approach that spreads extraction traffic across many accounts and proxies to avoid throttles and detection, interleaving normal-appearing requests with high-volume, capability-targeted prompts. In some cases, Anthropic observed prompt engineering that explicitly solicited step-by-step reasoning or other internal traces that are highly useful as training signals. citeturn0search0turn0search1

How distillation campaigns operate

The anatomy of a large extraction campaign blends automation, social engineering, and model-specific prompting. Typical elements include:

  • Account farms and proxies: Thousands of accounts, often created and managed through proxy services, obscure origin and distribute traffic to stay below detection thresholds.
  • Directed prompt design: Prompts engineered to elicit the model's reasoning chains, tool invocation formats, or privileged behaviors that a downstream model can mimic.
  • Traffic mixing: Extraction requests mixed with benign user queries to appear legitimate and reduce anomaly scores.
  • Rapid pivoting: When a target model updates, attackers switch traffic to probe the new model and capture fresh capabilities.

Anthropic's technical description matches these tactics, noting patterns of repetition, tight focus on specific capabilities, and swift redirection after Claude updates — behavior consistent with training-oriented data collection rather than normal end-user interaction. citeturn0search0turn0search3

model extraction distillation

model extraction distillation

"Industrial-scale distillation transforms an API into a theft pipeline — and it can move far faster than legal or diplomatic responses."

Who are the accused labs — and what do we know?

Public reporting identifies the three actors by the names DeepSeek, Moonshot AI, and MiniMax. These companies have been in the spotlight in recent months as some newer Chinese-based models surged in capability and market presence. While composition, corporate governance, and precise intent vary by lab, the practical effect described by Anthropic is the same: large volumes of Claude outputs were systematically harvested to train competing systems. Reporting indicates the accused organizations did not immediately respond to requests for comment. citeturn0search2turn0news14

It is important to separate allegation from adjudication. Anthropic's forensic signals — traffic fingerprints, account linkages, and prompt patterns — are powerful indicators, but public transparency, third-party verification, and possible legal processes will determine what ultimately can be proven and what actions follow.

chain of thought reasoning

chain of thought reasoning

agentic AI tools

agentic AI tools

Technical and safety implications

Beyond intellectual property concerns, the technical consequences of distilled models lacking safety infrastructure are significant. A distilled model that inherits advanced reasoning or tool use without corresponding guardrails can be more likely to:

  • Provide unsafe or dual-use guidance (e.g., biological, chemical, cybersecurity exploits).
  • Be more easily manipulated into generating disallowed content because it lacks nuanced content filters.
  • Be deployed in surveillance or autonomous orchestration contexts without oversight.

Anthropic explicitly argued that illicit distillation erodes safety by funneling capabilities into environments where the originating company's safety mitigations do not travel with the capabilities. That argument elevates distillation from a commercial dispute to a public-interest concern. citeturn0search0turn0search1

AI safety guardrails

AI safety guardrails

Caution Distilled models can be easier and cheaper to run at scale, which makes them appealing — and potentially more dangerous if they bypass safety de-risking that responsible labs embed in their deployments.

Anthropic's defensive steps

Anthropic says it has deployed a combination of behavioral classifiers, fingerprinting systems, and account-verification tightening to detect and disrupt large-scale extraction. It also reports sharing technical indicators with other AI labs and cloud providers and developing product and model-level mitigations designed to make outputs less directly useful for illicit training without unduly harming legitimate use.

Key components of the announced defenses include:

  • Behavioral detection: Classifiers trained to spot chain-of-thought elicitation patterns, repeated capability-focused prompts, and abnormal multi-account coordination.
  • Account hygiene: Tighter verification and controls for low-friction account types (education, startups) that attackers historically exploit.
  • Industry collaboration: Sharing indicators of compromise with peers and cloud hosts to reduce the efficacy of proxy networks and resellers used to mask origin.

These are practical first lines of defense, but Anthropic acknowledges that preventing industrial-scale distillation will require continuous technical hardening and broader cooperation from infrastructure providers and policymakers. citeturn0search0turn0search1

Legal, policy, and export-control angles

Distillation campaigns also sit at the intersection of tech policy and geopolitics. If model outputs protected by U.S. firms are used to reproduce capabilities in jurisdictions subject to export controls, the activity could implicate national-security rules and sanctions. Regulators and lawmakers are increasingly attentive to the idea that AI capabilities, not just chips or data, can be subject to export policies — a concept that makes aggressive distillation an issue for policy as much as for security teams.

Anthropic framed its disclosure in those terms, emphasizing risks if distilled capabilities are fed into systems used for surveillance or other sensitive applications. The disclosure has prompted commentary about whether existing legal frameworks are sufficient to deter or punish large-scale, covert model extraction. citeturn0search4turn0search2

export controls AI

export controls AI

What this means for cloud providers and resellers

Cloud providers and API resellers are in the crosshairs. Attackers frequently stitch together proxy chains and third-party resellers to amplify and obscure traffic; that means detection must be a shared responsibility. Cloud companies can harden telemetry, rate-limiting, and attribution; resellers can enforce tighter KYC and usage guarantees. Anthropic's request for industry cooperation underscores that technical defenses at the model level are necessary but not sufficient — adversaries will move horizontally across the stack. citeturn0search0

Practical mitigation techniques for model providers

For labs building and offering large models, practical mitigation options include:

  • Output fingerprinting: Introduce subtle, reversible transformations or diversions that reduce the utility of outputs for training while preserving user-facing quality.
  • Usage attestation: Better KYC and attestation for accounts that request high-volume capability probes.
  • Adaptive rate-limiting: Dynamic throttles keyed to behavioral signals rather than simple token counts.
  • Prompt-sanitization and response redaction: Prevent explicit leaks of chain-of-thought by design or by offering content-sanitized output tiers for high-risk use cases.

These measures are not silver bullets; they raise trade-offs about user experience, privacy, and legitimate research use. That trade-off calculus is now a critical part of product design for commercially exposed models.

Industry and policy next steps

Stopping industrial-scale distillation will require layered action: model-level defenses, infrastructure cooperation, legal clarity, and international diplomatic engagement. Possible next steps include standardized incident sharing among labs, clearer export-control language around capabilities, and industry norms for account verification and reseller conduct.

Without coordinated action, attackers can keep raising the bar for stealth and scale, turning API-accessible frontier models into convenient training farms for actors willing to game verification and attribution systems. Anthropic's disclosure is thus a prompt for a sector-wide upgrade in both technology and governance. citeturn0search1turn0search4

Important Even defensive changes risk unintended consequences: overly aggressive filters or throttles could hamper researchers and small businesses, so policy and product design must balance protection with accessibility.

Conclusion

Anthropic's disclosure of a coordinated, large-scale distillation campaign — involving 24,000 fraudulent accounts and roughly 16 million interactions across three labs — is a watershed moment for commercial AI security. It crystallizes a technical pathway by which capabilities can be transferred at speed and scale, and it shows how access-layer abuses can accelerate capability diffusion beyond the control of original model creators. citeturn0search0turn0search3

The response will have to be multifaceted: improved detection, smarter account and reseller controls, cloud-provider cooperation, and possibly new legal constructs that treat capabilities and model outputs as governed artifacts. For practitioners, investors, and policymakers, the takeaway is clear: the era of easy, anonymous extraction is ending, and how the industry adapts will shape both the commercial landscape and the safety profile of the next generation of AI systems.

Key Takeaways
  • Anthropic reported ~16 million distillation queries from 24,000 fraudulent accounts targeting Claude. citeturn0search0
  • DeepSeek, Moonshot AI, and MiniMax were identified as the primary actors, with MiniMax driving the largest share of requests. citeturn0search3
  • Industrial-scale distillation risks transferring capabilities without safety guardrails, which raises public-safety and export-control concerns. citeturn0search1turn0search2
  • Defenses require coordinated technical, infrastructure, and policy responses to be effective.
#Technology#Anthropic#Claude#distillation#model extraction#DeepSeek#Moonshot#MiniMax#AI security#data theft#fake accounts#API abuse#model theft#China#AI policy#export controls#cybersecurity#model hardening#chain-of-thought#reward models#agentic models#coding models#tool use#behavioral fingerprinting#threat intelligence#cloud providers#account verification#intellectual property#AI governance#model safety#large-scale extraction#ML safety#hydra cluster#proxy networks#LeafDraft
Anthropic Reveals 16M Distillation Attacks on Claude | LeafDraft