© 2026 NervNow™. All rights reserved.

Anthropic Withholds Mythos AI Model Over Cybersecurity Risk
Citing the model's ability to generate working exploits without human intervention, Anthropic is routing Claude Mythos Preview exclusively to 12 partner organizations through a new initiative called Project Glasswing.

Anthropic has declined to release its latest frontier model, Claude Mythos Preview, to the public, citing the model’s advanced ability to find and exploit software vulnerabilities, the company said Tuesday.
The San Francisco-based AI lab announced that Mythos will instead be deployed through a restricted cybersecurity program called Project Glasswing, which includes 12 partner organizations. Apple, Google, Microsoft, Amazon Web Services, Nvidia, Broadcom, Cisco, CrowdStrike, Palo Alto Networks, and the Linux Foundation are among the confirmed participants. Anthropic said it has committed up to $100 million in usage credits for the initiative.
“Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available. Instead, we are using it as part of a defensive cybersecurity program with a limited set of partners,” Anthropic wrote in the model’s system card.
The model is not purpose-built for security work. Mythos is a general-purpose model whose cybersecurity performance derives from strong coding and reasoning skills. Over several weeks of internal testing, the model identified thousands of zero-day vulnerabilities, many of them one to two decades old. Among them was a 27-year-old flaw in OpenBSD, an operating system widely respected for its security hardening.
Internal red team testing produced findings that prompted Anthropic to forgo a public release entirely. Engineers with no formal security training directed Mythos to search for remote code execution vulnerabilities overnight and found complete, working exploits by morning, its Frontier Red Team wrote in a blog post. In separate tests, automated scaffolds allowed Mythos to convert discovered vulnerabilities into functional exploits without any human input.
The model also demonstrated behavior that Anthropic described as unsolicited. During one controlled test, a researcher asked Mythos to send a message if it could escape a virtual sandbox. The model succeeded and then, without further instruction, posted details of its exploit to publicly accessible websites. Anthropic disclosed the incident in its system card.
“There was a lot of internal deliberation. We really do view this as a first step toward giving a lot of cyber defenders a head start on a topic that will be increasingly important,” said Dianne Penn, Anthropic’s Head of Research Product Management.
The announcement follows an earlier security incident. Fortune reported in late March that a draft document describing the model, then internally named Capybara, had been left in an unsecured publicly accessible data cache. Anthropic attributed the exposure to human error. A separate incident involving the Claude Code software package inadvertently exposed nearly 2,000 source code files and caused thousands of GitHub repositories to be taken down.
Anthropic said it has held discussions with the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation regarding Mythos’s capabilities. Newton Cheng, its Frontier Red Team cyber lead, said the company wants partner firms to gain experience with the model before these capabilities become more broadly available.
“Cybersecurity is just going to be an area where this broad increase in capabilities has potential for risk, and thus we have to keep a really close eye on what’s going on there,” said Newton Cheng, Anthropic’s Frontier Red Team cyber lead.
Beyond the 12 formal partners, the company said approximately 40 additional organizations will receive access to the Mythos preview. The company said its long-term objective is to develop safeguards sufficient for a wider deployment of Mythos-class models, though no timeline was provided.
Disclaimer: This report is based on publicly available reports, including official press releases from Anthropic. NervNow has not independently verified the claims.