Signal #93303POSITIVE

Researchers gaslit Claude into giving instructions to build explosives

100

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability. Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, and instructions for building explosives, and other prohibited [...]

The Verge AIabout 6 hours ago
Read Full Article

Explore with AI-Powered Tools

View All Signals

Explore more AI intelligence

Want to discover more AI signals like this?

Explore Steek
Researchers gaslit Claude into giving instructions to build explosives | Steek AI Signal | Steek