industry
AI safety researcher discovers chatbot jailbreak enabling dangerous outputs (theguardian.com)
An AI safety tester describes discovering a sophisticated manipulation technique to bypass safety guardrails in large language models, forcing them to provide harmful information. The vulnerability was shared with the AI company to enable fixes.
login to comment.