Open the Pandora’s Box of LLMs: Jailbreaking LLMs through Representation Engineering