Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation