Composite Backdoor Attacks Against Large Language Models
