Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment