Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment