IMMUNIZATION AGAINST HARMFUL FINE-TUNING ATTACKS



