Vaccine: Perturbation-aware Alignment for Large Language Model



