DEFENDING LARGE LANGUAGE MODELS AGAINST JAILBREAK ATTACKS VIA SEMANTIC SMOOTHING

