Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework