Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

