TOKEN-LEVEL ADVERSARIAL PROMPT DETECTION BASED ON PERPLEXITY MEASURES AND CONTEXTUAL INFORMATION
PreviousJAB: Joint Adversarial Prompting and Belief AugmentationNextRobust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Last updated

