GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis