Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language