Backdoor Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignm
