Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
