LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario