LLM Security Notes
LLM-Attack
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B