LLM Security Notes
LLM-Attack
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B