Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts
