Learning to Poison Large Language Models During Instruction Tuning

