Large Model Security Notes
VLM-Attack
Circumventing Concept Erasure Methods For Text-to-Image Generative Models
Efficient LLM-Jailbreaking by Introducing Visual Modality
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Adversarial Attacks on Multimodal Agents
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
White-box Multimodal Jailbreaks Against Large Vision-Language Models
Red Teaming Visual Language Models
Private Attribute Inference from Images with Vision-Language Models
Assessment of Multimodal Large Language Models in Alignment with Human Values
Privacy-Aware Visual Language Models
Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
Adversarial Illusions in Multi-Modal Embeddings
Universal Prompt Optimizer for Safe Text-to-Image Generation
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
INSTRUCTTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Hijacking Context in Large Multi-modal Models
Transferable Multimodal Attack on Vision-Language Pre-training Models
Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
AN IMAGE IS WORTH 1000 LIES: ADVERSARIAL TRANSFERABILITY ACROSS PROMPTS ON VISION-LANGUAGE MODELS
Test-Time Backdoor Attacks on Multimodal Large Language Models
JAILBREAK IN PIECES: COMPOSITIONAL ADVERSARIAL ATTACKS ON MULTI-MODAL LANGUAGE MODELS
Jailbreaking Attack against Multimodal Large Language Model
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
IMAGE HIJACKS: ADVERSARIAL IMAGES CAN CONTROL GENERATIVE MODELS AT RUNTIME
VISUAL ADVERSARIAL EXAMPLES JAILBREAK ALIGNED LARGE LANGUAGE MODELS
Query-Relevant Images Jailbreak Large Multi-Modal Models
Towards Adversarial Attack on Vision-Language Pre-training Models
How Many Are Unicorns in This Image? A Safety Evaluation Benchmark for Vision LLMs
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augment
MISUSING TOOLS IN LARGE LANGUAGE MODELS WITH VISUAL ADVERSARIAL EXAMPLES
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Shadowcast: STEALTHY DATA POISONING ATTACKS AGAINST VISION-LANGUAGE MODELS
FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
THE WOLF WITHIN: COVERT INJECTION OF MALICE INTO MLLM SOCIETIES VIA AN MLLM OPERATIVE
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
How Robust is Google’s Bard to Adversarial Image Attacks?
On Evaluating Adversarial Robustness of Large Vision-Language Models
On the Adversarial Robustness of Multi-Modal Foundation Models
Are aligned neural networks adversarially aligned?
READING ISN’T BELIEVING: ADVERSARIAL ATTACKS ON MULTI-MODAL NEURONS
Black Box Adversarial Prompting for Foundation Models
Evaluation and Analysis of Hallucination in Large Vision-Language Models
FOOL YOUR (VISION AND) LANGUAGE MODEL WITH EMBARRASSINGLY SIMPLE PERMUTATIONS
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning