Large Model Security Notes
VLM-Attack
Circumventing Concept Erasure Methods For Text-to-Image Generative Models
Efficient LLM-Jailbreaking by Introducing Visual Modality
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Adversarial Attacks on Multimodal Agents
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
White-box Multimodal Jailbreaks Against Large Vision-Language Models
Red Teaming Visual Language Models
Private Attribute Inference from Images with Vision-Language Models
Assessment of Multimodal Large Language Models in Alignment with Human Values
Privacy-Aware Visual Language Models
Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
Adversarial Illusions in Multi-Modal Embeddings
Universal Prompt Optimizer for Safe Text-to-Image Generation
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
INSTRUCTTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Hijacking Context in Large Multi-modal Models
Transferable Multimodal Attack on Vision-Language Pre-training Models
Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
AN IMAGE IS WORTH 1000 LIES: ADVERSARIAL TRANSFERABILITY ACROSS PROMPTS ON VISION-LANGUAGE MODELS
Test-Time Backdoor Attacks on Multimodal Large Language Models
JAILBREAK IN PIECES: COMPOSITIONAL ADVERSARIAL ATTACKS ON MULTI-MODAL LANGUAGE MODELS
Jailbreaking Attack against Multimodal Large Language Model
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
IMAGE HIJACKS: ADVERSARIAL IMAGES CAN CONTROL GENERATIVE MODELS AT RUNTIME
VISUAL ADVERSARIAL EXAMPLES JAILBREAK ALIGNED LARGE LANGUAGE MODELS
Query-Relevant Images Jailbreak Large Multi-Modal Models
Towards Adversarial Attack on Vision-Language Pre-training Models
How Many Are Unicorns in This Image? A Safety Evaluation Benchmark for Vision LLMs
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augment
MISUSING TOOLS IN LARGE LANGUAGE MODELS WITH VISUAL ADVERSARIAL EXAMPLES
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Shadowcast: STEALTHY DATA POISONING ATTACKS AGAINST VISION-LANGUAGE MODELS
FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
THE WOLF WITHIN: COVERT INJECTION OF MALICE INTO MLLM SOCIETIES VIA AN MLLM OPERATIVE
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
How Robust is Google’s Bard to Adversarial Image Attacks?
On Evaluating Adversarial Robustness of Large Vision-Language Models
On the Adversarial Robustness of Multi-Modal Foundation Models
Are aligned neural networks adversarially aligned?
READING ISN’T BELIEVING: ADVERSARIAL ATTACKS ON MULTI-MODAL NEURONS
Black Box Adversarial Prompting for Foundation Models
Evaluation and Analysis of Hallucination in Large Vision-Language Models
FOOL YOUR (VISION AND) LANGUAGE MODEL WITH EMBARRASSINGLY SIMPLE PERMUTATIONS
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning