Others

INFERRING OFFENSIVENESS IN IMAGES FROM NATURAL LANGUAGE SUPERVISION
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulne
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
AI SAFETY: A CLIMB TO ARMAGEDDON?
AI RISK MANAGEMENT SHOULD INCORPORATE BOTH SAFETY AND SECURITY
Defending Against Social Engineering Attacks in the Age of LLMs
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Deduplicating Training Data Makes Language Models Better
MITIGATING TEXT TOXICITY WITH COUNTERFACTUAL GENERATION
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Mitigating LLM Hallucinations via Conformal Abstention
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
LARGE LANGUAGE MODELS AS AUTOMATED ALIGNERS FOR BENCHMARKING VISION-LANGUAGE MODELS
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision
Attacking LLM Watermarks by Exploiting Their Strengths
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Mod
TOFU: A Task of Fictitious Unlearning for LLMs
Learning and Forgetting Unsafe Examples in Large Language Models
Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Tra
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
In Search of Truth: An Interrogation Approach to Hallucination Detection
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
Locating and Mitigating Gender Bias in Large Language Models
Learning to Edit: Aligning LLMs with Knowledge Editing
Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Su
Does DETECTGPT Fully Utilize Perturbation? Bridge Selective Perturbation to Fine-tuned Contrastive L
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
SPOTTING LLMS WITH BINOCULARS: ZERO-SHOT DETECTION OF MACHINE-GENERATED TEXT
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase
WHAT’S IN MY BIG DATA?
UNDERSTANDING CATASTROPHIC FORGETTING IN LANGUAGE MODELS VIA IMPLICIT INFERENCE
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Toxicity in CHATGPT: Analyzing Persona-assigned Language Models
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers’ Coding Practices with Insecure Sug
Zero shot VLMs for hate meme detection: Are we there yet?
ANALYZING AND MITIGATING OBJECT HALLUCINATION IN LARGE VISION-LANGUAGE MODELS
MITIGATING HALLUCINATION IN LARGE MULTIMODAL MODELS VIA ROBUST INSTRUCTION TUNING
DENEVIL: TOWARDS DECIPHERING AND NAVIGATING THE ETHICAL VALUES OF LARGE LANGUAGE MODELS VIA INSTRUCT
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Pro
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
CAN LANGUAGE MODELS BE INSTRUCTED TO PROTECT PERSONAL INFORMATION?
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Removing RLHF Protections in GPT-4 via Fine-Tuning
SPML: A DSL for Defending Language Models Against Prompt Attacks
Stealthy Attack on Large Language Model based Recommendation
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring t
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A system
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models
Discovering the Hidden Vocabulary of DALLE-2
Raising the Cost of Malicious AI-Powered Image Editing
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimi
ALIGNERS: DECOUPLING LLMS AND ALIGNMENT
CAN LLM-GENERATED MISINFORMATION BE DETECTED?
On the Risk of Misinformation Pollution with Large Language Models
Evading Watermark based Detection of AI-Generated Content
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World’s Uglin
Privacy-Preserving Instructions for Aligning Large Language Models
TOWARDS UNDERSTANDING THE INTERPLAY OF GENERATIVE ARTIFICIAL INTELLIGENCE AND THE INTERNET
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
TOWARDS RESPONSIBLE AI IN THE ERA OF GENERATIVE AI: A REFERENCE ARCHITECTURE FOR DESIGNING FOUNDATIO
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safet
Risk Assessment and Statistical Significance in the Age of Foundation Models
The Foundation Model Transparency Index
The Privacy Pillar - A Conceptual Framework for Foundation Model-based Systems
A Baseline Analysis of Reward Models’ Ability To Accurately Analyze Foundation Models Under Distribu
Foundational Moral Values for AI Alignment
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
ON CATASTROPHIC INHERITANCE OF LARGE FOUNDATION MODELS
Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustmen
Foundation Model Transparency Reports
SECURING RELIABILITY: A BRIEF OVERVIEW ON ENHANCING IN-CONTEXT LEARNING FOR FOUNDATION MODELS
EXPLORING THE ADVERSARIAL CAPABILITIES OF LARGE LANGUAGE MODELS
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
LLM-Resistant Math Word Problem Generation via Adversarial Attacks
Efficient Black-Box Adversarial Attacks on Neural Text Detectors
Adversarial Preference Optimization
Combating Adversarial Attacks with Multi-Agent Debate
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Q
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detectio
Prompted Contextual Vectors for Spear-Phishing Detection
Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated Text Detection
Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
RADAR: Robust AI-Text Detection via Adversarial Learning
OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examp
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Detoxifying Large Language Models via Knowledge Editing
Healing Unsafe Dialogue Responses with Weak Supervision Signals