Others
INFERRING OFFENSIVENESS IN IMAGES FROM NATURAL LANGUAGE SUPERVISION
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulne
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
AI SAFETY: A CLIMB TO ARMAGEDDON?
AI RISK MANAGEMENT SHOULD INCORPORATE BOTH SAFETY AND SECURITY
Defending Against Social Engineering Attacks in the Age of LLMs
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Deduplicating Training Data Makes Language Models Better
MITIGATING TEXT TOXICITY WITH COUNTERFACTUAL GENERATION
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Mitigating LLM Hallucinations via Conformal Abstention
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
LARGE LANGUAGE MODELS AS AUTOMATED ALIGNERS FOR BENCHMARKING VISION-LANGUAGE MODELS
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision
Attacking LLM Watermarks by Exploiting Their Strengths
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Mod
TOFU: A Task of Fictitious Unlearning for LLMs
Learning and Forgetting Unsafe Examples in Large Language Models
Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Tra
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
In Search of Truth: An Interrogation Approach to Hallucination Detection
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
Locating and Mitigating Gender Bias in Large Language Models
Learning to Edit: Aligning LLMs with Knowledge Editing
Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Su
Does DETECTGPT Fully Utilize Perturbation? Bridge Selective Perturbation to Fine-tuned Contrastive L
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
SPOTTING LLMS WITH BINOCULARS: ZERO-SHOT DETECTION OF MACHINE-GENERATED TEXT
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase
WHAT’S IN MY BIG DATA?
UNDERSTANDING CATASTROPHIC FORGETTING IN LANGUAGE MODELS VIA IMPLICIT INFERENCE
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Toxicity in CHATGPT: Analyzing Persona-assigned Language Models
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers’ Coding Practices with Insecure Sug
Zero shot VLMs for hate meme detection: Are we there yet?
ANALYZING AND MITIGATING OBJECT HALLUCINATION IN LARGE VISION-LANGUAGE MODELS
MITIGATING HALLUCINATION IN LARGE MULTIMODAL MODELS VIA ROBUST INSTRUCTION TUNING
DENEVIL: TOWARDS DECIPHERING AND NAVIGATING THE ETHICAL VALUES OF LARGE LANGUAGE MODELS VIA INSTRUCT
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Pro
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
CAN LANGUAGE MODELS BE INSTRUCTED TO PROTECT PERSONAL INFORMATION?
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Removing RLHF Protections in GPT-4 via Fine-Tuning
SPML: A DSL for Defending Language Models Against Prompt Attacks
Stealthy Attack on Large Language Model based Recommendation
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring t
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A system
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models
Discovering the Hidden Vocabulary of DALLE-2
Raising the Cost of Malicious AI-Powered Image Editing
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimi
ALIGNERS: DECOUPLING LLMS AND ALIGNMENT
CAN LLM-GENERATED MISINFORMATION BE DETECTED?
On the Risk of Misinformation Pollution with Large Language Models
Evading Watermark based Detection of AI-Generated Content
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World’s Uglin
Privacy-Preserving Instructions for Aligning Large Language Models
TOWARDS UNDERSTANDING THE INTERPLAY OF GENERATIVE ARTIFICIAL INTELLIGENCE AND THE INTERNET
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
TOWARDS RESPONSIBLE AI IN THE ERA OF GENERATIVE AI: A REFERENCE ARCHITECTURE FOR DESIGNING FOUNDATIO
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safet
Risk Assessment and Statistical Significance in the Age of Foundation Models
The Foundation Model Transparency Index
The Privacy Pillar - A Conceptual Framework for Foundation Model-based Systems
A Baseline Analysis of Reward Models’ Ability To Accurately Analyze Foundation Models Under Distribu
Foundational Moral Values for AI Alignment
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
ON CATASTROPHIC INHERITANCE OF LARGE FOUNDATION MODELS
Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustmen
Foundation Model Transparency Reports
SECURING RELIABILITY: A BRIEF OVERVIEW ON ENHANCING IN-CONTEXT LEARNING FOR FOUNDATION MODELS
EXPLORING THE ADVERSARIAL CAPABILITIES OF LARGE LANGUAGE MODELS
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
LLM-Resistant Math Word Problem Generation via Adversarial Attacks
Efficient Black-Box Adversarial Attacks on Neural Text Detectors
Adversarial Preference Optimization
Combating Adversarial Attacks with Multi-Agent Debate
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Q
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detectio
Prompted Contextual Vectors for Spear-Phishing Detection
Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated Text Detection
Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
RADAR: Robust AI-Text Detection via Adversarial Learning
OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examp
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Detoxifying Large Language Models via Knowledge Editing
Healing Unsafe Dialogue Responses with Weak Supervision Signals