Others

INFERRING OFFENSIVENESS IN IMAGES FROM NATURAL LANGUAGE SUPERVISION
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
AI SAFETY: A CLIMB TO ARMAGEDDON?
AI RISK MANAGEMENT SHOULD INCORPORATE BOTH SAFETY AND SECURITY
Defending Against Social Engineering Attacks in the Age of LLMs
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Deduplicating Training Data Makes Language Models Better
MITIGATING TEXT TOXICITY WITH COUNTERFACTUAL GENERATION
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Mitigating LLM Hallucinations via Conformal Abstention
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
LARGE LANGUAGE MODELS AS AUTOMATED ALIGNERS FOR BENCHMARKING VISION-LANGUAGE MODELS
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Attacking LLM Watermarks by Exploiting Their Strengths
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
TOFU: A Task of Fictitious Unlearning for LLMs
Learning and Forgetting Unsafe Examples in Large Language Models
Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Transferability
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
In Search of Truth: An Interrogation Approach to Hallucination Detection
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
Locating and Mitigating Gender Bias in Large Language Models
Learning to Edit: Aligning LLMs with Knowledge Editing
Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions
Does DETECTGPT Fully Utilize Perturbation? Bridge Selective Perturbation to Fine-tuned Contrastive Learning
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
SPOTTING LLMS WITH BINOCULARS: ZERO-SHOT DETECTION OF MACHINE-GENERATED TEXT
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase
WHAT’S IN MY BIG DATA?
UNDERSTANDING CATASTROPHIC FORGETTING IN LANGUAGE MODELS VIA IMPLICIT INFERENCE
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Toxicity in CHATGPT: Analyzing Persona-assigned Language Models
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers’ Coding Practices with Insecure Suggestions from Poisoned AI Models
Zero shot VLMs for hate meme detection: Are we there yet?
ANALYZING AND MITIGATING OBJECT HALLUCINATION IN LARGE VISION-LANGUAGE MODELS
MITIGATING HALLUCINATION IN LARGE MULTIMODAL MODELS VIA ROBUST INSTRUCTION TUNING
DENEVIL: TOWARDS DECIPHERING AND NAVIGATING THE ETHICAL VALUES OF LARGE LANGUAGE MODELS VIA INSTRUCTION LEARNING
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
CAN LANGUAGE MODELS BE INSTRUCTED TO PROTECT PERSONAL INFORMATION?
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Removing RLHF Protections in GPT-4 via Fine-Tuning
SPML: A DSL for Defending Language Models Against Prompt Attacks
Stealthy Attack on Large Language Model based Recommendation
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models
Discovering the Hidden Vocabulary of DALLE-2
Raising the Cost of Malicious AI-Powered Image Editing
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
ALIGNERS: DECOUPLING LLMS AND ALIGNMENT
CAN LLM-GENERATED MISINFORMATION BE DETECTED?
On the Risk of Misinformation Pollution with Large Language Models
Evading Watermark based Detection of AI-Generated Content
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World’s Ugliness?
Privacy-Preserving Instructions for Aligning Large Language Models
TOWARDS UNDERSTANDING THE INTERPLAY OF GENERATIVE ARTIFICIAL INTELLIGENCE AND THE INTERNET
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
TOWARDS RESPONSIBLE AI IN THE ERA OF GENERATIVE AI: A REFERENCE ARCHITECTURE FOR DESIGNING FOUNDATION MODEL BASED SYSTEMS
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
Risk Assessment and Statistical Significance in the Age of Foundation Models
The Foundation Model Transparency Index
The Privacy Pillar - A Conceptual Framework for Foundation Model-based Systems
A Baseline Analysis of Reward Models’ Ability To Accurately Analyze Foundation Models Under Distribution Shift
Foundational Moral Values for AI Alignment
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
ON CATASTROPHIC INHERITANCE OF LARGE FOUNDATION MODELS
Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Foundation Model Transparency Reports
SECURING RELIABILITY: A BRIEF OVERVIEW ON ENHANCING IN-CONTEXT LEARNING FOR FOUNDATION MODELS
EXPLORING THE ADVERSARIAL CAPABILITIES OF LARGE LANGUAGE MODELS
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
LLM-Resistant Math Word Problem Generation via Adversarial Attacks
Efficient Black-Box Adversarial Attacks on Neural Text Detectors
Adversarial Preference Optimization
Combating Adversarial Attacks with Multi-Agent Debate
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Prompted Contextual Vectors for Spear-Phishing Detection
Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated Text Detection
Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
RADAR: Robust AI-Text Detection via Adversarial Learning
OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Detoxifying Large Language Models via Knowledge Editing
Healing Unsafe Dialogue Responses with Weak Supervision Signals
