Threats Using AI Models
Prompt Injection and Input Manipulation (Direct and Indirect)
Covers:
- OWASP LLM 01: Prompt Injection
- OWASP ML 01: Input Manipulation Attack
- MITRE ATLAS Initial Access, Privilege Escalation, and Defense Evasion
Research:
- "Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks" (Pasquini et al., Nov 2024)
- "Optimization-based Prompt Injection Attack to LLM-as-a-Judge" (Shi et al., Nov 2024)
- "Goal-guided Generative Prompt Injection Attack on Large Language Models" (Zhang et al., Nov 2024)
- "Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures" (Benjamin et al., Oct 2024)
- "InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models" (Li, Liu, and Xiao, Oct 2024)
- "FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks" (Wang et al., Oct 2024)
- "Embedding-based classifiers can detect prompt injection attacks
- "Imprompter: Tricking LLM Agents into Improper Tool Use" (Fu et al., Oct 2024)
- "System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective" (Wu, Cecchetti, and Xiao, Oct 2024)
- "EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage" (Liao et al., Sep 2024)
- "Efficient Detection of Toxic Prompts in Large Language Models" (Liu et al., Sep 2024)
- "Goal-guided Generative Prompt Injection Attack on Large Language Models" (Zhang et al., Sep 2024)
- "Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions" (Zhang et al., Sep 2024)
- "Optimization-based Prompt Injection Attack to LLM-as-a-Judge" (Shi et al., Aug 2024)
- "Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks" (De Stefano, Schönherr, Pellegrino, Aug 2024)
- "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents" (Zhan et al., Aug 2024)
- "LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks" (Happe, Kaplan, and Cito, Aug 2024)
- "Securing the Diagnosis of Medical Imaging: An In-depth Analysis of AI-Resistant Attacks" (Biswas et al., Aug 2024)
- "On Feasibility of Intent Obfuscating Attacks" (Li and Shafto, Jul 2024)
System and Meta Prompt Extraction
Covers:
- MITRE ATLAS Discovery and Exfiltration
Research:
Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts
Covers:
- MITRE ATLAS Resource Development
Research:
- "Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search" (Moss, Aug 2024)
Jailbreak, Cost Harvesting, or Erode ML Model Integrity
Covers:
- MITRE ATLAS Privilege Escalation, Defense Evasion, and Impact
Research:
- "SQL Injection Jailbreak: a structural disaster of large language models" (Zhao et al., Nov 2024)
- "LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs" (Jha, Arora, and Ganesh, Nov 2024)
- "AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks" (Zeng et al., Nov 2024)
- "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" (Li et al., Nov 2024)
- "SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains" (Saiem et al., Nov 2024)
- "Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models" (Yang and Fu, Oct 2024)
- "Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning" (Hasan, Rugina, and Wang, Oct 2024)
- "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically" (Mehrotra et al., Oct 2024)
- "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" (Mo et al., Oct 2024)
- "HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models" (Zhang et al., Oct 2024)
- "Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector" (Huang et al., Oct 2024)
- "Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses" (Zheng et al., Oct 2024)
- "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" (Yang et al., Oct 2024)
- "Transferable Adversarial Attacks on SAM and Its Downstream Models" (Xia et al., Oct 2024)
- "Remote Timing Attacks on Efficient Language Model Inference" (Carlini and Nasr, Oct 2024)
- "MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models" (Weng et al., Oct 2024)
- "RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process" (Wang, Liu, and Xiao, Oct 2024)
- "Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents" (Kumar et al., Oct 2024)
- "Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning" (Xu, Gumaste, and Singh, Oct 2024)
- "Adversarial Attacks on Large Language Models Using Regularized Relaxation" (Chacko et al., Oct 2024)
- "Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models" (Li et al., Oct 2024)
- "Compiled Models, Built-In Exploits: Uncovering Pervasive Bit-Flip Attack Surfaces in DNN Executables" (Chen et al., Oct 2024)
- "Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks" (Peng et al., Oct 2024)
- "Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs" (Gong et al., Oct 2024)
- "Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks" (Wang et al., Oct 2024)
- "Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models"(Shen et al., Oct 2024)
- "PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach" (Lin et al., Oct 2024)
- "Rethinking and Defending Protective Perturbation in Personalized Diffusion Models" (Liu et al., Oct 2024)
- "Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models" (Yu et al., Sep 2024)
- "Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity" (Berezin, Farahbakhsh, Crespi, Oct 2024)
- "Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI" (Rawat et al., Sep 2024)
- "Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction" (Zhang et al., 2024)
- "Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation" (Shahariar et al., Sep 2024)
- "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack" (Russinovich, Salem, and Eldan, Sep 2024)
- "VulZoo: A Comprehensive Vulnerability Intelligence Dataset" (Ruan et al., Sep 2024)
- "Adversarial Attacks on Machine Learning-Aided Visualizations" (Fujiwara et al., Sep 2024)
- "Jailbreaking Large Language Models with Symbolic Mathematics" (Bethany et al., Sep 2024)
- "Image Hijacks: Adversarial Images can Control Generative Models at Runtime" (Bailey et al., Sep 2024)
- "LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses" (Lin et al., Sep 2024)
- "Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents" (Shafran, Schuster, and Shmatikov, Sep 2024)
- "Security Attacks on LLM-based Code Completion Tools" (Cheng et al., Sep 2024)
- "Adversarial Attacks to Multi-Modal Models" (Dou et al., Sep 2024)
- "HSF: Defending against Jailbreak Attacks with Hidden State Filtering" (Qian et al., Sep 2024)
- "Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models" (Kalavasis et al., Sep 2024)
- "Recent Advances in Attack and Defense Approaches of Large Language Models" (Cui et al., Sep 2024)
- "SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner" (Wang et al., Sep 2024)
- "LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet" (Li et al., Sep 2024)
- "Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models" (Ma et al., Sep 2024)
- "Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models" (An et al., Sep 2024)
- "The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models" (Wu et al., Aug 2024)
- "Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models" (Du et al., Aug 2024)
- "LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet" (Li et al., Aug 2024)
- "Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything" (Zou et al., Aug 2024)
- "A StrongREJECT for Empty Jailbreaks" (Souly et al., Aug 2024)
- "RT-Attack: Jailbreaking Text-to-Image Models via Random Token" (Gao et al., Aug 2024)
- "CAMH: Advancing Model Hijacking Attack in Machine Learning" (He et al., Aug 2024)
- "RT-Attack: Jailbreaking Text-to-Image Models via Random Token" (Gao et al., Aug 2024)
- "BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger" (Chen et al., Aug 2024)
- "Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks" (Zhao et al., Aug 2024)
- "A Survey of Trojan Attacks and Defenses to Deep Neural Networks" (Jin et al., Aug 2024)
- "MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defensesfor Vision Language Models" (Weng et al., Aug 2024)
- "Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment" (Wang and Shu, Aug 2023)
- "Resilience in Online Federated Learning: Mitigating Model-Poisoning Attacks via Partial Sharing" (Lari et al., Aug 2024)
- "EnJa: Ensemble Jailbreak on Large Language Models" (Zhang et al., Aug 2024)
- "Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?" (Bahrami, Vishwamitra, and Najafirad, Aug 2024)
- "Jailbreaking Text-to-Image Models with LLM-Based Agents" (Dong et al., Aug 2024)
- "Can LLMs be Fooled? Investigating Vulnerabilities in LLMs" (Abdali et al., Jul 2024)
- "Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models" (Lu et al., Jul 2024)
- "Vera Verto: Multimodal Hijacking Attack" (Zhang et al., Jul 2024)
Proxy AI ML Model (Simulations)
Covers:
- MITRE ATLAS ML Attack Staging
Verify Attack (Efficacy)
Covers:
- MITRE ATLAS ML Attack Staging
Insecure Output Handling
Covers:
- OWASP LLM 02: Insecure Output Handling
Research:
- "Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling" (Athanasiou, Jung, and Palmidessi, Nov 2024)
- "OSLO: One-Shot Label-Only Membership Inference Attacks" (Peng et al., Oct 2024)
- "Detecting Training Data of Large Language Models via Expectation Maximization" (Kim et al., Oct 2024)
- "Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models" (Suriyakumar et al., Oct 2024)
- "Black-box Membership Inference Attacks against Fine-tuned Diffusion Models" (Pang and Wang, Sep 2024)
- "Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks" (He et al., Sep 2024)
- "Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies" (Wang et al., Aug 2024)
- "Membership Inference Attack Against Masked Image Modeling" (Li et al., Aug 2024)
- "Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities" (Khowaja, Aug 2024)
- "Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks" (Lomurno and Matteucci, Jul 2024)
- "Thermometer: Towards Universal Calibration for Large Language Models" (Shen et al., Jun 2024)
Sensitive Information Disclosure
Covers:
- OWASP LLM 06: Sensitive Information Disclosure
Research:
- "Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective" (Luo et al., Sep 2024)
- "LLM-PBE: Assessing Data Privacy in Large Language Models" (Li et al., Sep 2024)
- "PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action" (Shao et al., Sep 2024)
- "Privacy-preserving Universal Adversarial Defense for Black-box Models" (Li et al., Aug 2024)
- "DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts" (Sun et al., Aug 2024)
- "Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models" (Chong et al., Aug 2024)
Insecure Plugin Design and Plugin Compromise
Covers:
- OWASP LLM 07: Insecure Plugin Design
- MITRE ATLAS Execution & Privilege Escalation
Research:
Hallucination Squatting and Phishing
Covers:
- MITRE ATLAS Initial Access
Research:
- "DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection" (Chiba et al., Oct 2024)
- "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs" (Spracklen et al., Sep 2024)
Persistence
Covers:
Backdoor ML Model and Craft Adversarial Data
Covers:
- MITRE ATLAS ML Attack Staging
Research:
- "When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations" (Ge et al., Nov 2024)
- "Backdoor defense, learnability and obfuscation" (Christiano et al., Nov 2024)
- "Combinational Backdoor Attack against Customized Text-to-Image Models" (Jiang et al., Nov 2024)
- "Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion" (Guo et al., Nov 2024)
- "Backdoor Attacks against Image-to-Image Networks" (Jiang et al., Nov 2024)
- "Infighting in the Dark: Multi-Labels Backdoor Attack in Federated Learning" (Li et al., Nov 2024)
- "Planting Undetectable Backdoors in Machine Learning Models" (Goldwasser et al., Nov 2024)
- "On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World" (Gia Doan et al., Oct 2024)
- "Model X-ray:Detecting Backdoored Models via Decision Boundary" (Su et al., Oct 2024)
- "Dullahan: Stealthy Backdoor Attack against Without-Label-Sharing Split Learning" (Pu et al., Oct 2024)
- "Mind Your Questions Towards Backdoor Attacks on Text-to-Visualization Models" (Li et al., Oct 2024)
- "BadCM: Invisible Backdoor Attack Against Cross-Modal Learning" (Zheng et al., Oct 2024)
- "BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting" (Lin et al., Oct 2024)
- "Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning" (Zhao et al., Oct 2024)
- "Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations" (Bartolini, Stoyanov, and Giaretta, Sep 2024)
- "EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody" (Schoof et al., Sep 2024)
- "Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers" (Abad et al. Sep 2024)
- "Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning" (Zhang, Gong, and Reiter)
- "TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors" (Mo et al., Sep 2024)
- "INK: Inheritable Natural Backdoor Attack Against Model Distillation" (Liu et al., Sep 2024)
- "Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor" (Miah and Bi, Sep 2024)
- "Rethinking Backdoor Detection Evaluation for Language Models" (Yan et al., Sep 2024)
- "Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks" (Li et al., Aug 2024)
- "Transferring Backdoors between Large Language Models by Knowledge Distillation" (Cheng et al., Aug 2024)
- "Revocable Backdoor for Deep Model Trading" (Xu et al., Aug 2024)
- "BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning" (Wu et al., Jul 2024)
- "Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models" (Hao et al., Jul 2024)
Supply Chain Vulnerabilities and Compromise
Covers:
- OWASP LLM 05: Supply Chain Vulnerabilities
- OWASP ML 06: AI Supply Chain Attacks
- MITRE ATLAS Initial Access
Excessive Agency/Agentic Manipulation/Agentic Systems
We added 'agentic' manipulation to this subcategory.
Covers:
- OWASP LLM 08: Excessive Agency
Research:
- "AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents" (Xu et al., Oct 2024)
- "Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation" (Kwartler et al., Oct 2024)
- "Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks" (Zhang et al., Oct 2024)
- "Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems" (Lee and Tiwari, Oct 2024)
- "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents" (Zhang et al., Oct 2024)
- "BreachSeek: A Multi-Agent Automated Penetration Tester" (Alshehri et al., Sep 2024)
- "Safeguarding AI Agents: Developing and Analyzing Safety Architectures" (Domkunwar and N S, Sep 2024)
- "Secret Collusion among Generative AI Agents" (Motwani et al., Aug 2024)
- "Large Language Model Sentinel: LLM Agent for Adversarial Purification" (Lin and Zhao, Aug 2024)
- "Compromising Embodied Agents with Contextual Backdoor Attacks" (Liu et al., Aug 2024)
- "Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification" (Zhang et al., Jul 2024)
Copyright Infringement
Covers:
Research:
Threats to AI Models
General Approaches
We added this subsection to cover research that takes a broad view of AI security.
Research:
- "Blockchain for Large Language Model Security and Safety: A Holistic Survey" (Geren et al., Nov 2024)
- "One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks" (Guo et al., Nov 2024)
- "How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries" (Banerjee et al., Nov 2024)
- "Defending Large Language Models Against Attacks With Residual Stream Activation Analysis" (Kawasaki, Davis, and Abbas, Nov 2024)
- "LProtector: An LLM-driven Vulnerability Detection System" (Sheng et al., Nov 2024)
- "SECURE: Benchmarking Large Language Models for Cybersecurity" (Bhusal et al., Oct 2024)
- "Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis" (Brokman et al., Oct 2024)
- "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (Li et al., Oct 2024)
- "AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples" (Cinà et al., Oct 2024)
- "Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models" (Loumachi and Ghanem, Sep 2024)
- "Real-world Adversarial Defense against Patch Attacks based on Diffusion Model" (Wei et al., Sep 2024)
- "LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems" (Otal and Canbaz, Sep 2024)
- "SECURE: Benchmarking Large Language Models for Cybersecurity Advisory" (Bhusal et al., Sep 2024)
- "Continual Adversarial Defense" (Wang et al., Aug 2024)
- "Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection" (Song et al., Aug 2024)
- "Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey" (Truong, Dang, and Le, Aug 2024)
- "Blockchain for Large Language Model Security and Safety: A Holistic Survey" (Geren et al., Jul 2024)
Training Data Poisoning and Simulated Publication of Poisoned Public Datasets
Covers:
- OWASP LLM 03: Training Data Poisoning
- OWASP ML 02: Data Poisoning Attack
- MITRE ATLAS Resource Development
We moved this subsection from 'Threats Using AI Models' to this section because poisoned training data is a threat to the model itself; a toy illustration follows the research list below.
Research:
- "Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws" (Bowen et al., Oct 2024)
- "Inverting Gradient Attacks Naturally Makes Data Poisons: An Availability Attack on Neural Networks" (Bouaziz, Mhamdi, and Usunier, Oct 2024)
- "Certified Robustness to Data Poisoning in Gradient-Based Training" (Sosnin et al., Oct 2024)
- "Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning" (Zhang et al., Oct 2024)
- "Securing Voice Authentication Applications Against Targeted Data Poisoning" (Mohammadi et al., Oct 2024)
- "Data Poisoning-based Backdoor Attack Framework against Supervised Learning Rules of Spiking Neural Networks" (Jin et al., Sep 2024)
- "Hiding Backdoors within Event Sequence Data via Poisoning Attacks" (Ermilova et al., Aug 2024)
- "ConfusedPilot: Confused Deputy Risks in RAG-based LLMs" (RoyChowdhury et al., Aug 2024)
- "PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models" (Zou et al., Aug 2024)
- "Scaling Laws for Data Poisoning in LLMs" (Bowen et al., Aug 2024)
- "Threats, Attacks, and Defenses in Machine Unlearning: A Survey" (Liu et al., Aug 2024)
- "Debiased Graph Poisoning Attack via Contrastive Surrogate Objective" (Yoon et al., Jul 2024)
Model (Mis)Interpretability
We added this subsection to cover cybersecurity issues that arise from model (mis)interpretability.
Research:
Model Collapse
Covers:
- OWASP LLM 03: Training Data Poisoning
- OWASP ML 02: Data Poisoning Attack
Research:
Model Denial of Service and Chaff Data Spamming
Covers:
- OWASP LLM 04: Model Denial of Service
- MITRE ATLAS Impact
Research:
- "Impact of White-Box Adversarial Attacks on Convolutional Neural Networks" (Podder and Ghosh, Oct 2024)
- "DrLLM: Prompt-Enhanced Distributed Denial-of-Service Resistance Method with Large Language Models" (Yin, Liu, and Xu, Sep 2024)
- "Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework" (Pisano et al., Aug 2024)
- "Self-Evaluation as a Defense Against Adversarial Attacks on LLMs" (Brown et al., Aug 2024)
- "Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks" (Diao et al., Jul 2024)
- "Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent" (Waghela, Sen, and Rakshit, Jul 2024)
Model Modifications
We added this subsection to include security issues that arise from post-hoc model modifications such as fine-tuning and quantization.
Research:
- "Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey" (Huang et al., Oct 2024)
- "Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks" (Poppi et al., Oct 2024)
- "Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey" (Huang et al., Oct 2024)
- "Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack" (Chen et al., Sep 2024)
- "The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs" (Chen et al., Sep 2024)
- "BadMerging: Backdoor Attacks Against Model Merging" (Zhang et al., Sep 2024)
- "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning" (Huang et al., Sep 2024)
- "RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model" (Shi et al., Sep 2024)
- "Large Language Models as Carriers of Hidden Messages" (Hoscilowicz et al., Aug 2024)
- "Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence" (Chen et al., Aug 2024)
- "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning" (Huang et al., Aug 2024)
- "Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data" (Baumgärtner et al., Aug 2024)
- "Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes" (Kumar et al., Jul 2024)
- "DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers" (Martínez-Mejía et al., Jul 2024)
- "Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges" (Sayyed et al., Jul 2024)
Inadequate AI Alignment
Discover ML Model Family and Ontology/Model Extraction
We added model extraction to this subcategory as it did not have its own category.
Covers:
Research:
Improper Error Handling
Robust Multi-Prompt and Multi-Model Attacks
Research:
LLM Data Leakage and ML Artifact Collection
Covers:
- MITRE ATLAS Exfiltration & Collection
Research:
- "Towards More Realistic Extraction Attacks: An Adversarial Perspective" (More, Ganesh, and Farnadi, Nov 2024)
- "Stealing User Prompts from Mixture of Experts" (Yona et al., Oct 2024)
- "Breach By A Thousand Leaks: Unsafe Information Leakage in `Safe' AI Responses" (Glukhov et al., Oct 2024)
- "CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants" (Finkman Noah et al., Oct 2024)
- "Towards a Theoretical Understanding of Memorization in Diffusion Models" (Chen et al., Oct 2024)
- "Extracting Memorized Training Data via Decomposition" (Su et al., Sep 2024)
- "Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships" (Gyo Oh et al., Sep 2024)
- "Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens" (Zhang and Wu, Jul 2024)
Evade ML Model
Covers:
- MITRE ATLAS Defense Evasion & Impact
Research:
Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access
Covers:
- OWASP LLM 10: Model Theft
- OWASP ML 05: Model Theft
- MITRE ATLAS Exfiltration and ML Model Access
Research:
- "Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity" (Zhu et al., Aug 2024)
Model Inversion Attack
Covers:
- OWASP ML 03: Model Inversion Attack
- MITRE ATLAS Exfiltration
Research:
Exfiltration via Cyber Means
Covers:
Model Skewing Attack
Covers:
- OWASP ML 08: Model Skewing
Evade ML Model
Covers:
- MITRE ATLAS Initial Access
MITRE ATLAS Reconnaissance
Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts
Covers:
- MITRE ATLAS Resource Development, Discovery, and Collection
User Execution, Command and Scripting Interpreter
Covers:
Physical Model Access and Full Model Access
Covers:
- MITRE ATLAS ML Model Access
Valid Accounts
Covers:
- MITRE ATLAS Initial Access
Exploit Public Facing Application
Covers:
- MITRE ATLAS Initial Access
Threats from AI Models
Misinformation
Covers:
Overreliance on LLM Outputs and External (Social) Harms
Covers:
- OWASP LLM 09: Overreliance
- MITRE ATLAS Impact
Research:
Fake Resources and Phishing
Covers:
- MITRE ATLAS Initial Access
Research:
- "From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks" (Kulkarni et al., Jul 2024)
Social Manipulation
Research:
- "PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety" (Zhang et al., Aug 2024)
Deep Fakes
Research:
- "SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI" (Ren et al., Nov 2024)
- "Conceptwm: A Diffusion Model Watermark for Concept Protection" (Lei et al., Nov 2024)
- "Watermark-based Detection and Attribution of AI-Generated Content" (Jiang et al., Nov 2024)
- "An undetectable watermark for generative image models" (Gunn, Zhao, and Song, Nov 2024)
- "InvisMark: Invisible and Robust Watermarking for AI-generated Image Provenance" (Xu et al., Nov 2024)
- "FoldMark: Protecting Protein Generative Models with Watermarking" (Zhang et al., Nov 2024)
- "Invisible Image Watermarks Are Provably Removable Using Generative AI" (Zhao et al., Oct 2024)
- "Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection" (Yang, Peng, and Xia, Oct 2024)
- "Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature" (Zhou et al., Oct 2024)
- "Watermarking Large Language Models and the Generated Content: Opportunities and Challenges" (Zhang and Koushanfar, Oct 2024)
- "An undetectable watermark for generative image models" (Gunn, Zhao, and Song, Oct 2024)
- "Deepfake detection in videos with multiple faces using geometric-fakeness features" (Vyshegorodtsev et al., Oct 2024)
- "Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice" (He et al., Oct 2024)
- "Discovering Clues of Spoofed LM Watermarks" (Gloaguen et al., Oct 2024)
- "Multi-Designated Detector Watermarking for Language Models" (Huang et al., Oct 2024)
- "Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection" (PN et al., Oct 2024)
- "Signal Watermark on Large Language Models" (Zu and Sheng, Oct 2024)
- "Diffuse or Confuse: A Diffusion Deepfake Speech Dataset" (Firc, Malinka, and Hanáček, Oct 2024)
- "A Watermark for Black-Box Language Models" (Bahri et al., Oct 2024)
- "Optimizing Adaptive Attacks against Content Watermarks for Language Models" (Diaa, Aremu, and Lukas, Oct 2024)
- "Social Media Authentication and Combating Deepfakes using Semi-fragile Invisible Image Watermarking" (Nadimpalli and Rattani, Oct 2024)
- "PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response" (Mittal et al., Oct 2024)
- "Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes" (Xie and Luo, Sep 2024)
- "XAI-Based Detection of Adversarial Attacks on Deepfake Detectors" (Pinhasov et al., Aug 2024)
Shallow Fakes
Misidentification
Private Information Used in Training
Research:
- "Privacy-hardened and hallucination-resistant synthetic data generation with logic-solvers" (Burgess et al., Oct 2024)
- "Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data" (Akkus et al., Sep 2024)
- "Catch Me if You Can: Detecting Unauthorized Data Use in Deep Learning Models" (Chen and Pattabiraman, Sep 2024)
- "Ethical Challenges in Computer Vision: Ensuring Privacy and Mitigating Bias in Publicly Available Datasets" (Tahir, Aug 2024)
- "Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions" (Liu and Yang., Aug 2024)
Unsecured Credentials
Covers:
- MITRE ATLAS Credential Access
AI-Generated/Augmented Exploits
We added this category to cover instances where generative AI systems are used to generate or augment cybersecurity exploits.
Research: