Trackers

AI Safety+Cybersecurity R&D Tracker

September 26, 2024
AI Safety+Cybersecurity R&D Tracker

Subscribe to our monthly AI Safety and Cybersecurity R&D Tracker updates!

Threats using AI models

Prompt Injection and Input Manipulation (Direct and Indirect)

Covers:

  • OWASP LLM 01: Prompt Injection
  • OWASP ML 01: Input Manipulation Attack
  • MITRE ATLAS Initial Access, Privilege Escalation, and Defense Evasion 

Research:

System and Meta Prompt Extraction

Covers:

  • MITRE ATLAS Discovery and Exfiltration

Research:

Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts

Covers:

  • MITRE ATLAS Resource Development

Research:

  • "Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search" (Moss, Aug 2024)

Jailbreak, Cost Harvesting, or Erode ML Model Integrity

Covers:

  • MITRE ATLAS Privilege Escalation, Defense Evasion, and Impact 

Research:

Proxy AI ML Model (Simulations)

Covers:

  • MITRE ATLAS ML Attack Staging

Verify Attack (Efficacy)

Covers:

  • MITRE ATLAS ML Attack Staging

Insecure Output Handling

Covers:

  • OWASP LLM 02: Insecure Output Handling

Research:

Sensitive Information Disclosure

Covers:

  • OWASP LLM 06: Sensitive Information Disclosure

Research:

  • "Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective" (Luo et al., Sep 2024)
  • "LLM-PBE: Assessing Data Privacy in Large Language Models" (Li et al., Sep 2024)
  • "PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action" (Shao et al., Sep 2024)
  • "Privacy-preserving Universal Adversarial Defense for Black-box Models" (Li et al., Aug 2024)
  • "DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts" (Sun et al., Aug 2024)
  • "Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models" (Chong et al., Aug 2024)

Insecure Plugin Design and Plugin Compromise

Covers:

  • OWASP LLM 07: Insecure Plugin Design
  • MITRE ATLAS Execution & Privilege Escalation

Research:

Hallucination Squatting and Phishing

Covers:

  • MITRE ATLAS Initial Access

Research:

  • "DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection" (Chiba et al., Oct 2024)
  • "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs" (Spracklen et al., Sep 2024)

Persistence

Covers:

  • MITRE ATLAS Persistence

Backdoor ML Model and Craft Adversarial Data

Covers:

  • MITRE ATLAS ML Attack Staging

Research:

Supply Chain Vulnerabilities and Compromise

Covers:

  • OWASP LLM 05/OWASP ML 06/MITRE ATLAS Initial Access

Excessive Agency/Agentic Manipulation/Agentic Systems

We added 'agentic' manipulation to this subcategory.

Covers:

  • OWASP LLM 08: Excessive Agency

Research:

Copyright Infringement

Covers:

  • MITRE ATLAS Impact

Research:

  • "SoK: Dataset Copyright Auditing in Machine Learning Systems" (Du et al., Oct 2024)
  • "Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Watermarking" (Kong et al., Sep 2024)
  • "Strong Copyright Protection for Language Models via Adaptive Model Fusion" (Wang et al., Jul 2024)

Threat to AI Models

General Approaches

We added this subsection to cover research that broadly looks at AI security.

Research:

Training Data Poisoning and Simulated Publication of Poisoned Public Datasets

Covers:

  • OWASP LLM 03: Training Data Poisoning
  • OWASP ML 02: Data Poisoning Attack
  • MITRE ATLAS Resource Development 

We moved this subsection from 'Threats Using AI Models' to this section as poisoned data is a threat to AI.

Research:

Model (Mis)Interpretability

Added this subsection to cover cybersecurity issues that arise from interpretability issues.

Research:

Model Collapse

Covers:

  • OWASP LLM 03: Training Data Poisoning
  • OWASP ML 02: Data Poisoning Attack

Research:

Model Denial of Service and Chaff Data Spamming

Covers:

  • OWASP LLM 04: Model Denial of Service
  • MITRE ATLAS Impact

Research:

Model Modifications

We added this subsection to include security issues that arise from post-hoc model modifications such as fine-tuning, quantization.

Research:

Inadequate AI Alignment

Discover ML Model Family and Ontology/Model Extraction

Added model extraction to this as it did not have its own category.

Covers:

  • MITRE ATLAS Discovery

Research:

Improper Error Handling

Robust Multi-Prompt and Multi-Model Attacks

Research:

LLM Data Leakage and ML Artifact Collection

  • MITRE ATLAS Exfiltration & Collection

Research:

Evade ML Model

Covers:

  • MITRE ATLAS Defense Evasion & Impact

Research:

Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access

Covers:

  • OWASP LLM 10: Model Theft
  • OWASP ML 05: Model Theft
  • MITRE ATLAS Exfiltration and ML Model Access

Research:

  • "Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity" (Zhu et al., Aug 2024)

Model Inversion Attack

Covers:

  • OWASP ML 03: Model Inversion Attack
  • MITRE ATLAS Exfiltration

Research:

Exfiltration via Cyber Means

Covers:

  • MITRE ATLAS Exfiltration

Model Skewing Attack

Covers:

  • OWASP ML 08: Model Skewing

Evade ML Model

Covers:

  • MITRE ATLAS Initial Access

MITRE ATLAS Reconnaissance

Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts

Covers:

  • MITRE ATLAS Resource Development, Discovery, and Collection

User Execution, Command and Scripting Interpreter

Covers:

  • MITRE ATLAS Execution 

Physical Model Access and Full Model Access

Covers:

  • MITRE ATLAS ML Model Access

Valid Accounts

  • MITRE ATLAS Initial Access

Exploit Public Facing Application

  • MITRE ATLAS Initial Access

Threats from AI Model

Misinformation

Covers:

  • MITRE ATLAS Impact

Over Reliance on LLM Outputs and External (Social) Harms

Covers:

  • OWASP LLM 09: Overreliance
  • MITRE ATLAS Impact

Research:

Fake Resources and Phishing

Covers:

  • MITRE ATLAS Initial Access

Research:

  • "From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks" (Kulkarni et al., Jul 2024)

Social Manipulation

Research:

  • "PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety" (Zhang et al., Aug 2024)

Deep Fakes

Research:

Shallow Fakes

Misidentification

Private Information Used in Training

Research:

  • "Privacy-hardened and hallucination-resistant synthetic data generation with logic-solvers" (Burgess et al., Oct 2024)
  • "Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data" (Akkus et al., Sep 2024)
  • "Catch Me if You Can: Detecting Unauthorized Data Use in Deep Learning Models" (Chen and Pattabiraman, Sep 2024)
  • "Ethical Challenges in Computer Vision: Ensuring Privacy and Mitigating Bias in Publicly Available Datasets" (Tahir, Aug 2024)
  • "Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions" (Liu and Yang., Aug 2024)

Unsecured Credentials

Covers:

  • MITRE ATLAS Credential Access

AI-Generated/Augmented Exploits

Added this category to cover instances where generative AI systems are used to generate cybersecurity exploits.

Research:

You may be interested in

Want to get started with safe & compliant AI adoption?

Schedule a call with one of our experts to see how Fairly AI can help