Blog

AI Safety+Cybersecurity R&D Tracker

September 12, 2024
AI Safety+Cybersecurity R&D Tracker

Subscribe to our monthly AI Safety and Cybersecurity R&D Tracker updates!

Threats using AI models

Prompt Injection and Input Manipulation (Direct and Indirect)

Covers:

  • OWASP LLM 01: Prompt Injection
  • OWASP ML 01: Input Manipulation Attack
  • MITRE ATLAS Initial Access, Privilege Escalation, and Defense Evasion 

Research:

System and Meta Prompt Extraction

Covers:

  • MITRE ATLAS Discovery and Exfiltration

Research:

  • "Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models" (Liang et al., Aug 2024)
  • "Prompt Leakage effect and defense strategies for multi-turn LLM interactions" (Agarwal et al., Jul 2024)

Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts

Covers:

  • MITRE ATLAS Resource Development

Research:

  • "Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search" (Moss, Aug 2024)

Jailbreak, Cost Harvesting, or Erode ML Model Integrity

Covers:

  • MITRE ATLAS Privilege Escalation, Defense Evasion, and Impact 

Research:

Proxy AI ML Model (Simulations)

Covers:

  • MITRE ATLAS ML Attack Staging

Verify Attack (Efficacy)

Covers:

  • MITRE ATLAS ML Attack Staging

Insecure Output Handling

Covers:

  • OWASP LLM 02: Insecure Output Handling

Research:

Sensitive Information Disclosure

Covers:

  • OWASP LLM 06: Sensitive Information Disclosure

Research:

  • "LLM-PBE: Assessing Data Privacy in Large Language Models" (Li et al., Sep 2024)
  • "PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action" (Shao et al., Sep 2024)
  • "Privacy-preserving Universal Adversarial Defense for Black-box Models" (Li et al., Aug 2024)
  • "DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts" (Sun et al., Aug 2024)
  • "Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models" (Chong et al., Aug 2024)

Insecure Plugin Design and Plugin Compromise

Covers:

  • OWASP LLM 07: Insecure Plugin Design
  • MITRE ATLAS Execution & Privilege Escalation

Research:

Hallucination Squatting and Phishing

Covers:

  • MITRE ATLAS Initial Access

Persistence

Covers:

  • MITRE ATLAS Persistence

Backdoor ML Model and Craft Adversarial Data

Covers:

  • MITRE ATLAS ML Attack Staging

Research:

Supply Chain Vulnerabilities and Compromise

Covers:

  • OWASP LLM 05/OWASP ML 06/MITRE ATLAS Initial Access

Excessive Agency/Agentic Manipulation/Agentic Systems

We added 'agentic' manipulation to this subcategory.

Covers:

  • OWASP LLM 08: Excessive Agency

Research:

Copyright Infringement

Covers:

  • MITRE ATLAS Impact

Research:

Threat to AI Models

General Approaches

We added this subsection to cover research that broadly looks at AI security.

Research:

Training Data Poisoning and Simulated Publication of Poisoned Public Datasets

Covers:

  • OWASP LLM 03: Training Data Poisoning
  • OWASP ML 02: Data Poisoning Attack
  • MITRE ATLAS Resource Development 

We moved this subsection from 'Threats Using AI Models' to this section as poisoned data is a threat to AI.

Research:

Model (Mis)Interpretability

Added this subsection to cover cybersecurity issues that arise from interpretability issues.

Research:

Model Collapse

Covers:

  • OWASP LLM 03: Training Data Poisoning
  • OWASP ML 02: Data Poisoning Attack

Research:

Model Denial of Service and Chaff Data Spamming

Covers:

  • OWASP LLM 04: Model Denial of Service
  • MITRE ATLAS Impact

Research:

Model Modifications

We added this subsection to include security issues that arise from post-hoc model modifications such as fine-tuning, quantization.

Research:

Inadequate AI Alignment

Discover ML Model Family and Ontology/Model Extraction

Added model extraction to this as it did not have its own category.

Covers:

  • MITRE ATLAS Discovery

Research:

Improper Error Handling

Robust Multi-Prompt and Multi-Model Attacks

Research:

LLM Data Leakage and ML Artifact Collection

  • MITRE ATLAS Exfiltration & Collection

Research:

Evade ML Model

Covers:

  • MITRE ATLAS Defense Evasion & Impact

Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access

Covers:

  • OWASP LLM 10: Model Theft
  • OWASP ML 05: Model Theft
  • MITRE ATLAS Exfiltration and ML Model Access

Research:

  • "Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity" (Zhu et al., Aug 2024)

Model Inversion Attack

Covers:

  • OWASP ML 03: Model Inversion Attack
  • MITRE ATLAS Exfiltration

Research:

Exfiltration via Cyber Means

Covers:

  • MITRE ATLAS Exfiltration

Model Skewing Attack

Covers:

  • OWASP ML 08: Model Skewing

Evade ML Model

Covers:

  • MITRE ATLAS Initial Access

MITRE ATLAS Reconnaissance

Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts

Covers:

  • MITRE ATLAS Resource Development, Discovery, and Collection

User Execution, Command and Scripting Interpreter

Covers:

  • MITRE ATLAS Execution 

Physical Model Access and Full Model Access

Covers:

  • MITRE ATLAS ML Model Access

Valid Accounts

  • MITRE ATLAS Initial Access

Exploit Public Facing Application

  • MITRE ATLAS Initial Access

Threats from AI Model

Misinformation

Covers:

  • MITRE ATLAS Impact

Over Reliance on LLM Outputs and External (Social) Harms

Covers:

  • OWASP LLM 09: Overreliance
  • MITRE ATLAS Impact

Research:

Fake Resources and Phishing

Covers:

  • MITRE ATLAS Initial Access

Research:

  • "From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks" (Kulkarni et al., Jul 2024)

Social Manipulation

Research:

  • "PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety" (Zhang et al., Aug 2024)

Deep Fakes

Research:

Shallow Fakes

Misidentification

Private Information Used in Training

Research:

Unsecured Credentials

Covers:

  • MITRE ATLAS Credential Access

AI-Generated/Augmented Exploits

Added this category to cover instances where generative AI systems are used to generate cybersecurity exploits.

Research:

You may be interested in

Want to get started with safe & compliant AI adoption?

Schedule a call with one of our experts to see how Fairly can help