Subscribe to our monthly AI Safety and Cybersecurity R&D Tracker updates!
Threats using AI models
Prompt Injection and Input Manipulation (Direct and Indirect)
Covers:
- OWASP LLM 01: Prompt Injection
- OWASP ML 01: Input Manipulation Attack
- MITRE ATLAS Initial Access, Privilege Escalation, and Defense Evasion
Research:
System and Meta Prompt Extraction
Covers:
- MITRE ATLAS Discovery and Exfiltration
Research:
Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts
Covers:
- MITRE ATLAS Resource Development
Research:
Jailbreak, Cost Harvesting, or Erode ML Model Integrity
Covers:
- MITRE ATLAS Privilege Escalation, Defense Evasion, and Impact
Research:
Proxy AI ML Model (Simulations)
Covers:
- MITRE ATLAS ML Attack Staging
Research:
Verify Attack (Efficacy)
Covers:
- MITRE ATLAS ML Attack Staging
Research:
Insecure Output Handling
Covers:
- OWASP LLM 02: Insecure Output Handling
Research:
Sensitive Information Disclosure
Covers:
- OWASP LLM 06: Sensitive Information Disclosure
Research:
Insecure Plugin Design and Plugin Compromise
Covers:
- OWASP LLM 07: Insecure Plugin Design
- MITRE ATLAS Execution & Privilege Escalation
Research:
Hallucination Squatting and Phishing
Covers:
- MITRE ATLAS Initial Access
Research:
Persistence
Covers:
Research:
Backdoor ML Model and Craft Adversarial Data
Covers:
- MITRE ATLAS ML Attack Staging
Research:
Supply Chain Vulnerabilities and Compromise
Covers:
- OWASP LLM 05/OWASP ML 06/MITRE ATLAS Initial Access
Research:
Excessive Agency, Agentic Manipulation, Agentic Systems
We added 'agentic' manipulation to this subcategory.
Covers:
- OWASP LLM 08: Excessive Agency
Research:
Copyright Infringement
Covers:
Research:
Defenses
Research:
Threat to AI Models
General Approaches
We added this subsection to cover research that broadly looks at AI security.
Research:
Data Poisoning and Simulated Publication of Poisoned Public Datasets
Covers:
- OWASP LLM 03: Training Data Poisoning
- OWASP ML 02: Data Poisoning Attack
- MITRE ATLAS Resource Development
We moved this subsection from 'Threats Using AI Models' to this section as poisoned data is a threat to AI.
Research:
Model (Mis)Interpretability
Added this subsection to cover cybersecurity issues that arise from interpretability issues.
Research:
Model Collapse
Covers:
- OWASP LLM 03: Training Data Poisoning
- OWASP ML 02: Data Poisoning Attack
Research:
Model Denial of Service and Chaff Data Spamming
Covers:
- OWASP LLM 04: Model Denial of Service
- MITRE ATLAS Impact
Research:
Model Modifications
We added this subsection to include security issues that arise from post-hoc model modifications such as fine-tuning, quantization.
Research:
Inadequate AI Alignment
Research:
Discover ML Model Family and Ontology/Model Extraction
Added model extraction to this as it did not have its own category.
Covers:
Research:
Improper Error Handling
Research:
Robust Multi-Prompt and Multi-Model Attacks
Research:
Multi-Modal Attacks
Research:
LLM Data Leakage and ML Artifact Collection
- MITRE ATLAS Exfiltration & Collection
Research:
Evade ML Model
Covers:
- MITRE ATLAS Defense Evasion & Impact
Research:
Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access
Covers:
- OWASP LLM 10: Model Theft
- OWASP ML 05: Model Theft
- MITRE ATLAS Exfiltration and ML Model Access
Research:
Model Inversion Attack
Covers:
- OWASP ML 03: Model Inversion Attack
- MITRE ATLAS Exfiltration
Research:
Exfiltration via Cyber Means
Covers:
Research:
Model Skewing Attack
Covers:
- OWASP ML 08: Model Skewing
Research:
Evade ML Model
Covers:
- MITRE ATLAS Initial Access
- MITRE ATLAS Reconnaissance
Research:
Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts
Covers:
- MITRE ATLAS Resource Development, Discovery, and Collection
Research:
User Execution, Command and Scripting Interpreter
Covers:
Research:
Physical Model Access and Full Model Access
Covers:
- MITRE ATLAS ML Model Access
Research:
Valid Accounts
Covers:
- MITRE ATLAS Initial Access
Research:
Exploit Public Facing Application
Covers:
- MITRE ATLAS Initial Access
Research:
Threats from AI Model
Misinformation
Covers:
Research:
Over Reliance on LLM Outputs and External (Social) Harms
Covers:
- OWASP LLM 09: Overreliance
- MITRE ATLAS Impact
Research:
Fake Resources and Phishing
Covers:
- MITRE ATLAS Initial Access
Research:
Social Manipulation
Research:
Deep Fakes, Content Provenance, and Watermarking
Research:
Shallow Fakes
Research:
Misidentification
Research:
Private Information Used in Training
Research:
Unsecured Credentials
Covers:
- MITRE ATLAS Credential Access
Research:
AI-Generated/Augmented Exploits
Added this category to cover instances where generative AI systems are used to generate cybersecurity exploits.
Research: