AI Safety+Cybersecurity R&D Tracker

July 11, 2025

‍

Subscribe to our monthly AI Safety and Cybersecurity R&D Tracker updates!

Threats using AI models

Prompt Injection and Input Manipulation (Direct and Indirect)

Covers:

OWASP LLM 01: Prompt Injection
OWASP ML 01: Input Manipulation Attack
MITRE ATLAS Initial Access, Privilege Escalation, and Defense Evasion

Research:

‍

System and Meta Prompt Extraction

Covers:

MITRE ATLAS Discovery and Exfiltration

Research:

‍

Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts

Covers:

MITRE ATLAS Resource Development

Research:

‍

Jailbreak, Cost Harvesting, or Erode ML Model Integrity

Covers:

MITRE ATLAS Privilege Escalation, Defense Evasion, and Impact

Research:

‍

Proxy AI ML Model (Simulations)

Covers:

MITRE ATLAS ML Attack Staging

Research:

‍

Verify Attack (Efficacy)

Covers:

MITRE ATLAS ML Attack Staging

Research:

‍

Insecure Output Handling

Covers:

OWASP LLM 02: Insecure Output Handling

Research:

‍

Sensitive Information Disclosure

Covers:

OWASP LLM 06: Sensitive Information Disclosure

Research:

‍

Insecure Plugin Design and Plugin Compromise

Covers:

OWASP LLM 07: Insecure Plugin Design
MITRE ATLAS Execution & Privilege Escalation

Research:

‍

Hallucination Squatting and Phishing

Covers:

MITRE ATLAS Initial Access

Research:

‍

Persistence

Covers:

MITRE ATLAS Persistence

Research:

‍

Backdoor ML Model and Craft Adversarial Data

Covers:

MITRE ATLAS ML Attack Staging

Research:

‍

Supply Chain Vulnerabilities and Compromise

Covers:

OWASP LLM 05/OWASP ML 06/MITRE ATLAS Initial Access

Research:

‍

Excessive Agency, Agentic Manipulation, Agentic Systems

We added 'agentic' manipulation to this subcategory.

Covers:

OWASP LLM 08: Excessive Agency

Research:

‍

Copyright Infringement

Covers:

MITRE ATLAS Impact

Research:

‍

Defenses

Research:

‍

Threat to AI Models

General Approaches

We added this subsection to cover research that broadly looks at AI security.

Research:

‍

Data Poisoning and Simulated Publication of Poisoned Public Datasets

Covers:

OWASP LLM 03: Training Data Poisoning
OWASP ML 02: Data Poisoning Attack
MITRE ATLAS Resource Development

We moved this subsection from 'Threats Using AI Models' to this section as poisoned data is a threat to AI.

Research:

‍

Model (Mis)Interpretability

Added this subsection to cover cybersecurity issues that arise from interpretability issues.

Research:

‍

Model Collapse

Covers:

OWASP LLM 03: Training Data Poisoning
OWASP ML 02: Data Poisoning Attack

Research:

‍

Model Denial of Service and Chaff Data Spamming

Covers:

OWASP LLM 04: Model Denial of Service
MITRE ATLAS Impact

Research:

‍

Model Modifications

We added this subsection to include security issues that arise from post-hoc model modifications such as fine-tuning, quantization.

Research:

‍

Inadequate AI Alignment

Research:

‍

Discover ML Model Family and Ontology/Model Extraction

Added model extraction to this as it did not have its own category.

Covers:

MITRE ATLAS Discovery

Research:

‍

Improper Error Handling

Research:

‍

Robust Multi-Prompt and Multi-Model Attacks

Research:

‍

Multi-Modal Attacks

Research:

‍

LLM Data Leakage and ML Artifact Collection

MITRE ATLAS Exfiltration & Collection

Research:

‍

Evade ML Model

Covers:

MITRE ATLAS Defense Evasion & Impact

Research:

‍

Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access

Covers:

OWASP LLM 10: Model Theft
OWASP ML 05: Model Theft
MITRE ATLAS Exfiltration and ML Model Access

Research:

‍

Model Inversion Attack

Covers:

OWASP ML 03: Model Inversion Attack
MITRE ATLAS Exfiltration

Research:

‍

Exfiltration via Cyber Means

Covers:

MITRE ATLAS Exfiltration

Research:

‍

Model Skewing Attack

Covers:

OWASP ML 08: Model Skewing

Research:

‍

Evade ML Model

Covers:

MITRE ATLAS Initial Access
MITRE ATLAS Reconnaissance

Research:

‍

Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts

Covers:

MITRE ATLAS Resource Development, Discovery, and Collection

Research:

‍

User Execution, Command and Scripting Interpreter

Covers:

MITRE ATLAS Execution

Research:

‍

Physical Model Access and Full Model Access

Covers:

MITRE ATLAS ML Model Access

Research:

‍

Valid Accounts

Covers:

MITRE ATLAS Initial Access

Research:

‍

Exploit Public Facing Application

Covers:

MITRE ATLAS Initial Access

Research:

‍

Threats from AI Model

Misinformation

Covers:

MITRE ATLAS Impact

Research:

‍

Over Reliance on LLM Outputs and External (Social) Harms

Covers:

OWASP LLM 09: Overreliance
MITRE ATLAS Impact

Research:

‍

Fake Resources and Phishing

Covers:

MITRE ATLAS Initial Access

Research:

‍

Social Manipulation

Research:

‍

Deep Fakes, Content Provenance, and Watermarking

Research:

‍

Shallow Fakes

Research:

‍

Misidentification

Research:

‍

Private Information Used in Training

Research:

‍

Unsecured Credentials

Covers:

MITRE ATLAS Credential Access

Research:

‍

AI-Generated/Augmented Exploits

Added this category to cover instances where generative AI systems are used to generate cybersecurity exploits.

Research:

‍

AI Safety+Cybersecurity R&D Tracker

Threats using AI models

Prompt Injection and Input Manipulation (Direct and Indirect)

System and Meta Prompt Extraction

Obtain and Develop (Software) Capabilities, Acquire Infrastructure, or Establish Accounts

Jailbreak, Cost Harvesting, or Erode ML Model Integrity

Proxy AI ML Model (Simulations)

Verify Attack (Efficacy)

Insecure Output Handling

Sensitive Information Disclosure

Insecure Plugin Design and Plugin Compromise

Hallucination Squatting and Phishing

Persistence

Backdoor ML Model and Craft Adversarial Data

Supply Chain Vulnerabilities and Compromise

Excessive Agency, Agentic Manipulation, Agentic Systems

Copyright Infringement

Defenses

Threat to AI Models

General Approaches

Data Poisoning and Simulated Publication of Poisoned Public Datasets

Model (Mis)Interpretability

Model Collapse

Model Denial of Service and Chaff Data Spamming

Model Modifications

Inadequate AI Alignment

Discover ML Model Family and Ontology/Model Extraction

Improper Error Handling

Robust Multi-Prompt and Multi-Model Attacks

Multi-Modal Attacks

LLM Data Leakage and ML Artifact Collection

Evade ML Model

Model Theft, Data Leakage, ML-Enabled Product or Service, and API Access

Model Inversion Attack

Exfiltration via Cyber Means

Model Skewing Attack

Evade ML Model

Discover ML Artifacts, Data from Information Repositories and Local System, and Acquire Public ML Artifacts

User Execution, Command and Scripting Interpreter

Physical Model Access and Full Model Access

Valid Accounts

Exploit Public Facing Application

Threats from AI Model

Misinformation

Over Reliance on LLM Outputs and External (Social) Harms

Fake Resources and Phishing

Social Manipulation

Deep Fakes, Content Provenance, and Watermarking

Shallow Fakes

Misidentification

Private Information Used in Training

Unsecured Credentials

AI-Generated/Augmented Exploits

You may be interested in

AI Safety+Cybersecurity R&D Tracker

AI Framework Tracker

Global AI Regulation Tracker

Want to get started with safe & compliant AI adoption?