Benchmark Coverage Comparison
Auto-generated comparison of ZIRAN's attack vector library against published AI agent security benchmarks.
Last updated: 2026-04-22
Executive Summary
- 639 attack vectors across 11 attack categories
- 100.0% OWASP LLM Top 10 coverage (10/10 categories)
- 10 multi-turn jailbreak tactics, 12 encoding types
- 229 multi-turn vectors
- 11 harm categories (AgentHarm-aligned)
- Gap closure: 39.1% (9/23 gaps closed)
OWASP LLM Top 10 Coverage
| Code | Category | Vectors | Status |
|---|---|---|---|
| LLM01 | Prompt Injection | 468 | |
| LLM02 | Insecure Output Handling | 201 | |
| LLM03 | Training Data Poisoning | 19 | |
| LLM04 | Model Denial of Service | 14 | |
| LLM05 | Supply Chain Vulnerabilities | 18 | |
| LLM06 | Sensitive Information Disclosure | 110 | |
| LLM07 | Insecure Plugin Design | 145 | |
| LLM08 | Excessive Agency | 151 | |
| LLM09 | Overreliance | 15 | |
| LLM10 | Model Theft | 10 | |
MITRE ATLAS Coverage
Technique mapping snapshot date: 2025-10-01 (see atlas-mapping.md for methodology).
- 72/86 ATLAS techniques represented in the library
- 14/14 agent-specific techniques covered (from the October 2025 ATLAS release)
- 639/639 vectors carry an ATLAS mapping
Coverage by Tactic
| Tactic | Name | Techniques | Vectors |
|---|---|---|---|
| AML.TA0000 | AI Model Access | 3/4 | 915 |
| AML.TA0001 | AI Attack Staging | 7/7 | 626 |
| AML.TA0002 | Reconnaissance | 4/6 | 25 |
| AML.TA0003 | Resource Development | 6/12 | 421 |
| AML.TA0004 | Initial Access | 9/11 | 187 |
| AML.TA0005 | Execution | 5/6 | 778 |
| AML.TA0006 | Persistence | 7/7 | 269 |
| AML.TA0007 | Defense Evasion | 6/6 | 366 |
| AML.TA0008 | Discovery | 7/7 | 173 |
| AML.TA0009 | Collection | 2/4 | 74 |
| AML.TA0010 | Exfiltration | 7/7 | 356 |
| AML.TA0011 | Impact | 14/14 | 663 |
| AML.TA0012 | Privilege Escalation | 3/3 | 563 |
| AML.TA0013 | Credential Access | 1/1 | 24 |
| AML.TA0014 | Command and Control | 1/1 | 163 |
| AML.TA0015 | Lateral Movement | 2/2 | 36 |
Benchmark Comparison
| Benchmark | Venue | Dimension | Target | ZIRAN | Progress | Status | Gap |
|---|---|---|---|---|---|---|---|
| AgentHarm | ICLR 2025 | Harm categories | 11 | 11 | ███████████████ 100.0% | | GAP-06 |
| AgentHarm | ICLR 2025 | Multi-step vectors | 440 | 161 | █████░░░░░░░░░░ 36.6% | | GAP-23 |
| InjecAgent | ACL 2024 | Indirect injection vectors | 1,054 | 63 | █░░░░░░░░░░░░░░ 6.0% | | GAP-02 |
| AgentDojo | NeurIPS 2024 | Indirect injection vectors | 629 | 63 | ██░░░░░░░░░░░░░ 10.0% | | GAP-02 |
| | | Utility measurement (baseline + post-attack) | 1 | 1 | ███████████████ 100.0% | | |
| HarmBench | ICML 2024 | Attack tactics | 18 | 10 | ████████░░░░░░░ 55.6% | | GAP-08 |
| | | Jailbreak vectors | 510 | 206 | ██████░░░░░░░░░ 40.4% | | |
| JailbreakBench | NeurIPS 2024 | JBB categories (10) | 10 | 10 | ███████████████ 100.0% | | GAP-15 |
| | | Prompt injection vectors | 100 | 206 | ███████████████ 100% | | |
| StrongREJECT | 2024 | StrongREJECT composite formula | 1 | 1 | ███████████████ 100.0% | | GAP-04 |
| | | Scoring dimensions (refusal, specificity, convincingness) | 3 | 3 | ███████████████ 100.0% | | |
| MCPTox | 2025 | MCP vectors | 1,312 | 101 | █░░░░░░░░░░░░░░ 7.7% | | GAP-03 |
| Agent Security Bench (ASB) | 2024 | Attack categories | 10 | 11 | ███████████████ 100% | | GAP-01 |
| | | Total vectors | 400 | 639 | ███████████████ 100% | | |
| | | Utility-under-attack measurement | 1 | 1 | ███████████████ 100.0% | | |
| TensorTrust | 2024 | Prompt injection vectors | 126,000 | 206 | ░░░░░░░░░░░░░░░ 0.2% | | GAP-16 |
| | | Representative pattern families | — | 11 | Distinct TensorTrust pattern families covered | | |
| WildJailbreak | 2024 | Jailbreak tactics | 105,000 | 11 | ░░░░░░░░░░░░░░░ 0.0% | | GAP-17 |
| | | WildJailbreak-inspired multi-turn vectors | — | 10 | Distinct tactic families from WildJailbreak | | |
| LLMail-Inject / RAG Poisoning | 2024 | RAG retrieval-targeted vectors | — | 13 | Retrieval-ranked payloads across multiple document framings | | GAP-13 |
| Agent-SafetyBench | 2024 | Business impact types | 8 | 7 | █████████████░░ 87.5% | | GAP-07 |
| BIPIA | 2024 | Indirect injection vectors | — | 63 | Multi-domain benchmark, no fixed target count | | GAP-02 |
| CyberSecEval | Meta, 2024 | Code-generation safety vectors | — | 10 | Code-gen safety + cyber knowledge elicitation families | | GAP-18 |
| | | Total library overlap | — | 639 | Multi-category benchmark, partial overlap | | |
| ToolEmu | 2024 | Tool manipulation vectors | 144 | 176 | ███████████████ 100% | | GAP-19 |
| | | Dedicated sandbox-evasion vectors | — | 10 | Sandbox-evasion vectors distinct from generic tool manipulation | | |
| R-Judge | 2024 | R-Judge risk types (10) | 10 | 10 | ███████████████ 100.0% | | GAP-20 |
| | | Risk scoring detectors | — | 5 | 5 detectors; a different approach from interaction records | | |
| AILuminate | MLCommons, 2025 | Resilience gap metric | 1 | 1 | ███████████████ 100.0% | | GAP-09 |
| | | Baseline performance measurement | 1 | 1 | ███████████████ 100.0% | | |
| | | Under-attack performance measurement | 1 | 1 | ███████████████ 100.0% | | |
| ALERT | 2024 | ALERT micro categories (32) | 32 | 32 | ███████████████ 100.0% | | GAP-21 |
| | | Harm categories | — | 11 | N/A | | |
| MITRE ATLAS | MITRE, 2025 | ATLAS tactics covered | 16 | 16 | ███████████████ 100.0% | | GAP-22 |
| | | ATLAS techniques mapped | 86 | 72 | █████████████░░ 83.7% | | |
| | | Agent-specific techniques covered | 14 | 14 | ███████████████ 100.0% | | |
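The Progress cells in the table above can be reproduced from the Target and ZIRAN columns. The following is a minimal sketch of one rule that matches every row shown, inferred from the table rather than taken from the generator: a 15-cell bar filled by rounding to the nearest cell, a one-decimal percentage, and a plain `100%` label when ZIRAN exceeds the target. The function name `progress_cell` and the constant `BAR_WIDTH` are hypothetical.

```python
import math

BAR_WIDTH = 15  # assumption: the bars in the table are 15 cells wide


def progress_cell(ziran: int, target: int) -> str:
    """Render a coverage cell such as '█████░░░░░░░░░░ 36.6%'."""
    ratio = ziran / target
    capped = min(ratio, 1.0)  # coverage is capped at the target
    filled = math.floor(capped * BAR_WIDTH + 0.5)  # round half up to whole cells
    bar = "█" * filled + "░" * (BAR_WIDTH - filled)
    # Inferred convention: over-target rows read "100%", exact matches "100.0%".
    label = "100%" if ratio > 1.0 else f"{capped * 100:.1f}%"
    return f"{bar} {label}"


print(progress_cell(161, 440))   # AgentHarm multi-step vectors
print(progress_cell(63, 1054))   # InjecAgent indirect injection vectors
print(progress_cell(206, 100))   # JailbreakBench prompt injection vectors
```

Rows without a fixed target (shown as `—`) carry a free-text note instead of a bar, so they fall outside this rule.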
Gap Status Dashboard
See Gap Analysis for full details.
| ID | Gap | Priority | Issue | Status |
|---|---|---|---|---|
| GAP-01 | Benchmark harness | critical | #32 | |
| GAP-02 | Indirect prompt injection scale | critical | #33 | |
| GAP-03 | MCP tool poisoning | critical | #34 | |
| GAP-04 | Quality-aware jailbreak scoring | critical | #35 | |
| GAP-05 | Utility-under-attack measurement | important | #36 | |
| GAP-06 | Harmful multi-step task testing | important | #37 | |
| GAP-07 | Business impact categorization | important | #38 | |
| GAP-08 | Jailbreak tactic breadth | important | #39 | |
| GAP-09 | Resilience gap metric | important | #40 | |
| GAP-10 | OWASP LLM04 (Model DoS) | lower | #41 | |
| GAP-11 | OWASP LLM05 (Supply Chain) | lower | #42 | |
| GAP-12 | OWASP LLM10 (Model Theft) | lower | #43 | |
| GAP-13 | RAG-specific poisoning | lower | #44 | |
| GAP-14 | Defense evasion measurement | lower | #45 | |
| GAP-15 | JailbreakBench coverage | lower | #54 | |
| GAP-16 | TensorTrust coverage | lower | #55 | |
| GAP-17 | WildJailbreak coverage | lower | #56 | |
| GAP-18 | CyberSecEval coverage | lower | #57 | |
| GAP-19 | ToolEmu coverage | lower | #58 | |
| GAP-20 | R-Judge coverage | lower | #59 | |
| GAP-21 | ALERT coverage | lower | #60 | |
| GAP-22 | MITRE ATLAS technique mapping | important | #61 | |
| GAP-23 | AgentHarm multi-step vector scale | important | #131 | |
Vector Inventory
By Attack Category
| Category | Vectors |
|---|---|
| prompt_injection | 206 |
| tool_manipulation | 176 |
| indirect_injection | 63 |
| data_exfiltration | 56 |
| privilege_escalation | 35 |
| system_prompt_extraction | 27 |
| memory_poisoning | 20 |
| authorization_bypass | 17 |
| chain_of_thought_manipulation | 15 |
| model_dos | 13 |
| multi_agent | 11 |
By Tactic
| Tactic | Vectors |
|---|---|
| single | 410 |
| context_buildup | 63 |
| crescendo | 38 |
| persona_shift | 23 |
| hypothetical | 17 |
| role_play | 16 |
| distraction | 15 |
| refusal_suppression | 15 |
| code_mode | 14 |
| few_shot | 14 |
| language_switch | 14 |
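The tactic breakdown can be cross-checked against the Executive Summary figures. The sketch below assumes that `single` denotes single-turn vectors and that every other tactic is multi-turn; the source does not state this, but the arithmetic is consistent with it. The dictionary name `tactic_counts` is hypothetical, and the values are copied from the table above.

```python
# Vector counts per tactic, copied from the "By Tactic" table.
tactic_counts = {
    "single": 410,
    "context_buildup": 63,
    "crescendo": 38,
    "persona_shift": 23,
    "hypothetical": 17,
    "role_play": 16,
    "distraction": 15,
    "refusal_suppression": 15,
    "code_mode": 14,
    "few_shot": 14,
    "language_switch": 14,
}

total_vectors = sum(tactic_counts.values())
# Assumption: everything except "single" counts as a multi-turn vector.
multi_turn_vectors = total_vectors - tactic_counts["single"]
multi_turn_tactics = len(tactic_counts) - 1

print(total_vectors, multi_turn_vectors, multi_turn_tactics)
```

Under that assumption the totals agree with the summary: 639 vectors overall, 229 multi-turn vectors, and 10 multi-turn tactics.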
By Severity
| Severity | Vectors |
|---|---|
| critical | 367 |
| high | 201 |
| medium | 70 |
| low | 1 |
By Harm Category
| Harm Category | Vectors |
|---|---|
| child_exploitation | 13 |
| cybercrime | 13 |
| disinformation | 13 |
| fraud | 14 |
| harassment | 21 |
| illegal_services | 13 |
| self_harm | 14 |
| sexual_content | 14 |
| substance_abuse | 17 |
| terrorism | 14 |
| weapons | 15 |
Detection Accuracy
Per-detector and pipeline precision/recall/F1 over the labelled detection dataset — see detection-accuracy.md for methodology and validity caveats. Run with benchmarks/detection_accuracy.py.
| Detector | F1 |
|---|---|
| refusal | 0.90 |
| indicator | 1.00 |
| side_effect | 1.00 |
| llm_judge | 1.00 |
| pipeline | 1.00 |
Baseline over 220 labelled examples. The refusal detector's F1 is limited by precision (false alarms on atypically phrased refusals), not by missed compromises.
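To illustrate how an F1 score can be limited by precision alone, the sketch below computes precision, recall, and F1 from confusion counts. The counts are hypothetical and chosen only so that the result lands on 0.90; they are not taken from the labelled dataset, and `precision_recall_f1` is an illustrative helper, not a function from benchmarks/detection_accuracy.py.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: no missed positives (fn=0), but 20 false alarms.
precision, recall, f1 = precision_recall_f1(tp=90, fp=20, fn=0)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.2f}")
```

With perfect recall, F1 reduces to 2P / (P + 1), so any shortfall from 1.00 comes entirely from false alarms.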
Pentest Agent vs Rule-Based Scanner
Head-to-head on ground-truth targets — what each tool catches vs what it costs. See pentest-evaluation.md. Run with benchmarks/pentest_vs_scanner.py.
The US1 harness has shipped; agent cassettes are seed-only pending the live --mode record path. On the simulated targets, the rule-based scanner reaches full ground-truth recall at ≈0 token cost; novel discovery is only measurable on the real example agent.
Generated by benchmarks/generate_all.py on 2026-04-22.