Benchmark Coverage Comparison

Auto-generated comparison of ZIRAN's attack vector library against published AI agent security benchmarks.

Last updated: 2026-04-22

Executive Summary

639 attack vectors across 11 attack categories
100.0% OWASP LLM Top 10 coverage (10/10 categories)
10 multi-turn jailbreak tactics, 12 encoding types
229 multi-turn vectors
11 harm categories (AgentHarm-aligned)
Gap closure: 39.1% (9/23 gaps closed)

OWASP LLM Top 10 Coverage

Code	Category	Vectors	Status
LLM01	Prompt Injection	468	Comprehensive
LLM02	Insecure Output Handling	201	Comprehensive
LLM03	Training Data Poisoning	19	Strong
LLM04	Model Denial of Service	14	Strong
LLM05	Supply Chain Vulnerabilities	18	Strong
LLM06	Sensitive Information Disclosure	110	Comprehensive
LLM07	Insecure Plugin Design	145	Comprehensive
LLM08	Excessive Agency	151	Comprehensive
LLM09	Overreliance	15	Strong
LLM10	Model Theft	10	Strong

MITRE ATLAS Coverage

Technique mapping snapshot date: 2025-10-01 (see atlas-mapping.md for methodology).

72/86 ATLAS techniques represented in the library
14/14 agent-specific techniques covered (from the October 2025 ATLAS release)
639/639 vectors carry an ATLAS mapping

Coverage by Tactic

Tactic	Name	Techniques	Vectors
AML.TA0000	AI Model Access	3/4	915
AML.TA0001	AI Attack Staging	7/7	626
AML.TA0002	Reconnaissance	4/6	25
AML.TA0003	Resource Development	6/12	421
AML.TA0004	Initial Access	9/11	187
AML.TA0005	Execution	5/6	778
AML.TA0006	Persistence	7/7	269
AML.TA0007	Defense Evasion	6/6	366
AML.TA0008	Discovery	7/7	173
AML.TA0009	Collection	2/4	74
AML.TA0010	Exfiltration	7/7	356
AML.TA0011	Impact	14/14	663
AML.TA0012	Privilege Escalation	3/3	563
AML.TA0013	Credential Access	1/1	24
AML.TA0014	Command and Control	1/1	163
AML.TA0015	Lateral Movement	2/2	36
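
The per-tactic Techniques column does not sum to the 72/86 headline. The sketch below copies the (covered, total) pairs from the table above and totals them. The explanation offered in the comments, that some techniques are listed under more than one tactic, is an assumption; the source does not state it.

```python
# (covered, total) technique counts per tactic, copied from the table above.
TACTIC_TECHNIQUES = {
    "AML.TA0000": (3, 4),
    "AML.TA0001": (7, 7),
    "AML.TA0002": (4, 6),
    "AML.TA0003": (6, 12),
    "AML.TA0004": (9, 11),
    "AML.TA0005": (5, 6),
    "AML.TA0006": (7, 7),
    "AML.TA0007": (6, 6),
    "AML.TA0008": (7, 7),
    "AML.TA0009": (2, 4),
    "AML.TA0010": (7, 7),
    "AML.TA0011": (14, 14),
    "AML.TA0012": (3, 3),
    "AML.TA0013": (1, 1),
    "AML.TA0014": (1, 1),
    "AML.TA0015": (2, 2),
}

HEADLINE_COVERED, HEADLINE_TOTAL = 72, 86  # from the MITRE ATLAS Coverage summary


def summed_coverage(rows):
    """Sum covered and total technique counts across all tactics."""
    covered = sum(c for c, _ in rows.values())
    total = sum(t for _, t in rows.values())
    return covered, total


covered, total = summed_coverage(TACTIC_TECHNIQUES)
print(f"per-tactic sum: {covered}/{total}")
print(f"headline:       {HEADLINE_COVERED}/{HEADLINE_TOTAL} "
      f"= {HEADLINE_COVERED / HEADLINE_TOTAL:.1%}")
# Numerator and denominator exceed the headline by the same amount, which would be
# consistent with techniques that appear under several tactics (assumption).
print(f"excess: {covered - HEADLINE_COVERED} covered, {total - HEADLINE_TOTAL} total")
```

For a library-wide figure, use the headline 72/86 rather than a sum over tactic rows.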

Benchmark Comparison

Benchmark	Venue	Dimension	Target	ZIRAN	Progress	Status	Gap
AgentHarm	ICLR 2025	Harm categories	11	11	`███████████████` 100.0%	closed	GAP-06
AgentHarm	ICLR 2025	Multi-step vectors	440	161	`█████░░░░░░░░░░` 36.6%	open	GAP-23
InjecAgent	ACL 2024	Indirect injection vectors	1,054	63	`█░░░░░░░░░░░░░░` 6.0%	open	GAP-02
AgentDojo	NeurIPS 2024	Indirect injection vectors	629	63	`██░░░░░░░░░░░░░` 10.0%	open	GAP-02
		Utility measurement (baseline + post-attack)	1	1	`███████████████` 100.0%
HarmBench	ICML 2024	Attack tactics	18	10	`████████░░░░░░░` 55.6%	closed	GAP-08
		Jailbreak vectors	510	206	`██████░░░░░░░░░` 40.4%
JailbreakBench	NeurIPS 2024	JBB categories (10)	10	10	`███████████████` 100.0%	closed	GAP-15
		Prompt injection vectors	100	206	`███████████████` 100%
StrongREJECT	2024	StrongREJECT composite formula	1	1	`███████████████` 100.0%	closed	GAP-04
		Scoring dimensions (refusal, specificity, convincingness)	3	3	`███████████████` 100.0%
MCPTox	2025	MCP vectors	1,312	101	`█░░░░░░░░░░░░░░` 7.7%	open	GAP-03
Agent Security Bench (ASB)	2024	Attack categories	10	11	`███████████████` 100%	open	GAP-01
		Total vectors	400	639	`███████████████` 100%
		Utility-under-attack measurement	1	1	`███████████████` 100.0%
TensorTrust	2024	Prompt injection vectors	126,000	206	`░░░░░░░░░░░░░░░` 0.2%	open	GAP-16
		Representative pattern families	—	11	Distinct TensorTrust pattern families covered
WildJailbreak	2024	Jailbreak tactics	105,000	11	`░░░░░░░░░░░░░░░` 0.0%	open	GAP-17
		WildJailbreak-inspired multi-turn vectors	—	10	Distinct tactic families from WildJailbreak
LLMail-Inject / RAG Poisoning	2024	RAG retrieval-targeted vectors	—	13	Retrieval-ranked payloads across multiple document framings	open	GAP-13
Agent-SafetyBench	2024	Business impact types	8	7	`█████████████░░` 87.5%	open	GAP-07
BIPIA	2024	Indirect injection vectors	—	63	Multi-domain benchmark — no fixed target count	open	GAP-02
CyberSecEval	Meta, 2024	Code-generation safety vectors	—	10	Code-gen safety + cyber knowledge elicitation families	open	GAP-18
		Total library overlap	—	639	Multi-category benchmark — partial overlap
ToolEmu	2024	Tool manipulation vectors	144	176	`███████████████` 100%	open	GAP-19
		Dedicated sandbox-evasion vectors	—	10	Sandbox-evasion vectors distinct from generic tool manipulation
R-Judge	2024	R-Judge risk types (10)	10	10	`███████████████` 100.0%	closed	GAP-20
		Risk scoring detectors	—	5	5 detectors — different approach than interaction records
AILuminate	MLCommons, 2025	Resilience gap metric	1	1	`███████████████` 100.0%	closed	GAP-09
		Baseline performance measurement	1	1	`███████████████` 100.0%
		Under-attack performance measurement	1	1	`███████████████` 100.0%
ALERT	2024	ALERT micro categories (32)	32	32	`███████████████` 100.0%	closed	GAP-21
		Harm categories	—	11	N/A
MITRE ATLAS	MITRE, 2025	ATLAS tactics covered	16	16	`███████████████` 100.0%	open	GAP-22
		ATLAS techniques mapped	86	72	`█████████████░░` 83.7%
		Agent-specific techniques covered	14	14	`███████████████` 100.0%
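
The Progress cells above appear to follow a simple rule: a 15-cell bar filled in proportion to ZIRAN/Target, with the percentage shown to one decimal place and capped at "100%" when ZIRAN exceeds the target. The sketch below reproduces that rendering. The function name `render_progress` and the rounding rule are inferred from the table rows, not taken from the generator's source.

```python
def render_progress(ziran: int, target: int, width: int = 15) -> str:
    """Render a fixed-width bar plus a percentage label for ziran/target."""
    ratio = ziran / target
    capped = min(ratio, 1.0)          # the bar never overflows its width
    filled = round(capped * width)    # nearest whole cell (inferred from the rows)
    bar = "█" * filled + "░" * (width - filled)
    # Rows where ZIRAN exceeds the target are shown as "100%" rather than "100.0%".
    label = "100%" if ratio > 1.0 else f"{capped * 100:.1f}%"
    return f"`{bar}` {label}"


print(render_progress(161, 440))   # AgentHarm multi-step vectors
print(render_progress(63, 1054))   # InjecAgent indirect injection vectors
print(render_progress(206, 100))   # JailbreakBench prompt injection vectors
```

Under this rule, a row that is exactly on target prints "100.0%", while a row that exceeds its target prints the capped "100%".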

Gap Status Dashboard

See Gap Analysis for full details.

ID	Gap	Priority	Issue	Status
GAP-01	Benchmark harness	critical	#32	open
GAP-02	Indirect prompt injection scale	critical	#33	open
GAP-03	MCP tool poisoning	critical	#34	open
GAP-04	Quality-aware jailbreak scoring	critical	#35	closed
GAP-05	Utility-under-attack measurement	important	#36	closed
GAP-06	Harmful multi-step task testing	important	#37	closed
GAP-07	Business impact categorization	important	#38	open
GAP-08	Jailbreak tactic breadth	important	#39	closed
GAP-09	Resilience gap metric	important	#40	closed
GAP-10	OWASP LLM04 (Model DoS)	lower	#41	closed
GAP-11	OWASP LLM05 (Supply Chain)	lower	#42	open
GAP-12	OWASP LLM10 (Model Theft)	lower	#43	open
GAP-13	RAG-specific poisoning	lower	#44	open
GAP-14	Defense evasion measurement	lower	#45	open
GAP-15	JailbreakBench coverage	lower	#54	closed
GAP-16	TensorTrust coverage	lower	#55	open
GAP-17	WildJailbreak coverage	lower	#56	open
GAP-18	CyberSecEval coverage	lower	#57	open
GAP-19	ToolEmu coverage	lower	#58	open
GAP-20	R-Judge coverage	lower	#59	closed
GAP-21	ALERT coverage	lower	#60	closed
GAP-22	MITRE ATLAS technique mapping	important	#61	open
GAP-23	AgentHarm multi-step vector scale	important	#131	open
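
The gap closure figure in the Executive Summary can be recomputed from the Status column above. The sketch below hard-codes the closed gap IDs from the table; the names `CLOSED_IDS` and `closure_rate` are illustrative and not part of ZIRAN.

```python
from collections import Counter

# IDs marked "closed" in the dashboard above; every other gap is "open".
CLOSED_IDS = {4, 5, 6, 8, 9, 10, 15, 20, 21}
GAP_STATUS = {
    f"GAP-{n:02d}": ("closed" if n in CLOSED_IDS else "open")
    for n in range(1, 24)
}


def closure_rate(statuses):
    """Return (closed, total, closed/total) for a mapping of gap ID to status."""
    counts = Counter(statuses.values())
    closed = counts["closed"]
    total = len(statuses)
    return closed, total, closed / total


closed, total, rate = closure_rate(GAP_STATUS)
print(f"Gap closure: {rate:.1%} ({closed}/{total} gaps closed)")
```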

Vector Inventory

By Attack Category

Category	Vectors
prompt_injection	206
tool_manipulation	176
indirect_injection	63
data_exfiltration	56
privilege_escalation	35
system_prompt_extraction	27
memory_poisoning	20
authorization_bypass	17
chain_of_thought_manipulation	15
model_dos	13
multi_agent	11

By Tactic

Tactic	Vectors
single	410
context_buildup	63
crescendo	38
persona_shift	23
hypothetical	17
role_play	16
distraction	15
refusal_suppression	15
code_mode	14
few_shot	14
language_switch	14

By Severity

Severity	Vectors
critical	367
high	201
medium	70
low	1

By Harm Category

Harm Category	Vectors
child_exploitation	13
cybercrime	13
disinformation	13
fraud	14
harassment	21
illegal_services	13
self_harm	14
sexual_content	14
substance_abuse	17
terrorism	14
weapons	15
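
The four breakdowns above can be cross-checked against the headline numbers. In the sketch below, the counts are copied from the inventory tables; the function name `inventory_totals` is illustrative. The category, tactic, and severity breakdowns should each total 639. The non-`single` tactics should total the 229 multi-turn vectors. The harm breakdown totals 161, the same figure shown in the ZIRAN column of the AgentHarm multi-step row.

```python
TOTAL_VECTORS = 639  # headline figure from the Executive Summary

BY_CATEGORY = {
    "prompt_injection": 206, "tool_manipulation": 176, "indirect_injection": 63,
    "data_exfiltration": 56, "privilege_escalation": 35,
    "system_prompt_extraction": 27, "memory_poisoning": 20,
    "authorization_bypass": 17, "chain_of_thought_manipulation": 15,
    "model_dos": 13, "multi_agent": 11,
}
BY_TACTIC = {
    "single": 410, "context_buildup": 63, "crescendo": 38, "persona_shift": 23,
    "hypothetical": 17, "role_play": 16, "distraction": 15,
    "refusal_suppression": 15, "code_mode": 14, "few_shot": 14,
    "language_switch": 14,
}
BY_SEVERITY = {"critical": 367, "high": 201, "medium": 70, "low": 1}
BY_HARM = {
    "child_exploitation": 13, "cybercrime": 13, "disinformation": 13,
    "fraud": 14, "harassment": 21, "illegal_services": 13, "self_harm": 14,
    "sexual_content": 14, "substance_abuse": 17, "terrorism": 14, "weapons": 15,
}


def inventory_totals() -> dict[str, int]:
    """Sum each breakdown so it can be compared with the headline figures."""
    return {
        "category": sum(BY_CATEGORY.values()),
        "tactic": sum(BY_TACTIC.values()),
        "severity": sum(BY_SEVERITY.values()),
        # Every tactic other than "single" is treated as multi-turn.
        "multi_turn": sum(v for k, v in BY_TACTIC.items() if k != "single"),
        "harm": sum(BY_HARM.values()),
    }


totals = inventory_totals()
assert totals["category"] == totals["tactic"] == totals["severity"] == TOTAL_VECTORS
print(totals)
```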

Detection Accuracy

Per-detector and pipeline precision, recall, and F1 are measured over the labelled detection dataset; the table below reports F1 only. See detection-accuracy.md for methodology and validity caveats. To reproduce, run benchmarks/detection_accuracy.py.

Detector	F1
refusal	0.90
indicator	1.00
side_effect	1.00
llm_judge	1.00
pipeline	1.00

Baseline over 220 labelled examples. The refusal detector's F1 is limited by precision (false alarms on atypically phrased refusals), not by recall (missed compromises).
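
To see how a precision shortfall alone can pull F1 down to 0.90, recall that F1 is the harmonic mean of precision and recall. The sketch below inverts that formula. The recall value of 1.0 is a hypothetical input chosen for illustration; this page reports only F1, so the resulting precision is not a measured figure.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall (needs 2R > F1)."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical: recall = 1.0 is an assumption, not a figure reported on this page.
assumed_recall = 1.0
p = implied_precision(0.90, assumed_recall)
print(f"implied precision: {p:.3f}")
print(f"round-trip F1:     {f1_score(p, assumed_recall):.2f}")
```

Under that assumption, an F1 of 0.90 corresponds to a precision of roughly 0.82, meaning about one flagged item in five or six would be a false alarm.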

Pentest Agent vs Rule-Based Scanner

Head-to-head comparison on ground-truth targets: what each tool catches versus what it costs. See pentest-evaluation.md for details. To reproduce, run benchmarks/pentest_vs_scanner.py.

The US1 harness has shipped; agent cassettes are seed-only pending the live --mode record path. On the simulated targets, the rule-based scanner reaches full ground-truth recall at ≈0 token cost; novel discovery is measurable only on the real example agent.

Generated by benchmarks/generate_all.py on 2026-04-22.