ExploitArena
A decentralized bug bounty protocol where AI attacker agents find vulnerabilities, independent verifier agents confirm them in sandboxed environments, CVSS v4.0 scores determine severity, and smart contracts distribute bounties — trustlessly, on-chain, with zero human triage.
The Problem
Bug bounties are broken. $100M+ sits in bounty pools across Immunefi and HackerOne, yet the system fails in three core ways:
Manual triage is the bottleneck. Every submission needs a human security expert — expensive, slow, and unscalable as smart contract deployments multiply.
Smart contracts can't wait. They're immutable and hold real assets. The Ronin Bridge hack cost about $625M; the DAO hack, about $60M. The DAO loss came from a reentrancy bug, the kind of flaw a thorough audit is meant to catch, but audits take weeks and cost tens of thousands.
No standard severity scoring on-chain. Traditional security has CVSS (used by NIST's NVD, CERT/CC, and security teams across the industry). Web3 bounty platforms have vibes.
How It Works
Developer submits contract + bounty + deadline
                      │
                      ▼
      ┌───────────────────────────────┐
      │ ATTACKER AGENT POOL           │
      │ Each agent gets an isolated   │
      │ cloud sandbox with shell,     │
      │ compiles, writes PoCs, and    │
      │ TESTS exploits before submit  │
      └───────────────┬───────────────┘
                      │ tested exploit submitted
                      ▼
      ┌───────────────────────────────┐
      │ VERIFIER AGENT POOL           │
      │ Each verifier independently:  │
      │ 1. Gets own isolated sandbox  │
      │ 2. Reproduces the exploit     │
      │ 3. Measures actual impact     │
      │ 4. Computes CVSS v4.0 score   │
      │ 5. Casts on-chain vote        │
      └───────────────┬───────────────┘
                      │ consensus reached
                      ▼
BountyEscrow auto-resolves → payout scaled to CVSS severity
Step 1 — Escrow & Deploy: A developer submits their smart contract (by GitHub URL or source code), deposits a bounty amount in ETH, and sets a deadline. The bounty is locked in an on-chain escrow contract.
Step 2 — AI Attackers Compete in Sandboxes: A pool of AI agents — each running inside an isolated E2B cloud sandbox with full shell, Node.js, Python, and git — independently analyze the codebase. Each agent explores the repo, identifies vulnerabilities, writes exploit code, and must test it inside the sandbox and see it succeed before submitting. No theoretical submissions are accepted.
Step 3 — Verify & Auto-Payout: Independent verifier agents each reproduce the exploit in their own sandbox and compute a CVSS v4.0 severity score (same standard used by NIST's NVD). On-chain BountyEscrow auto-resolves: 3-of-5 verifier supermajority triggers a payout scaled to CVSS severity. Deadline with no confirmed exploit? Full refund to the developer. No admin, no human triage.
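The resolution rule in Step 3 can be sketched off-chain in TypeScript. This is a minimal model under stated assumptions, not the BountyEscrow source: the names `resolve`, `payoutBps`, and `QUORUM` are hypothetical, scores are passed as integer tenths (93 for 9.3), and the payout shares for High, Medium, and Low are placeholders. Only the Critical share of 100% is grounded in this README, where the demo output shows a 9.3 score releasing the full 10 ETH. Returning the unpaid remainder to the developer is also an assumption.

```typescript
// Sketch only: function names, the quorum constant, and the non-Critical
// payout shares are assumptions, not the actual BountyEscrow logic.

type Outcome =
  | { status: "Open" }
  | { status: "Resolved"; payoutWei: bigint; refundWei: bigint }
  | { status: "Refunded"; refundWei: bigint };

const QUORUM = 3; // confirmations needed, from the "3-of-5" rule above

// Payout share in basis points (1/100 of a percent) per CVSS severity band.
// Scores are integer tenths (93 means 9.3), so no floating point is involved.
function payoutBps(scoreTenths: number): bigint {
  if (scoreTenths >= 90) return 10000n; // Critical, 9.0-10.0
  if (scoreTenths >= 70) return 7500n;  // High, 7.0-8.9 (hypothetical share)
  if (scoreTenths >= 40) return 5000n;  // Medium, 4.0-6.9 (hypothetical share)
  if (scoreTenths >= 1) return 2500n;   // Low, 0.1-3.9 (hypothetical share)
  return 0n;                            // None, 0.0
}

function resolve(
  bountyWei: bigint,
  confirmations: number,
  avgScoreTenths: number,
  now: number,
  deadline: number,
): Outcome {
  if (confirmations >= QUORUM) {
    const payoutWei = (bountyWei * payoutBps(avgScoreTenths)) / 10000n;
    // Assumption: whatever is not paid out goes back to the developer.
    return { status: "Resolved", payoutWei, refundWei: bountyWei - payoutWei };
  }
  // Deadline passed without a quorum of confirmations: full refund.
  if (now >= deadline) return { status: "Refunded", refundWei: bountyWei };
  return { status: "Open" };
}
```

Using `bigint` for wei amounts mirrors how viem represents on-chain integers, and the multiply-then-divide order keeps the integer division from truncating the share before it is applied.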
What Makes This Different
Unlike existing bug bounty platforms that rely on manual review and subjective severity assessment, ExploitArena implements a fully automated, trustless pipeline:
- Tested exploits only — AI agents must prove their exploits work in isolated sandboxes before submission
- CVSS v4.0 on-chain — Industry-standard severity scoring mapped to deterministic on-chain arithmetic (no floats, fixed-point only)
- Multi-agent isolation — 5+ independent verifiers each clone the target repo fresh, inject the exploit, and run independently in parallel
- Trustless authorization — BountyEscrow only accepts votes from authorized verifier agents while keeping authorization permissionless for new verifiers
Challenges We Ran Into
Getting AI agents to produce verified working exploits — The hardest constraint was requiring attacker agents to actually run and confirm their exploit inside the sandbox (not just generate plausible-looking code). This meant engineering a tight tool loop: shell execution, contract compilation, and a local Hardhat node inside E2B, with a SKILLS.md prompt that enforced the distinction between "I think this works" and "I ran it and saw the balance drained."
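One way to enforce the "I ran it" requirement in code, rather than only in the prompt, is to gate submission on recorded evidence from the sandbox run. The sketch below is illustrative: the `ExploitRun` record, its fields, and `checkEvidence` are hypothetical names, and using a drop in the target's balance as the impact signal is an assumption that only fits fund-draining exploits.

```typescript
// Hypothetical evidence record produced by one exploit run inside the sandbox.
interface ExploitRun {
  command: string;             // what was executed, e.g. a Hardhat script invocation
  exitCode: number | null;     // null means the command was never run
  victimBalanceBefore: bigint; // wei held by the target before the run
  victimBalanceAfter: bigint;  // wei held by the target after the run
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Accept a submission only when the exploit was executed, exited cleanly,
// and measurably reduced the target's balance.
function checkEvidence(run: ExploitRun | undefined): Verdict {
  if (!run || run.exitCode === null) {
    return { ok: false, reason: "exploit was never executed" };
  }
  if (run.exitCode !== 0) {
    return { ok: false, reason: `run failed with exit code ${run.exitCode}` };
  }
  if (run.victimBalanceAfter >= run.victimBalanceBefore) {
    return { ok: false, reason: "no measurable impact on the target balance" };
  }
  return { ok: true };
}
```

A gate like this turns the SKILLS.md distinction into a hard precondition: generated code with no execution record is rejected before it reaches the verifiers.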
CVSS v4.0 on-chain — Mapping the multi-dimensional CVSS v4.0 formula to deterministic on-chain arithmetic (no floats, fixed-point only) required careful design and unit testing against the official CVSS calculator's expected outputs.
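The float-free representation can be illustrated with a small TypeScript model. It does not implement the CVSS v4.0 scoring formula itself; it only shows one plausible encoding, assumed here, in which a score is stored as integer tenths (9.3 becomes 93), averaged with integer operations, and mapped to the standard CVSS qualitative ratings. All function names are hypothetical.

```typescript
// Assumption: scores are stored as integer tenths (9.3 -> 93), range 0..100.

type Rating = "None" | "Low" | "Medium" | "High" | "Critical";

// Parse a one-decimal score string without going through floating point.
function toTenths(score: string): number {
  const m = /^(\d{1,2})(?:\.(\d))?$/.exec(score);
  if (!m) throw new Error(`not a one-decimal CVSS score: ${score}`);
  const tenths = Number(m[1]) * 10 + Number(m[2] ?? "0");
  if (tenths > 100) throw new Error(`score out of range: ${score}`);
  return tenths;
}

// CVSS qualitative severity rating scale, expressed in tenths.
function rating(tenths: number): Rating {
  if (tenths >= 90) return "Critical"; // 9.0-10.0
  if (tenths >= 70) return "High";     // 7.0-8.9
  if (tenths >= 40) return "Medium";   // 4.0-6.9
  if (tenths >= 1) return "Low";       // 0.1-3.9
  return "None";                       // 0.0
}

// Mean of several scores, rounded half up, using only integer operations:
// floor((2*sum + n) / (2*n)) equals floor(sum/n + 1/2).
function averageTenths(scores: number[]): number {
  if (scores.length === 0) throw new Error("no scores to average");
  const sum = scores.reduce((a, b) => a + b, 0);
  return Math.floor((2 * sum + scores.length) / (2 * scores.length));
}

// Render tenths back to the familiar one-decimal form.
function formatTenths(tenths: number): string {
  return `${Math.floor(tenths / 10)}.${tenths % 10}`;
}
```

For example, `formatTenths(averageTenths([toTenths("9.3"), toTenths("9.3")]))` yields `"9.3"`, and every intermediate value is an integer that a Solidity `uint` could hold.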
Multi-agent isolation — Running 5+ independent verifiers in truly isolated E2B sandboxes — each cloning the target repo fresh, injecting the exploit, and running independently in parallel — required building a sandbox management layer with proper lifecycle and cleanup guarantees.
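The lifecycle and cleanup guarantee can be sketched with a `try`/`finally` wrapper plus `Promise.allSettled`. The `Sandbox` interface below is a hypothetical stand-in and deliberately does not assume the shape of the E2B SDK; `withSandbox` and `verifyInParallel` are illustrative names as well.

```typescript
// Hypothetical minimal sandbox interface; the real E2B SDK surface is not assumed.
interface Sandbox {
  id: string;
  run(cmd: string): Promise<{ exitCode: number }>;
  destroy(): Promise<void>;
}

type SandboxFactory = () => Promise<Sandbox>;

// Run `task` in a fresh sandbox and always tear the sandbox down,
// even when the task throws.
async function withSandbox<T>(
  create: SandboxFactory,
  task: (sb: Sandbox) => Promise<T>,
): Promise<T> {
  const sb = await create();
  try {
    return await task(sb);
  } finally {
    await sb.destroy();
  }
}

// Run the same verification in `count` independent sandboxes in parallel.
// allSettled waits for every job, so one failing verifier does not
// cancel the others or hide their results.
async function verifyInParallel<T>(
  create: SandboxFactory,
  count: number,
  task: (sb: Sandbox) => Promise<T>,
): Promise<PromiseSettledResult<T>[]> {
  const jobs = Array.from({ length: count }, () => withSandbox(create, task));
  return Promise.allSettled(jobs);
}
```

`Promise.allSettled` is chosen over `Promise.all` here because a crashed verifier should count as a missing vote, not abort the whole round.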
Trustless verifier authorization — The BountyEscrow contract only accepts votes from authorized verifier agents (to prevent Sybil attacks), while keeping authorization permissionless enough for new verifiers. The vote-tallying logic is resistant to double-voting and replay.
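The checks described above can be modelled in memory as follows. This is a sketch of the idea, not the contract: `VoteTally` and its methods are hypothetical names, addresses are treated as plain lowercase strings, and keying each vote by bounty ID plus verifier is the assumed mechanism that stops a vote from being counted twice or reused for another bounty.

```typescript
// In-memory model of the tallying checks. Names are hypothetical and the
// real BountyEscrow contract may structure its storage differently.
class VoteTally {
  private readonly authorized = new Set<string>();
  private readonly voted = new Map<number, Set<string>>(); // bountyId -> verifiers who voted
  private readonly confirmations = new Map<number, number>(); // bountyId -> confirm count

  authorize(verifier: string): void {
    this.authorized.add(verifier.toLowerCase());
  }

  castVote(bountyId: number, verifier: string, confirmed: boolean): void {
    const v = verifier.toLowerCase();
    // Sybil guard: only pre-authorized verifiers may vote.
    if (!this.authorized.has(v)) throw new Error("verifier not authorized");
    // Double-vote guard: one vote per verifier per bounty.
    const seen = this.voted.get(bountyId) ?? new Set<string>();
    if (seen.has(v)) throw new Error("verifier already voted on this bounty");
    seen.add(v);
    this.voted.set(bountyId, seen);
    if (confirmed) {
      this.confirmations.set(bountyId, this.confirmationCount(bountyId) + 1);
    }
  }

  confirmationCount(bountyId: number): number {
    return this.confirmations.get(bountyId) ?? 0;
  }
}
```

Because votes are recorded per bounty, a verifier can still vote once on each separate bounty, while a repeated vote on the same bounty is rejected before it touches the count.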
Project Structure
exploit-arena/
├── apps/
│   └── web/                 # Next.js frontend + API routes
├── packages/
│   ├── agents/              # AI agent system
│   │   ├── attacker/        # LLM agent with sandbox tools
│   │   ├── verifier/        # Independent verification agent
│   │   ├── sandbox/         # E2B sandbox management
│   │   ├── orchestrator/    # Pipeline: attack → verify → on-chain
│   │   ├── chain.ts         # Viem helpers for on-chain ops
│   │   └── SKILLS.md        # Agent workflow spec
│   ├── cli/                 # arena demo / scan CLI
│   ├── contracts/           # Solidity: BountyEscrow + demos
│   ├── mcp/                 # MCP server for external AI agents
│   └── shared/              # Types, ABI, CVSS v4.0 scoring
├── docker-compose.yml       # Local dev services
└── turbo.json               # Turbo build pipeline
Getting Started
Prerequisites
- Node.js ≥ 18
- pnpm ≥ 9
- An E2B API key (for cloud sandboxes)
- An OpenAI API key (or any OpenAI-compatible endpoint)
Quick Start
git clone https://github.com/WhyAsh5114/exploit-arena && cd exploit-arena
pnpm install
cp .env.example .env
# Required: E2B_API_KEY, OPENAI_API_KEY
# Optional: OPENAI_BASE_URL, OPENAI_MODEL
pnpm build
# Terminal 1: Start local Hardhat node
cd packages/contracts && pnpm node
# Terminal 2: Run the demo
pnpm --filter @exploit-arena/cli exec arena demo
Run the Web Dashboard
pnpm --filter @exploit-arena/web dev
# Opens at http://localhost:3000
Demo Output
$ arena demo
⚡ ExploitArena — Full On-Chain Demo
✔ BountyEscrow deployed at 0x5FbDB...
✔ 3 verifiers authorized on-chain
✔ Bounty #0 created — 10 ETH escrowed
Running AI pipeline: attack → verify → auto-resolve
──────────────────────────────────────────────────
✔ Exploit found: Reentrancy in withdraw() (Critical)
─── Verifications ───
CONFIRMED — CVSS 9.3 (Critical)
CONFIRMED — CVSS 9.3 (Critical)
─── On-chain Result ───
Status: Resolved
Exploit count: 1
Avg CVSS: 9.3
✓ BOUNTY RESOLVED — exploit confirmed on-chain
Attacker's withdrawable balance: 10.0 ETH
License
MIT — see LICENSE
Built at KJSSE GajShield Hack X · April 2026 · Mumbai 🏆




