ExploitArena

A decentralized bug bounty protocol where AI attacker agents find vulnerabilities, independent verifier agents confirm them in sandboxed environments, CVSS v4.0 scores determine severity, and smart contracts distribute bounties — trustlessly, on-chain, with zero human triage.

The Problem

Bug bounties are broken. $100M+ sits in bounty pools across Immunefi and HackerOne, yet the system fails in three core ways:

Manual triage is the bottleneck. Every submission needs a human security expert — expensive, slow, and unscalable as smart contract deployments multiply.

Smart contracts can't wait. Once deployed they are effectively immutable and hold real assets, so a missed bug is expensive: the Ronin Bridge hack cost $625M and the DAO hack $60M. Thorough audits catch many such flaws before launch, but they take weeks and cost tens of thousands of dollars.

No standard severity scoring on-chain. Traditional security has CVSS (used by NIST, CERT, and most major security teams). Web3 bounty platforms have vibes.

How It Works

Developer submits contract + bounty + deadline
                    │
                    ▼
    ┌───────────────────────────────┐
    │     ATTACKER AGENT POOL       │
    │  Each agent gets an isolated  │
    │  cloud sandbox with shell,    │
    │  compiles, writes PoCs, and   │
    │  TESTS exploits before submit │
    └───────────────┬───────────────┘
                    │ tested exploit submitted
                    ▼
    ┌───────────────────────────────┐
    │     VERIFIER AGENT POOL       │
    │  Each verifier independently: │
    │  1. Gets own isolated sandbox │
    │  2. Reproduces the exploit    │
    │  3. Measures actual impact    │
    │  4. Computes CVSS v4.0 score  │
    │  5. Casts on-chain vote       │
    └───────────────┬───────────────┘
                    │ consensus reached
                    ▼
    BountyEscrow auto-resolves → payout scaled to CVSS severity

Step 1 — Escrow & Deploy: A developer submits their smart contract (by GitHub URL or source code), deposits a bounty amount in ETH, and sets a deadline. The bounty is locked in an on-chain escrow contract.

Step 2 — AI Attackers Compete in Sandboxes: A pool of AI agents — each running inside an isolated E2B cloud sandbox with full shell, Node.js, Python, and git — independently analyze the codebase. Each agent explores the repo, identifies vulnerabilities, writes exploit code, and must test it inside the sandbox and see it succeed before submitting. No theoretical submissions are accepted.

Step 3 — Verify & Auto-Payout: Independent verifier agents each reproduce the exploit in their own sandbox and compute a CVSS v4.0 severity score (same standard used by NIST's NVD). On-chain BountyEscrow auto-resolves: 3-of-5 verifier supermajority triggers a payout scaled to CVSS severity. Deadline with no confirmed exploit? Full refund to the developer. No admin, no human triage.
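The resolution rule in Step 3 can be sketched as a pure function. This is a minimal model under stated assumptions, not the actual BountyEscrow logic: `resolveBounty`, `Vote`, `Outcome`, and the payout tiers are hypothetical names and values. The only tier grounded in this README is Critical paying 100%, which matches the demo (a 10 ETH bounty with an average CVSS of 9.3 yields a 10.0 ETH payout). The other percentages are placeholders, and the quorum is a parameter so the 3-of-5 rule is not hard-coded.

```typescript
// Hypothetical model of bounty resolution; the real contract may differ.
interface Vote {
  confirmed: boolean;
  cvssTenths: number; // CVSS score times 10, e.g. 9.3 -> 93
}

type Outcome =
  | { status: "Pending" }
  | { status: "Refunded"; developerWei: bigint }
  | { status: "Resolved"; attackerWei: bigint; developerWei: bigint };

// Assumed payout tiers in basis points (10_000 = 100%).
// Only the Critical tier is consistent with the demo output; the rest are illustrative.
const PAYOUT_TIERS = [
  { minTenths: 90, bps: 10_000n }, // Critical
  { minTenths: 70, bps: 7_500n }, // High (assumed)
  { minTenths: 40, bps: 5_000n }, // Medium (assumed)
  { minTenths: 1, bps: 2_500n }, // Low (assumed)
];

function payoutBps(avgTenths: number): bigint {
  for (const tier of PAYOUT_TIERS) {
    if (avgTenths >= tier.minTenths) return tier.bps;
  }
  return 0n; // a score of 0.0 pays nothing
}

function resolveBounty(
  bountyWei: bigint,
  votes: Vote[],
  quorum: number,
  deadlinePassed: boolean,
): Outcome {
  const confirmed = votes.filter((v) => v.confirmed);
  if (confirmed.length >= quorum) {
    const sum = confirmed.reduce((acc, v) => acc + v.cvssTenths, 0);
    // Truncating integer division, as Solidity would do.
    const avgTenths = Math.floor(sum / confirmed.length);
    const attackerWei = (bountyWei * payoutBps(avgTenths)) / 10_000n;
    return { status: "Resolved", attackerWei, developerWei: bountyWei - attackerWei };
  }
  // No quorum: refund once the deadline has passed, otherwise keep waiting.
  return deadlinePassed
    ? { status: "Refunded", developerWei: bountyWei }
    : { status: "Pending" };
}
```

With three confirming votes at 9.3 and a quorum of 3, a 10 ETH bounty resolves with the full amount going to the attacker. Two confirmations past the deadline return the full amount to the developer.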

What Makes This Different

Unlike existing bug bounty platforms that rely on manual review and subjective severity assessment, ExploitArena implements a fully automated, trustless pipeline:

Tested exploits only — AI agents must prove their exploits work in isolated sandboxes before submission
CVSS v4.0 on-chain — Industry-standard severity scoring mapped to deterministic on-chain arithmetic (no floats, fixed-point only)
Multi-agent isolation — 5+ independent verifiers each clone the target repo fresh, inject the exploit, and run independently in parallel
Trustless authorization — BountyEscrow only accepts votes from authorized verifier agents while keeping authorization permissionless for new verifiers

Challenges We Ran Into

Getting AI agents to produce verified working exploits — The hardest constraint was requiring attacker agents to actually run and confirm their exploit inside the sandbox (not just generate plausible-looking code). This meant engineering a tight tool loop: shell execution, contract compilation, and a local Hardhat node inside E2B, with a SKILLS.md prompt that enforced the distinction between "I think this works" and "I ran it and saw the balance drained."
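The difference between "I think this works" and "I ran it and saw the balance drained" can be expressed as a gate on the evidence an attacker agent attaches to a submission. The sketch below is illustrative only: `RunEvidence` and `isProvenExploit` are hypothetical names, and the success criterion (the PoC exited cleanly and the target's balance decreased) is an assumption about what "confirmed" means here.

```typescript
// Hypothetical evidence record captured from a sandbox run (not the project's real type).
interface RunEvidence {
  executed: boolean; // false when the agent only generated code
  exitCode: number | null; // exit code of the PoC run, null if it never ran
  victimBalanceBefore: bigint; // wei held by the target contract before the PoC
  victimBalanceAfter: bigint; // wei held by the target contract after the PoC
}

function isProvenExploit(e: RunEvidence): boolean {
  // Unexecuted or failing PoCs are rejected outright.
  if (!e.executed || e.exitCode !== 0) return false;
  // Assumed success criterion: funds actually left the target.
  return e.victimBalanceAfter < e.victimBalanceBefore;
}
```

A submission that never ran, or ran without changing the target's balance, fails the gate no matter how plausible the code looks.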

CVSS v4.0 on-chain — Mapping the multi-dimensional CVSS v4.0 formula to deterministic on-chain arithmetic (no floats, fixed-point only) required careful design and unit testing against the official CVSS calculator's expected outputs.
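One common way to avoid floats is to store each score as an integer number of tenths (9.3 becomes 93), which ports directly to `uint256` arithmetic. The sketch below shows that representation only, not the full CVSS v4.0 scoring formula. The function names are hypothetical, and whether the project rounds averages half up is an assumption. The severity bands follow the published CVSS qualitative rating scale.

```typescript
type Severity = "None" | "Low" | "Medium" | "High" | "Critical";

// Parse "9.3" into 93 without going through a float.
function parseTenths(text: string): number {
  const m = /^(\d{1,2})(?:\.(\d))?$/.exec(text);
  if (!m) throw new Error(`not a CVSS score: ${text}`);
  const tenths = Number(m[1]) * 10 + Number(m[2] ?? "0");
  if (tenths > 100) throw new Error(`out of range: ${text}`);
  return tenths;
}

// Render 93 back as "9.3" using only integer division and remainder.
function formatTenths(tenths: number): string {
  return `${Math.floor(tenths / 10)}.${tenths % 10}`;
}

// Integer mean, rounded half up: floor((2 * sum + n) / (2 * n)).
function averageTenths(scores: number[]): number {
  if (scores.length === 0) throw new Error("no scores");
  const n = scores.length;
  const sum = scores.reduce((a, b) => a + b, 0);
  return Math.floor((2 * sum + n) / (2 * n));
}

// CVSS qualitative severity rating scale, expressed in tenths.
function severityOf(tenths: number): Severity {
  if (tenths === 0) return "None";
  if (tenths < 40) return "Low"; // 0.1 to 3.9
  if (tenths < 70) return "Medium"; // 4.0 to 6.9
  if (tenths < 90) return "High"; // 7.0 to 8.9
  return "Critical"; // 9.0 to 10.0
}
```

For example, `averageTenths([93, 88])` gives 91, which `formatTenths` renders as "9.1" and `severityOf` classifies as Critical.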

Multi-agent isolation — Running 5+ independent verifiers in truly isolated E2B sandboxes — each cloning the target repo fresh, injecting the exploit, and running independently in parallel — required building a sandbox management layer with proper lifecycle and cleanup guarantees.
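A lifecycle guarantee of this kind is often built from two pieces: a `try`/`finally` wrapper so each sandbox is destroyed whether its task succeeds or throws, and per-job error capture so one failing verifier cannot abort the others. The sketch below illustrates that pattern against a hypothetical `SandboxLike` interface. It deliberately does not assume any part of the real E2B SDK, and `withSandbox` and `runVerifiers` are illustrative names.

```typescript
// Hypothetical minimal sandbox surface; the real E2B SDK API is not assumed here.
interface SandboxLike {
  run(command: string): Promise<{ exitCode: number }>;
  kill(): Promise<void>;
}

type VerifierResult =
  | { verifier: number; ok: true; exitCode: number }
  | { verifier: number; ok: false; error: string };

// Create a sandbox, run the task, and always tear the sandbox down afterwards.
async function withSandbox<T>(
  create: () => Promise<SandboxLike>,
  task: (sandbox: SandboxLike) => Promise<T>,
): Promise<T> {
  const sandbox = await create();
  try {
    return await task(sandbox);
  } finally {
    await sandbox.kill(); // runs on success and on failure
  }
}

// Run `count` verifiers in parallel, each in its own fresh sandbox.
async function runVerifiers(
  count: number,
  create: () => Promise<SandboxLike>,
  command: string,
): Promise<VerifierResult[]> {
  const jobs = Array.from({ length: count }, (_, i) =>
    withSandbox(create, (sb) => sb.run(command)).then(
      (r): VerifierResult => ({ verifier: i, ok: true, exitCode: r.exitCode }),
      (e): VerifierResult => ({ verifier: i, ok: false, error: String(e) }),
    ),
  );
  // Each job converts its own failure into a result, so Promise.all never rejects here.
  return Promise.all(jobs);
}
```

Because every job resolves to a `VerifierResult`, the caller sees one entry per verifier and can count confirmations without a single crashed sandbox hiding the rest.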

Trustless verifier authorization — The BountyEscrow contract only accepts votes from authorized verifier agents (to prevent Sybil attacks), while keeping authorization permissionless enough for new verifiers. The vote-tallying logic is resistant to double-voting and replay.
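The acceptance checks described above can be modelled off-chain as a small ledger. This is a hedged, in-memory stand-in rather than the contract's code: `VoteLedger` and its methods are hypothetical, and a string address stands in for `msg.sender`. It shows the two rules from the text: only authorized verifiers may vote, and each (bounty, verifier) pair is counted at most once, so a resubmitted vote cannot inflate the tally.

```typescript
// Hypothetical in-memory stand-in for the contract's vote bookkeeping.
class VoteLedger {
  private authorized = new Set<string>();
  private seen = new Set<string>(); // "bountyId:verifier" keys already counted
  private confirmations = new Map<number, number>();

  authorize(verifier: string): void {
    this.authorized.add(verifier.toLowerCase());
  }

  castVote(bountyId: number, verifier: string, confirmed: boolean): void {
    const who = verifier.toLowerCase();
    if (!this.authorized.has(who)) throw new Error("verifier not authorized");

    // Keying by bounty and verifier rejects double votes and resubmitted votes.
    const key = `${bountyId}:${who}`;
    if (this.seen.has(key)) throw new Error("vote already recorded");
    this.seen.add(key);

    if (confirmed) {
      this.confirmations.set(bountyId, (this.confirmations.get(bountyId) ?? 0) + 1);
    }
  }

  confirmationsFor(bountyId: number): number {
    return this.confirmations.get(bountyId) ?? 0;
  }
}
```

Scoping the key to the bounty means the same verifier can still vote once on every other bounty, while a second vote on the same one is rejected.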

Project Structure

exploit-arena/
├── apps/
│   └── web/                     # Next.js frontend + API routes
├── packages/
│   ├── agents/                  # AI agent system
│   │   ├── attacker/            # LLM agent with sandbox tools
│   │   ├── verifier/            # Independent verification agent
│   │   ├── sandbox/             # E2B sandbox management
│   │   ├── orchestrator/        # Pipeline: attack → verify → on-chain
│   │   ├── chain.ts             # Viem helpers for on-chain ops
│   │   └── SKILLS.md            # Agent workflow spec
│   ├── cli/                     # arena demo / scan CLI
│   ├── contracts/               # Solidity: BountyEscrow + demos
│   ├── mcp/                     # MCP server for external AI agents
│   └── shared/                  # Types, ABI, CVSS v4.0 scoring
├── docker-compose.yml           # Local dev services
└── turbo.json                   # Turbo build pipeline

Getting Started

Prerequisites

Node.js ≥ 18
pnpm ≥ 9
An E2B API key (for cloud sandboxes)
An OpenAI API key (or any OpenAI-compatible endpoint)

Quick Start

git clone https://github.com/WhyAsh5114/exploit-arena && cd exploit-arena
pnpm install

cp .env.example .env
# Required: E2B_API_KEY, OPENAI_API_KEY
# Optional: OPENAI_BASE_URL, OPENAI_MODEL

pnpm build

# Terminal 1: Start local Hardhat node
cd packages/contracts && pnpm node

# Terminal 2: Run the demo
pnpm --filter @exploit-arena/cli exec arena demo

Run the Web Dashboard

pnpm --filter @exploit-arena/web dev
# Serves at http://localhost:3000

Demo Output

$ arena demo

⚡ ExploitArena — Full On-Chain Demo

✔ BountyEscrow deployed at 0x5FbDB...
✔ 3 verifiers authorized on-chain
✔ Bounty #0 created — 10 ETH escrowed

Running AI pipeline: attack → verify → auto-resolve
──────────────────────────────────────────────────
✔ Exploit found: Reentrancy in withdraw() (Critical)

─── Verifications ───
  CONFIRMED — CVSS 9.3 (Critical)
  CONFIRMED — CVSS 9.3 (Critical)

─── On-chain Result ───
  Status: Resolved
  Exploit count: 1
  Avg CVSS: 9.3

✓ BOUNTY RESOLVED — exploit confirmed on-chain
  Attacker's withdrawable balance: 10.0 ETH

License

MIT — see LICENSE

Built at KJSSE GajShield Hack X · April 2026 · Mumbai 🏆