
Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss

2026/03/03 14:07
6 min read

Over the past few months I’ve been building and running an AI pipeline that only reports what it can prove. On codebases that had already passed multiple audits, it still uncovered exploitable vulnerabilities. In one run alone, it surfaced eight reproducible issues, including High and Critical findings. That outcome is not luck. It is what a reproducible, multi-stage process produces: a short report and executable proof-of-concept files committed alongside the code. Every finding is backed by a runnable exploit that demonstrates the vulnerability in practice. Proof, not guesswork. In a sector where exploits are real and trust in “potential” findings is low, that bar is not optional.

Early on, I experimented with single-model scans and various AI audit tools. The pattern was consistent: long lists of potential issues, many false positives, and very little that could be demonstrated concretely. Closing the gap between “possible” and “provable” became the goal. The current pipeline grew out of that frustration with maybe-lists and unverified claims.

This is not a replacement for a formal audit. It is an additional, reproducible second opinion at the code level, a code review at scale. You still need a full human audit before mainnet. A reproducible, exploit-backed second pass should be standard practice for code that keeps evolving. This is the pass I run when I want to know what slipped through, or what appeared after the last audit.

What I Built (And Kept Iterating On)

What I built is a pipeline that produces a short report plus executable proof-of-concept files committed to the repository. If a finding appears in the report, there is a runnable exploit that demonstrates exactly how it can be triggered.

The filter is not heuristic. Every dropped finding receives a documented rejection reason: factually wrong, no valid attack path, design choice, duplicate, or out of scope. That is a structured quality gate, not gut feel. Only findings with a runnable, non-trivial proof-of-concept survive to the final report. When severity sources disagree, the lower severity is used.
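A minimal sketch of that quality gate, assuming hypothetical names throughout (`Finding`, `gate`, and the enums are illustrative, not the pipeline's actual code): a finding survives only with a passing proof-of-concept and no documented rejection reason, and disagreeing severity sources resolve to the lower value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

class Rejection(Enum):
    FACTUALLY_WRONG = "factually wrong"
    NO_ATTACK_PATH = "no valid attack path"
    DESIGN_CHOICE = "design choice"
    DUPLICATE = "duplicate"
    OUT_OF_SCOPE = "out of scope"

@dataclass
class Finding:
    title: str
    severities: list              # independent severity assessments
    poc_passes: bool              # does the exploit actually execute?
    rejection: Optional[Rejection] = None

def gate(finding: Finding) -> Optional[Severity]:
    """Structured gate: drop anything with a rejection reason or a
    non-working PoC; when severity sources disagree, take the lower."""
    if finding.rejection is not None or not finding.poc_passes:
        return None
    return min(finding.severities, key=lambda s: s.value)
```

The point of encoding the rejection reasons as a closed enum is that every dropped finding gets a logged, auditable reason rather than a silent discard.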

When there is a prior audit report, such as a PDF, I ingest it so the pipeline does not re-flag already reported issues. The run is self-contained and sets up the test environment, such as Foundry, if needed. Output is structured to fit how teams and bug bounty platforms operate.
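The prior-report ingestion step can be sketched as a simple fingerprint filter, under the assumption that prior findings have been extracted to titles (`fingerprint` and `filter_known` are illustrative names; real matching would be fuzzier than this):

```python
import re

def fingerprint(title: str) -> str:
    """Normalize a finding title into a crude comparison key."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def filter_known(candidates: list, prior_titles: list) -> list:
    """Drop candidates already reported in a prior audit, so the
    final report only contains issues new to the client."""
    known = {fingerprint(t) for t in prior_titles}
    return [c for c in candidates if fingerprint(c) not in known]
```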

This is offered as a distinct service with its own scope and pricing, positioned between single-model scans and full audits. Full human audits typically range from $50k to $200k or more. This pass sits in between: repeatable, proof-driven, and scoped to code-level risk.

Scale and Models: What I’ve Been Running

Runs and codebases: Over the past few months I have executed dozens of full pipeline runs across more than 15 codebases. Many had already been audited by two or more teams. This is not a handful of reviews, but repeated application across real projects. The pipeline has been calibrated over many real runs; the current configuration is the result of continuous refinement, not a one-off design.

Getting to this point required substantial experimentation across models and configurations, as well as real spend. The current setup reflects what held up under adversarial challenge and what consistently produced reproducible results.

Explorers per run: Each run executes 8 to 10 explorer agents in parallel. A typical run takes several hours and produces 40 or more candidate findings before deduplication and challenge. The funnel reduces that to a single-digit or low-teens final report. The candidate-to-report ratio is often 3:1 or 4:1.
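The fan-out step can be sketched with the standard library, assuming each explorer is a callable returning candidate findings (`run_explorers` is an illustrative helper, not the pipeline's actual interface):

```python
from concurrent.futures import ThreadPoolExecutor

def run_explorers(explorers, codebase):
    """Dispatch 8-10 explorer agents in parallel over the same
    codebase and collect all candidate findings for the funnel."""
    with ThreadPoolExecutor(max_workers=len(explorers)) as pool:
        batches = pool.map(lambda e: e(codebase), explorers)
    return [finding for batch in batches for finding in batch]
```

Downstream stages then shrink that pool at roughly 3:1 or 4:1 into the final report.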

One run produced 11 findings with executable proof-of-concepts in the final report, including 3 High, 5 Medium, and 3 Low. Another produced 10 findings with proof-of-concepts, 2 overlapping with prior audits, and 8 new findings including High and Critical. Most of that report was new to the client.

Models in the mix: I do not rely on a single model. Multiple model families are used for exploration, challenge, and proof-of-concept construction. The mix is not trial and error. I track which models contribute unique High findings, which generate mostly noise, and which hold up under adversarial challenge. Model selection has been refined over many runs; the current set remains because the data supports it. Remove one and you can miss a finding. Retries and fallbacks ensure that each run completes even if a step encounters issues.
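The model-selection signal described above can be sketched like this, assuming a mapping from finding id to severity and the set of models that surfaced it (the function name and data shape are hypothetical):

```python
from collections import defaultdict

def unique_high_contributions(findings):
    """Count, per model, the High findings no other model surfaced.
    A model that contributes unique Highs earns its place in the mix;
    one that only produces noise or duplicates does not."""
    counts = defaultdict(int)
    for severity, models in findings.values():
        if severity == "High" and len(models) == 1:
            counts[next(iter(models))] += 1
    return dict(counts)
```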

Why a Second Pass Finds Things

Human audits are finite. Edge cases slip through. Code changes. Refactors introduce new paths. A single audit is a snapshot; code is a moving system. Relying on one pass is comfortable, but risky when the attack surface is large.

A second pass does not imply the first audit was poor. It acknowledges that coverage is difficult.

The pipeline is not just another opinion. It works differently: multiple independent exploration paths, an adversarial challenger that questions every finding, conservative deduplication rules, and explicit validation where only issues backed by a working proof-of-concept survive. That is a methodologically different perspective.

Explorers perform structured analysis across protocol design, logic, economics, and attack paths. Findings are merged and deduplicated. A challenger attempts to invalidate them. Proof-of-concepts are built and executed in your framework, such as Foundry or Hardhat. Only findings backed by a working exploit remain in the final report. Everything rejected is logged with a documented reason.
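The stage flow just described can be sketched as a minimal funnel, with the challenger and PoC builder as injected callables (all names here are illustrative; the real pipeline has more stages and controls):

```python
def pipeline(codebase, explorers, challenger, build_poc):
    """Simplified flow: explore -> merge/dedupe -> challenge -> PoC
    validation. Only exploit-backed findings reach the report; every
    rejection is logged with a reason."""
    candidates = [f for e in explorers for f in e(codebase)]
    merged = list(dict.fromkeys(candidates))      # conservative dedupe
    report, rejected = [], []
    for finding in merged:
        reason = challenger(finding)              # adversarial pass
        if reason:
            rejected.append((finding, reason))
            continue
        if build_poc(finding):                    # exploit must execute
            report.append(finding)
        else:
            rejected.append((finding, "no working proof-of-concept"))
    return report, rejected
```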

The real pipeline includes significantly more stages than the simplified outline above. Behind each stage are validation and control mechanisms. It is staged quality assurance, not a linear AI workflow. A full run spans multiple structured analysis and validation phases, emphasizing depth over runtime.

The Numbers

Dozens of candidates per run are systematically reduced to a small, defensible set. Only findings backed by runnable proof-of-concepts survive. That is what it takes to get to proof instead of guesswork.

If it is in the report, there is a working proof-of-concept demonstrating it. That is the bar.

When It Fits

Pre-launch as an additional pass. After a refactor or upgrade. Following governance or tokenomics changes. Before a raise or external audit. Narrow scopes such as a single module or integration.

The same fixed categories are applied every run. I do not publish the list. What ultimately appears in the report depends on the specific attack surface of your codebase.

What Kind of Hardening Shows Up

Across dozens of runs, certain patterns consistently surface. Governance parameters that can be zero or invalid on production paths. Withdrawal and queue logic that cannot progress under loss scenarios. First-depositor and reward front-running in share-based systems. Withdrawal flows that stall when one market becomes illiquid.
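To make the first-depositor pattern concrete, here is a minimal Python sketch of simplified ERC-4626-style share math, with no virtual-offset mitigation (`shares_for_deposit` is an illustrative helper, not any specific vault's code):

```python
def shares_for_deposit(deposit, total_assets, total_shares):
    """Shares minted for a deposit in a simplified share-based vault.
    Integer division rounds down, which is what the classic
    first-depositor donation attack exploits."""
    if total_shares == 0:
        return deposit            # first depositor sets the exchange rate
    return deposit * total_shares // total_assets
```

An attacker deposits 1 wei for 1 share, then "donates" assets directly to the vault to inflate `total_assets`; a subsequent victim deposit then rounds down to zero shares, and the attacker redeems the whole balance.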

These are recurring classes of issues that appear under structured analysis and adversarial challenge. If your system has similar surface area, that is the kind of coverage this pipeline is designed to stress.

If a reproducible, exploit-backed second pass makes sense for your codebase, you know where to find me. Telegram: @Kurt0x


Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss was originally published in Coinmonks on Medium.

