
OpenAI and Paradigm Launch AI Benchmark for Smart Contract Security

2026/02/19 22:58

OpenAI has collaborated with Paradigm to introduce EVMbench, a new benchmarking framework designed to assess how artificial intelligence agents interact with smart contract security. The initiative focuses on measuring the ability of AI systems to analyze, modify, and exploit smart contracts within controlled environments, reflecting the growing importance of automated security tools in decentralized finance.

Smart contracts currently underpin more than $100 billion in open-source digital assets, making their reliability a critical component of the global crypto financial infrastructure. As these contracts increasingly manage high-value transactions, the role of AI in reading, writing, and auditing code has become more significant. EVMbench is intended to evaluate AI performance in economically relevant scenarios while encouraging the defensive application of AI to strengthen deployed contracts against potential vulnerabilities.

A Dataset Grounded in Real-World Vulnerabilities

The EVMbench framework is built using a dataset that includes 120 carefully selected high-severity vulnerabilities. These weaknesses were drawn from 40 separate security audits and open code competitions, ensuring that the benchmark reflects real-world threat patterns rather than theoretical flaws. In addition, the dataset incorporates specific vulnerability scenarios identified during a security audit of the Tempo blockchain, further grounding the framework in practical security challenges.

To maintain safety and reproducibility, the system relies on a Rust-based testing harness. This setup restricts unsafe remote procedure call methods and executes all exploit-related tasks within a local Anvil environment rather than on live blockchain networks. By isolating tests from production systems, the framework allows for rigorous experimentation without risking actual assets or disrupting network operations.
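As a rough illustration of what restricting RPC methods can look like, the sketch below filters JSON-RPC requests by method name before they reach a local node. The article does not say which methods the harness blocks, and the real harness is written in Rust, so the prefix list and the function names here are assumptions made for the example only.

```python
# Hypothetical policy: the article does not list the blocked methods.
# These prefixes cover state-manipulating namespaces that local dev nodes expose
# (for example anvil_setBalance or evm_snapshot).
BLOCKED_PREFIXES = ("anvil_", "hardhat_", "evm_", "debug_")


def is_allowed(method: str) -> bool:
    """Return True if a JSON-RPC method name passes the illustrative filter."""
    return not method.startswith(BLOCKED_PREFIXES)


def filter_request(request: dict) -> dict:
    """Pass allowed requests through unchanged; answer blocked ones with an error.

    The error reuses the standard JSON-RPC "method not found" code (-32601).
    """
    if is_allowed(request.get("method", "")):
        return request
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32601, "message": "method not allowed"},
    }
```

With a filter of this kind, an agent could still send ordinary calls such as eth_call, while attempts to set balances or rewind state directly would be rejected.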

Capability Modes Designed to Mirror Security Workflows

EVMbench evaluates AI agents across three distinct capability modes, each designed to simulate real-world smart contract security tasks. The Detect mode assesses whether an agent can audit a smart contract repository and identify known vulnerabilities based on historical data. Performance in this mode is measured by recall of the ground-truth vulnerabilities and by the audit rewards associated with the findings the agent reports.
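The recall part of the Detect score can be sketched in a few lines. This is a minimal example, assuming findings can be matched by identifier; the identifiers below are made up, and how EVMbench actually matches a reported finding to a ground-truth vulnerability is not described in the article.

```python
def recall(reported: set[str], ground_truth: set[str]) -> float:
    """Fraction of ground-truth vulnerabilities that appear among the reported ones."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical identifiers, for illustration only.
truth = {"H-01", "H-02", "H-03", "H-04"}
found = {"H-01", "H-03", "X-99"}  # X-99 is not a ground-truth finding

print(recall(found, truth))  # 0.5
```

Recall alone does not penalize extra reports such as X-99 above; it only counts how many known vulnerabilities were found.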

The Patch mode shifts focus to remediation, requiring agents to modify vulnerable contracts to remove exploits while preserving intended functionality. Success is verified through automated testing that confirms the exploit has been eliminated and the code compiles correctly. This mode reflects the practical challenges faced by security engineers who must fix flaws without introducing new issues.
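The acceptance rule described for Patch mode can be written as a simple conjunction of checks. The sketch below assumes three boolean outcomes per run; the class and field names are hypothetical and do not come from EVMbench.

```python
from dataclasses import dataclass


@dataclass
class PatchRun:
    """Outcome of evaluating one patched contract (hypothetical structure)."""
    compiled: bool                 # the patched code compiles
    functional_tests_passed: bool  # intended behavior is preserved
    exploit_succeeded: bool        # the original exploit still works


def patch_accepted(run: PatchRun) -> bool:
    """A patch counts only if it compiles, keeps functionality, and blocks the exploit."""
    return run.compiled and run.functional_tests_passed and not run.exploit_succeeded
```

Under this rule, a patch that stops the exploit by breaking the contract's intended behavior is rejected, which matches the difficulty the article attributes to this mode.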

The Exploit mode evaluates offensive capabilities by testing whether an agent can execute a full fund-draining attack against a deployed contract in a sandboxed blockchain environment. Results are graded programmatically through transaction replay, offering a clear metric of exploit effectiveness that defensive systems must be able to counter.
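To make the idea of grading by replay concrete, the toy example below replays a list of transfers against a plain balance table and checks whether the target was drained. It is a deliberately simplified sketch: the real benchmark replays EVM transactions against a local node, and the ledger model, account names, and the full-drain threshold here are all assumptions.

```python
def replay(balances: dict[str, int],
           transfers: list[tuple[str, str, int]]) -> dict[str, int]:
    """Apply (sender, recipient, amount) transfers in order and return the new balances."""
    state = dict(balances)  # copy so the starting state is left untouched
    for sender, recipient, amount in transfers:
        if amount < 0 or state.get(sender, 0) < amount:
            raise ValueError(f"invalid transfer from {sender}")
        state[sender] = state.get(sender, 0) - amount
        state[recipient] = state.get(recipient, 0) + amount
    return state


def drained(before: dict[str, int], after: dict[str, int],
            target: str, threshold: float = 1.0) -> bool:
    """Return True if the target lost at least `threshold` of its starting balance."""
    start = before.get(target, 0)
    if start == 0:
        return False
    return (start - after.get(target, 0)) / start >= threshold


# Hypothetical accounts and amounts.
start = {"vault": 1000, "attacker": 0}
end = replay(start, [("vault", "attacker", 400), ("vault", "attacker", 600)])
print(drained(start, end, "vault"))  # True
```

Because the check depends only on the state before and after replay, the same recorded transactions always produce the same grade.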

Model Performance and Ongoing Safety Efforts

Initial results from EVMbench indicate substantial progress in AI performance on certain cybersecurity tasks. In exploit testing, OpenAI’s GPT-5.3-Codex model achieved a success rate exceeding 70 percent, representing a notable improvement compared with earlier model versions evaluated roughly six months prior. However, the findings also indicate that detection and patching remain more challenging areas.

AI agents were frequently observed struggling to fully preserve contract functionality while resolving subtle vulnerabilities, underscoring the continued importance of human oversight in smart contract auditing. These limitations highlight that, while AI can augment security workflows, it has not yet replaced expert review.

Given the dual-use nature of cybersecurity tools, OpenAI has emphasized a defense-oriented approach. The company has expanded its security research agent, Aardvark, and committed $10 million in API credits through its Cybersecurity Grant Program. These efforts are intended to accelerate defensive research for open-source software and critical infrastructure.

Toward Standardized AI Security Evaluation

Although EVMbench does not yet support advanced features such as complex timing mechanics or mainnet forks, it represents a meaningful step toward standardizing how AI systems are evaluated in blockchain security contexts. By providing a controlled, reproducible framework, the benchmark offers researchers and developers a clearer view of both the strengths and limitations of AI in securing smart contracts, contributing to a more resilient decentralized ecosystem.

The post OpenAI and Paradigm Launch AI Benchmark for Smart Contract Security appeared first on CoinTrust.

