OpenAI and Paradigm launch EVMbench: AI can now audit Ethereum contracts

Discover how the new GPT-5.3-Codex model detects flaws in the Ethereum network. 

The ability of this new model to accurately analyze code and understand the behavior of blockchain smart contracts marks a key advance in protecting decentralized financial ecosystems.

Its development shows once again that the relationship between artificial intelligence and blockchain technology is no longer a futuristic experiment, but a tangible tool that improves the security of digital assets.

In the cryptocurrency market, where decentralized finance protocols manage billions of dollars, the stability of the code is central to the trust of millions of people.

Vulnerabilities discovered in well-known projects serve as a reminder that, despite community efforts, human error and sophisticated attacks remain real threats. Therefore, AI-powered tools are providing a new layer of defense. 

GPT-5.3-Codex not only analyzes code for inconsistencies, but also interprets patterns and anticipates potential failures before they affect funds on the main network. Its role is no longer limited to programming assistance; it acts as an intelligent auditor capable of learning from each review and adapting to the ever-changing pace of the blockchain industry. The result is a more secure and transparent environment, where developers, auditors, and investors have access to technological support that can reduce risks without stifling innovation.

EVMBench: the new frontier of artificial intelligence in blockchain security

OpenAI, in collaboration with the investment firm Paradigm, recently unveiled EVMbench, a new assessment framework created to measure the performance of artificial intelligence agents within the Ethereum Virtual Machine. The framework analyzes how advanced language models address real technical challenges in the blockchain ecosystem and how well prepared they are to detect, correct, and exploit vulnerabilities in controlled environments.

According to the developers, EVMbench is based on a set of 120 high-severity flaws extracted from forty authentic audits performed on smart contracts. Through this rigorous selection, the system tests the models' ability to identify logical errors, repair code without affecting its behavior, and execute simulated attacks without external consequences. The tests also include scenarios on the Tempo network, a Layer 1 chain focused on stablecoin payments, which makes it possible to observe how AI agents perform in contexts that reflect real-world financial and commercial use within the crypto sector.

The framework operates under three complementary modes that replicate the core phases of a security engagement. In the detection stage, the model examines repositories to locate vulnerabilities documented by human auditors. The patching mode then assesses its ability to eliminate the weakness without breaking the code or altering its operational logic. Finally, the exploitation mode places the agent in a simulation environment where it must successfully drain funds, always within the limits of the experiment.
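As an illustration, the three modes can be imagined as a simple scoring harness that tallies an agent's pass rate per phase. The Python sketch below is hypothetical: the `Task` schema and the `detect`/`patch`/`exploit` method names are assumptions made for illustration, not EVMbench's actual API.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One benchmark case: a known flaw taken from a real audit (hypothetical schema)."""
    repo: str
    known_vuln_id: str

def score_agent(agent, tasks):
    """Run an agent through three EVMbench-style modes and return per-mode pass rates.

    `agent` is assumed to expose detect/patch/exploit methods; each mode mirrors
    one phase described above (detection, patching, sandboxed exploitation).
    """
    results = {"detect": 0, "patch": 0, "exploit": 0}
    for task in tasks:
        # Detection: did the agent locate the auditor-documented vulnerability?
        if agent.detect(task.repo) == task.known_vuln_id:
            results["detect"] += 1
        # Patching: does the fix remove the flaw without breaking behavior?
        if agent.patch(task.repo, task.known_vuln_id):
            results["patch"] += 1
        # Exploitation: can the agent drain funds inside the simulation only?
        if agent.exploit(task.repo, task.known_vuln_id):
            results["exploit"] += 1
    n = len(tasks)
    return {mode: hits / n for mode, hits in results.items()}
```

Scoring each mode separately is what lets the benchmark expose the asymmetry discussed below: an agent can excel at exploitation while lagging at detection and patching.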

The most recent results, current as of this publication, reveal a remarkable leap in the offensive capabilities of OpenAI's models. The company noted that GPT-5.3-Codex achieved a 72.2% success rate in attack tests, more than double the 31.9% achieved by its predecessor just six months earlier. This advance suggests substantial progress in the technical understanding and adaptability of new generations of models, marking a key point in the relationship between artificial intelligence and cybersecurity in the blockchain field.
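The "more than double" claim can be checked directly from the two reported figures:

```python
# Exploit-mode success rates reported for EVMbench (percent)
gpt_5_3_codex = 72.2
predecessor = 31.9

# Ratio between the two scores: ~2.26x, i.e. more than double
improvement = gpt_5_3_codex / predecessor
print(f"Improvement factor: {improvement:.2f}x")
```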

The promises and limitations revealed by EVMbench

Following the presentation of EVMbench, the results highlight a paradox in the relationship between artificial intelligence and cybersecurity. Although the ability to execute advanced attacks has improved significantly, the data shows that detecting and remediating vulnerabilities still presents a considerable challenge.

In a technical document, the developers said that artificial intelligence models typically perform better when pursuing a specific value-extraction goal, but encounter difficulties with tasks that require close inspection and precise technical correction. They often stop their analysis after identifying the first flaw, without completing a full system review, which limits their effectiveness in complex audits.

According to experts, this situation highlights the ongoing imbalance between the ability to recognize attack patterns and the capacity to proactively strengthen code. They commented that closing this gap requires expert oversight and constant technical calibration to minimize false positives and avoid implementing solutions that, instead of strengthening security, create new entry points.

OpenAI has recognized the dual nature of these tools, as they can serve both those protecting systems and those seeking to exploit them. Therefore, it has established strict controls over access to and use of its most advanced features, combining automated monitoring with enhanced security policies. The intention is to usher in a new era in software auditing, where artificial intelligence acts as continuous support for open-source project developers.

In that direction, the organization has allocated ten million dollars in research grants aimed at strengthening cybersecurity, with special attention to protecting critical infrastructure and the software that underpins the global crypto economy.

Towards open audits in the on-chain era

The launch of this new intelligent assessment framework is not intended to replace human judgment or the traditional audits that have defined the sector for years. On the contrary, the company emphasized that the initiative seeks to strengthen expert judgment and traditional audits through technical tools that allow more consistent checks. The proposal generally functions as a bridge between research and practice, encouraging the scientific community to adopt more rigorous and transparent validation models.

Therefore, with the launch of EVMbench and the release of its data, the possibility arises of establishing shared standards for evaluating the behavior of autonomous agents in blockchain environments. This development is especially relevant now that more than 35,000 artificial intelligence agents operate within the Ethereum network under the ERC-8004 standard, which expands both the potential and the risks of the on-chain economy.

Ensuring a clear measurement of these systems' capabilities is thus key for organizations to anticipate vulnerabilities and design proportionate defenses. As financial code evolves toward more automated and resilient architectures, translating technical metrics into concrete improvements becomes essential to protecting users and building trust in the digital ecosystem.
