Flaw-Finding AI Assistants Face Criticism for Speed, Accuracy
Using AI to find security vulnerabilities holds significant promise, but the initial products fall short of the needs of enterprises and software developers, say experts.
Anthropic's announcement of a limited research preview of Claude Code Security — a tool that reads code, finds vulnerabilities, and proposes fixes — has caused no small amount of turmoil in the cybersecurity industry. Anthropic revealed that its latest reasoning engine, Claude Opus 4.6, found more than 500 zero-day vulnerabilities in open-source projects.
While many investors panicked on the news, application-security experts cautioned that the initial iterations of Claude Code Security and OpenAI's Aardvark tool, released in October, are slow, prone to false positives, and do not readily fit into the development pipelines of most enterprises. Even as the systems evolve, automated reasoning about code security may be limited to helping AI companies produce more secure code or helping developers better understand their code, not necessarily replacing existing security checks in the current pipeline, says Julian Totzek-Hallhuber, senior principal solution architect at Veracode.
"Does it really fit my process when vibe coding is so fast in helping me build these applications, but then I'm slower using these tools and finding the flaws in code compared to another tool?" he says. Instead, existing vendors, such as Veracode, are already using their expertise as well as "using AI tools in the background to help developers better understand and generate fixes for them."
As AI vendors move into various market segments, the market reacts with uncertainty. Yet, the reaction ignores the complexity of how many software vendors address their customers' needs, a team of analysts at Forrester Research stated on Feb. 11 in a research note.
Companies with specific expertise can "preserve their moat using what’s hard for AI-only companies to replicate: their deep vertical experience building specialized solutions; deep bench of consulting partners; access to vast customer data for benchmarking and machine learning; and the integration of people, process, tech, and governance," the analysts wrote.
Slow and Prone to False Positives
As with most AI innovations, it's too early to tell whether those predictions will hold for the application-security market, but so far, most experts consider the initial tools more of a preview than a product.
Among the most serious concerns: Developers and application-security experts are complaining that the scans are currently far too slow. In one test posted to LinkedIn, an analyst found that the security review function in Claude Code, which is likely the basis of Claude Code Security, took 17 minutes to review a code sample, finding three vulnerabilities, two of which were false positives. In comparison, OpenGrep took 30 seconds to find the same issue, the post stated.
It's a finding confirmed by Neatsun Ziv, co-founder and CEO of OX Security, a security platform for vibe-coding developers. In his own tests, Claude Code Security took more than 15 minutes — and cost $4 in token costs — to find a flaw that could be found with a static application security testing (SAST) tool for less than a cent, he says.
In addition, today's development processes use a variety of tools to provide defense-in-depth throughout the software development lifecycle. A pipeline that relies on the same foundational AI for both writing and reviewing code is not ideal, Ziv argues. With human developers, the best security practice is to prevent the same programmer from writing and reviewing code for a new feature, patch, or modification.
"They [Anthropic] are actually using Claude Code on themselves, and it's kind of an issue when you're saying, 'Hey, I'm a developer ... I'm writing the code [and] I'm going to test my own code,'" Ziv says.
Complementary, Not a Substitution
In many ways, Anthropic's and OpenAI's tools seem less about improving the security design and codebases of human-created applications and more about making up for the shortcomings apparent in AI-assisted and agentic-AI development, such as the OpenClaw development saga, says Veracode's Totzek-Hallhuber.
The share of organizations affected by both overall and critical security debt rose in 2026. Source: Veracode
In its "2026 State of Software Security Report," Veracode found that companies are accruing more security debt and producing more high-severity vulnerabilities due to the faster generation of code: 82% of companies had debt, compared to 74% last year, and 11.3% of vulnerabilities ranked as severe, compared to 8.3% the previous year. The trends are likely caused by the rapid adoption of agentic-AI development platforms, he says.
"Everybody's complaining about, whenever I write code, I am fast, but I'm insecure like crazy," says Totzek-Hallhuber. "So it's interesting to see all these AI vendors [supporting] their vibe coding with security-testing solutions."
In addition, the Anthropic announcement does not address the hard part of vulnerability management: not just finding vulnerabilities, but remediating them in a way that fits with a company's development pipeline, Randall Degges, vice president of AI engineering and developer relations at application-security firm Snyk, wrote in a response to Claude Code Security.
"The hard part, the part that keeps AppSec teams up at night, the part that generates the multi-year backlogs and the 'we'll get to it next sprint' conversations, is fixing them," he said. "At scale. Across hundreds of repositories. Without breaking anything. While developers are shipping new features at breakneck speed. In code they didn't write, using libraries they didn't choose, in languages they may not be experts in."
Yet replacing manual code review with AI-augmented review and using AI to enrich and explain vulnerability findings could both be of real value, says Veracode's Totzek-Hallhuber. And they may expand the application security market rather than shrink it, he says.
"Thirty years back, we did manual code reviews, [but] that art died somehow — it doesn't really exist anymore today," he says. "Now you can do this with an AI tool, ... allowing you to interact with the results and the tools, and maybe that's building a new industry for us again."