Anthropic has launched an AI tool called Code Review to automate the process of identifying bugs in software code before it is merged. The tool uses multiple AI agents to analyze pull requests in parallel, focusing on logic errors and providing severity-ranked, actionable feedback rather than style critiques.
Internal testing shows the tool finds issues in 84% of large pull requests, with engineers disagreeing with less than 1% of its findings. However, this thorough analysis comes at a higher cost, averaging $15-$25 per review, compared to lighter-weight alternatives.
The tool is currently available in beta for Team and Enterprise plans, aiming to address the bottleneck of manual code reviews exacerbated by the rise of AI-assisted "vibe coding."
Anthropic rolled out a new AI tool called Code Review on Monday to identify bugs before they enter the software codebase.
Peer feedback has long been essential in coding, helping developers catch errors, maintain consistency across a codebase, and improve overall software quality. At the same time, the rise of "vibe coding" (AI tools that generate code from plain-language instructions) has accelerated development but also brought new bugs, security risks, and code that is hard to understand.
"Code review has become a bottleneck, and we hear the same from customers every week," Anthropic said in a blog post. "They tell us developers are stretched thin, and many PRs [pull requests] get skims rather than deep reads."
Pull requests are used by developers to submit code changes for review before the updates are merged into the main software.
Code Review is Anthropic's solution to the problem. The company notes that it is a more thorough option, albeit a more expensive one, compared to the open-source Claude Code GitHub Action, which also reviews code and remains available.
How Code Review works
"When a PR opens, Claude dispatches a team of agents to hunt for bugs," the company said in an X post.
The agents then look for bugs in parallel, filter out false positives, and rank bugs by severity, Anthropic said in the blog post. The result lands on the PR as a single high-signal overview comment (a summary highlighting the most important findings), plus in-line comments (comments attached directly to the specific lines of code where bugs were found) for specific bugs.
"Reviews scale with the PR. Large or complex changes get more agents and a deeper read; trivial ones get a lightweight pass. Based on our testing, the average review takes around 20 minutes," Anthropic said in the blog post.
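The pipeline Anthropic describes (agents hunting in parallel, false positives filtered out, surviving findings ranked by severity) can be sketched roughly as follows. This is an illustrative outline only, not Anthropic's implementation: the `Finding` structure, `run_agent` stub, and confidence threshold are all made-up names for the sake of the sketch.

```python
# Hypothetical sketch of the review flow described above: dispatch agents
# in parallel, filter likely false positives, rank by severity. All names
# and data here are illustrative, not Anthropic's actual API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    line: int          # line of the diff the bug was found on
    severity: int      # higher = more serious
    message: str
    confidence: float  # agent's confidence this is a real bug


def run_agent(agent_id: int, diff: str) -> list[Finding]:
    # Stand-in for one agent's bug hunt; a real agent would call a model.
    # Returns hard-coded demo findings purely for illustration.
    if agent_id == 0:
        return [
            Finding(line=12, severity=3, message="missing null check", confidence=0.9),
            Finding(line=40, severity=1, message="possible off-by-one", confidence=0.4),
        ]
    return []


def review(diff: str, n_agents: int = 4) -> list[Finding]:
    # 1. Agents look for bugs in parallel.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        per_agent = pool.map(lambda i: run_agent(i, diff), range(n_agents))
    findings = [f for agent_findings in per_agent for f in agent_findings]
    # 2. Filter out low-confidence findings (false-positive suppression).
    findings = [f for f in findings if f.confidence >= 0.8]
    # 3. Rank the survivors by severity, most serious first.
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

In this sketch the low-confidence "off-by-one" finding is dropped at the filter step, leaving only the high-confidence issue, which mirrors the filter-then-rank behaviour the blog post describes.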
The system focuses on logic errors rather than style issues, giving developers actionable insights, Cat Wu, Anthropic's head of product, told TechCrunch.

"This is really important because a lot of developers have seen AI automated feedback before, and they get annoyed when it's not immediately actionable," Wu said. "We decided we're going to focus purely on logic errors. This way we're catching the highest-priority things to fix."

The AI also explains its reasoning step by step, showing what it believes the issue is, why it could be a problem, and how it might be fixed, according to TechCrunch. Issues are colour-coded by severity: red for the most serious, yellow for potential concerns worth checking, and purple for preexisting or historical bugs.
Results from testing
Anthropic said that it has been using Code Review internally for several months.
On large PRs (pull requests with over 1,000 lines changed), 84% show problems, averaging 7.5 issues. On small PRs (under 50 lines), only 31% show problems, averaging 0.5 issues. Engineers mostly agree with the results: less than 1% of findings are wrong, Anthropic said.
Cost and control
Code Review optimises for depth, which makes it more expensive than lighter-weight alternatives, including the Claude Code GitHub Action. Reviews are billed based on token usage, usually averaging $15–$25 per PR, depending on its size and complexity.
Admins have multiple tools to manage costs and usage:
- Monthly organisation caps: Set a total spend for all reviews in a month
- Repository-level control: Run reviews only on chosen repositories
- Analytics dashboard: Track which PRs were reviewed, acceptance rates, and total review costs
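Because billing is per review rather than flat-rate, a monthly organisation cap translates directly into a rough number of reviews. A minimal sketch, using the article's $15–$25 average (midpoint $20) as the assumed per-review cost; the function name and cap figure are hypothetical:

```python
# Back-of-the-envelope budgeting helper for the monthly organisation cap
# described above. The $20 default is the midpoint of the $15-$25 average
# Anthropic cites; actual costs vary with PR size and complexity.
def reviews_within_cap(monthly_cap_usd: float,
                       avg_cost_per_review_usd: float = 20.0) -> int:
    # Estimate how many reviews fit under the monthly spend cap.
    return int(monthly_cap_usd // avg_cost_per_review_usd)
```

For example, a $500 monthly cap would cover roughly 25 reviews at the average rate, fewer if a team's PRs skew large.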
Availability
Code Review is available now as a research preview in beta for Team and Enterprise plans.
- For admins: Enable Code Review in Claude Code settings, install the GitHub App, and select the repositories you want to monitor.
- For developers: Once enabled, reviews run automatically on new PRs without additional setup.