Richard Batt
AI-Assisted Code Review: How I Use It and Where It Falls Short
Tags: Development, AI Tools
I watched an AI code review tool catch a race condition that three human reviewers missed. It was subtle: a shared variable being modified in a callback without proper locking, the kind of thing that would manifest as an intermittent bug in production. The AI flagged it immediately. One week later, the same AI suggested that a team should completely rewrite a working authentication system for reasons that made no sense when you actually understood the security requirements. The suggestion would have introduced a critical vulnerability.
Key Takeaways
- What AI Code Review Actually Catches
- Where AI Code Review Fails Badly
- The Right Way to Use AI in Code Review
- The Technology Stack
- Specific Scenarios Where AI Really Works
This is the reality of using AI for code review. It's not binary. It's not "AI is brilliant" or "AI is useless". It's more nuanced, and understanding where it works and where it fails is essential if you're going to use it effectively.
I've integrated AI code review into workflows across 120+ projects, and I've seen it consistently improve code quality when used correctly and degrade it badly when used incorrectly. Here's what actually works, what doesn't, and why.
What AI Code Review Actually Catches
Let me start with where the value is real. AI code review tools (I primarily use Claude and GPT-4, though I've also worked with GitHub Copilot and other tools) are exceptional at specific categories of issues.
First, they catch inconsistency at scale. If a codebase has established patterns for error handling, logging, or API responses, an AI reviewer will identify deviations quickly. I worked with a JavaScript team that had established a pattern for async error handling across 30+ API endpoints. When a new developer added an endpoint without following the pattern, the AI flagged it immediately, alongside suggestions for how to align it with the established approach. A human reviewer would probably have missed this on the first pass.
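The team in question was working in JavaScript, but the pattern generalises. As a minimal sketch (function and field names are hypothetical, not the team's actual code), the established convention might be a decorator that forces every async endpoint to return errors in one shared shape, which is exactly the kind of convention an AI reviewer can check mechanically:

```python
import asyncio
import functools

def handles_errors(endpoint):
    """Wrap an async endpoint so every handler returns errors in one shape.

    Hypothetical team convention: endpoints never raise; they return
    {"ok": True, "data": ...} on success or {"ok": False, "error": ...}.
    """
    @functools.wraps(endpoint)
    async def wrapper(*args, **kwargs):
        try:
            data = await endpoint(*args, **kwargs)
            return {"ok": True, "data": data}
        except Exception as exc:  # convert any failure to the shared shape
            return {"ok": False, "error": str(exc)}
    return wrapper

@handles_errors
async def get_user(user_id):
    # A handler written against the convention: it raises freely and
    # relies on the decorator to produce the standard error envelope.
    if user_id < 0:
        raise ValueError("invalid user id")
    return {"id": user_id}
```

An endpoint added without the decorator is a visible, pattern-level deviation, which is why the AI catches it on the first pass.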
Second, they're excellent at security vulnerabilities of specific types. SQL injection, improper credential handling, unvalidated user input: these are well-known vulnerability patterns that AI has been trained on extensively. A financial services team I advised found that AI code review caught roughly 65 percent of security issues before they hit staging. Not 100 percent, but 65 percent is meaningful.
Third, they're good at performance anti-patterns. Nested loops that should be flattened, unnecessary object allocations, inefficient queries: if the pattern is recognisable, the AI catches it. I reviewed code from a Python team where the AI identified that a developer had written a function that was iterating through a dictionary 15 times when a single pass would accomplish the same thing. The performance impact was meaningful: the function was being called hundreds of times per hour.
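To make the multi-pass anti-pattern concrete, here is a simplified sketch (the data shape and function names are illustrative, not the team's actual code). The first version walks the same dictionary once per statistic; the second computes everything in a single pass:

```python
def summarise_multi_pass(orders):
    """Anti-pattern: three separate passes over the same dictionary."""
    total = sum(o["amount"] for o in orders.values())
    paid = sum(1 for o in orders.values() if o["paid"])
    largest = max(o["amount"] for o in orders.values())
    return total, paid, largest

def summarise_single_pass(orders):
    """Same result in one pass, the kind of rewrite an AI reviewer suggests."""
    total, paid, largest = 0, 0, float("-inf")
    for o in orders.values():
        total += o["amount"]
        if o["paid"]:
            paid += 1
        largest = max(largest, o["amount"])
    return total, paid, largest
```

Three passes is harmless on a small dict; at fifteen passes on a hot path, the rewrite matters.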
Fourth, and this is underrated: they catch basic typing issues and null reference errors. If you're using TypeScript or a similar language, the AI can identify when you're doing something that would crash at runtime. "You're accessing property X on an object that might be null," or "This function signature doesn't match how you're calling it." These are the kinds of errors that slip through because they're not visually obvious.
Fifth, they're genuinely good at identifying dead code, unused imports, and other code hygiene issues. A consulting team I worked with had accumulated hundreds of lines of unused utility functions over two years of development. The AI identified all of them in a single pass. A human reviewer would have taken hours to spot those.
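This kind of hygiene check is mechanical enough that you can approximate part of it yourself. As a minimal sketch (not how any particular AI tool works internally), Python's standard `ast` module can find module-level imports that are never referenced:

```python
import ast

def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the module."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)
```

An AI reviewer does the same job across whole repositories, including dead functions and stale utilities that static name-matching alone would miss.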
Where AI Code Review Fails Badly
Now the critical part: where it doesn't work, and why you need humans to stay in the loop.
First, the AI cannot understand domain-specific context. I worked with a financial services team that had a specific approach to calculating exchange rates. It was unorthodox: they were deliberately using a slightly imprecise calculation to avoid edge cases with certain currencies. The code looked wrong to anyone who didn't understand the business requirement. The AI looked at it and flagged it as a bug, recommending a mathematically more correct approach that would actually break their system. The developer would have needed to override the AI suggestion and explain why the domain requirement was different from the obvious technical solution.
Second, the AI struggles with architectural decisions. If code is well-written but represents a choice that's not optimal for the system, the AI often won't catch it. I reviewed code from a team building an event-driven system. A developer had written a function that did everything synchronously, which worked but was architecturally wrong for a system that needed to handle millions of events. The AI didn't flag it as wrong; the code was clean and functional. It just didn't match the architectural pattern the system needed to follow.
Third, and this is crucial, the AI cannot judge whether code is appropriate for the actual requirements. I reviewed a pull request where the AI suggested a highly optimised algorithm for sorting data. The suggestion was technically brilliant. But the code was run once per day on a small dataset where performance literally didn't matter. The developer didn't need optimisation, they needed clarity. The AI suggestion would have made the code harder to understand without any meaningful benefit.
Fourth, the AI is vulnerable to hallucinating requirements that don't exist. I've seen AI code reviewers suggest functionality that sounds good but isn't actually needed. "You should add caching here," suggests the AI. But there's no evidence that caching is necessary. The code will be called infrequently. Adding caching adds complexity and maintenance burden for no real benefit. The AI sounds authoritative, but it's actually suggesting unnecessary work.
Fifth, and this is a serious issue: the AI sometimes gets security recommendations badly wrong. I worked with a team where the AI suggested moving API credentials into environment variables. Reasonable on the surface. But in their specific deployment context, environment variables were being logged to a central system that multiple people could access. The suggestion would have created a security vulnerability. A security-trained human would have caught this. The AI didn't understand the full context of how the environment variables would be handled.
Sixth, the AI cannot evaluate code maintainability across time. It might look at code and say, "This is complex," but it doesn't know whether that complexity is necessary or whether the developer will be able to maintain it six months from now when they've forgotten why those decisions were made. I reviewed code where the AI flagged something as overly complex. The developer's response: "It's complex because the requirements are complex. There's no simpler way to do this without losing functionality." The AI was technically right that it was complex, but wrong in the implicit suggestion that it should be simplified.
The Right Way to Use AI in Code Review
Treating AI as an automated code reviewer that replaces humans is a mistake. Treating it as a tool that human reviewers use is much more effective.
Here's the workflow I recommend and have implemented across multiple teams: every pull request runs through an AI reviewer first. The AI does a quick automated pass and flags potential issues. Then, a human reviewer receives the pull request with AI annotations already included. The human reviewer's job is to evaluate the AI flags, decide which ones are valid, and do a deeper review focusing on the things only humans can evaluate: architecture, business logic, domain requirements, maintainability.
The benefit is that the human reviewer is no longer doing the tedious work of checking for simple errors. They're not spending time looking for SQL injection vulnerabilities or dead code. The AI did that. So the human can focus on the 20 percent of issues that actually matter: is this the right architectural approach? Does this code make sense given the business requirements? Is this maintainable?
I implemented this for a software team of 12 developers. Before: code reviews were taking roughly 45 minutes per pull request, and they were catching about 70 percent of actual issues. After: code reviews were taking roughly 20 minutes per pull request, and they were catching about 90 percent of actual issues. The AI handled the tedious checking. The humans focused on the complex evaluation.
The second practice is to give the AI specific context. Rather than just feeding it the code, provide it with: the architectural documentation of the system, the coding standards the team follows, the business requirements that the code is supposed to fulfil, and any constraints on performance or security. With this context, the AI is much better at understanding whether something is actually a problem.
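In practice this means assembling the context into the prompt before the code. A minimal sketch (the section names and prompt wording are my own, not a prescribed format):

```python
def build_review_prompt(code, architecture_doc, standards, requirements):
    """Assemble a context-rich review prompt; skips any empty section."""
    sections = [
        ("System architecture", architecture_doc),
        ("Coding standards", standards),
        ("Business requirements", requirements),
    ]
    context = "\n\n".join(
        f"## {title}\n{body}" for title, body in sections if body
    )
    return (
        "Review the code below for deviations from the standards and "
        "requirements given. Flag only issues you can justify from the "
        f"context.\n\n{context}\n\n## Code under review\n{code}"
    )
```

The instruction to justify flags from the supplied context is what suppresses false positives: without it, the model falls back on generic best practices that may not apply to your system.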
I worked with a team where they started using AI code review with just the code. The AI flagged lots of things that weren't actually issues given the system context. Six months later, they updated their process to provide architectural context to the AI. False positives dropped by 75 percent.
The third practice is to have strong escalation paths. If a developer disagrees with an AI suggestion, there should be a clear process for flagging it and having it reviewed by the team lead or an architect. I've seen teams where developers feel compelled to follow AI suggestions even when they know the suggestions are wrong. That's worse than not using AI at all: it's adding process that makes code worse.
The Technology Stack
What tools should you actually use for AI-assisted code review?
For integrated solutions, GitHub Copilot (which is also available as a VS Code extension) is the most straightforward. You enable it in your GitHub workflow, and it automatically reviews pull requests. Cost is roughly £10 per user per month, though it's free for open source projects. The quality is good for catching obvious issues, but it's less sophisticated than feeding the code to a larger model like Claude or GPT-4.
For more sophisticated analysis, I build workflows using Claude or GPT-4 directly. I write a custom prompt that specifies exactly what I want the model to evaluate, and I feed it the code plus any relevant context. Cost varies with code size, but is typically £0.20 to £1.00 per pull request. The quality is higher because you can be specific about what you're looking for.
A few teams I've worked with have built custom solutions using open-source models like Mistral or Llama running on their own infrastructure. This is more effort to set up but gives you full control and can be more cost-effective at scale.
The key decision: do you want AI reviewing code before human review or after? My recommendation is before. The AI does the tedious work first, then humans do the sophisticated work. But some teams prefer the opposite: have humans review first, then use AI to catch anything the humans missed. Both approaches work. The critical thing is integrating it into your workflow in a way that doesn't add friction.
Specific Scenarios Where AI Really Works
There are specific project types where AI code review is particularly valuable.
First, teams with junior developers. I worked with a startup that had three senior developers and five junior developers. The seniors were doing all the code reviews, which was a bottleneck. I implemented AI code review, which let the junior developers get immediate feedback on obvious issues, naming conventions, security vulnerabilities, basic anti-patterns. The senior developers could then focus on mentoring and architectural feedback. The junior developers got faster feedback and learned faster.
Second, teams working with rapidly evolving codebases. If you're refactoring constantly or adding new frameworks and patterns, the AI is useful at catching inconsistencies and deviations from the new patterns. I worked with a team migrating from Vue 2 to Vue 3. The AI caught dozens of instances where developers forgot to update patterns from the old framework.
Third, teams with security-heavy requirements. If you're in financial services or healthcare, the AI catches many common security mistakes. Not all of them, you still need security experts, but it catches enough that you can scale your human security review effort.
Fourth, teams that are distributed across time zones. If your code reviews are blocking development because reviewers are asleep in different time zones, AI can do an initial pass while you wait for humans. It won't replace human review, but it can move things forward faster.
The Things I Still Get Wrong
After four years of implementing AI code review, there are still things I misjudge.
I sometimes overestimate how much time AI will save on code review. It saves time on tedious checking, but if your team's bottleneck is senior developers having time for architectural review, AI doesn't solve that. It just makes the tedious part faster.
I sometimes get the context wrong. I'll set up an AI reviewer without providing enough architectural or business context, and it produces noisy feedback that developers have to parse through. It requires effort to set up properly.
I occasionally underestimate how important it is to have your team agree on what the AI should be evaluating. If the team hasn't agreed on coding standards, the AI can't enforce them consistently. It's a team alignment problem, not a tool problem.
The Future of AI and Code Review
I think the trajectory is clear: AI will get better at understanding code context, will integrate more deeply into development workflows, and will handle more sophisticated architectural analysis. But I don't think AI will replace human code review in the foreseeable future. What I think will happen is that human code review will become more sophisticated and focused, less time checking for obvious errors, more time thinking about architecture and requirements.
The teams that will do well are the ones that use AI to handle the tedious parts and redeploy human effort toward the parts that require judgment and context. The teams that will struggle are the ones treating AI as a replacement for human review or trying to use it without integrating it properly into their workflow.
Frequently Asked Questions
How long does it take to build AI automation in a small business?
Most single-process automations take 1-5 days to build and start delivering ROI within 30-90 days. Complex multi-system integrations take 2-8 weeks. The key is starting with one well-defined process, proving the value, then expanding.
Do I need technical skills to automate business processes?
Not for most automations. Tools like Zapier, Make.com, and N8N use visual builders that require no coding. About 80% of small business automation can be done without a developer. For the remaining 20%, you need someone comfortable with APIs and basic scripting.
Where should a business start with AI implementation?
Start with a process audit. Identify tasks that are high-volume, rule-based, and time-consuming. The best first automation is one that saves measurable time within 30 days. Across 120+ projects, the highest-ROI starting points are usually customer onboarding, invoice processing, and report generation.
How do I calculate ROI on an AI investment?
Measure the hours spent on the process before automation, multiply by fully loaded hourly cost, then subtract the tool cost. Most small business automations cost £50-500/month and save 5-20 hours per week. That typically means 300-1000% ROI in year one.
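The arithmetic above can be written out as a small helper (the figures in the usage example are illustrative, within the ranges quoted, not a client's actual numbers):

```python
def yearly_roi_percent(hours_saved_per_week, hourly_cost, tool_cost_per_month):
    """First-year ROI: (annual saving - annual cost) / annual cost * 100."""
    annual_saving = hours_saved_per_week * 52 * hourly_cost
    annual_cost = tool_cost_per_month * 12
    return (annual_saving - annual_cost) / annual_cost * 100
```

For example, 10 hours saved per week at a £30 fully loaded hourly cost against a £200/month tool works out to 550 percent first-year ROI, which sits inside the 300-1000 percent range quoted above.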
Which AI tools are best for business use in 2026?
It depends on the use case. For content and communication, Claude and ChatGPT lead. For data analysis, Gemini and GPT work well with spreadsheets. For automation, Zapier, Make.com, and N8N connect AI to your existing tools. The best tool is the one your team will actually use and maintain.
Put This Into Practice
I use versions of these approaches with my clients every week. The full templates, prompts, and implementation guides, covering the edge cases and variations you will hit in practice, are available inside the AI Ops Vault. It is your AI department for $97/month.
Want a personalised implementation plan first? Book your AI Roadmap session and I will map the fastest path from where you are now to working AI automation.