State of AI Engineering

Sri Rang ~ sri.r@qodo.ai ~ LinkedIn ~ 🏠

Governing AI Engineering

Facts

Fact 01 – Universal AI Coding Adoption

  • 74% of developers worldwide have adopted specialized AI coding tools
    • As of Jan 2026
  • 41% of all code is now AI-generated or AI-assisted
  • 90% of Fortune 100 companies use AI coding tools
  • The average dev now checks in 75% more code than they did in 2022
    • Source: GitClear

Fact 02 – Code Review Bottleneck

  • Teams with high AI adoption merge 98% more PRs, but:
    • PR review time has jumped 91%.
  • PR sizes are up 154%; bugs up 9%
    • DORA delivery metrics unchanged across 10,000+ devs
    • a.k.a. the "AI Productivity Paradox"
  • 44% of teams name slow code reviews as their single biggest delivery bottleneck.
  • "More code, fewer releases" β€” Waydev's named blind spot of 2026.

Fact 03 – Quality & Security Risk

  • AI-generated code has 2.74x higher vulnerability density than human-written code
  • 45% of AI-generated code samples failed security tests
    • Java: 72% security failure rate; Python, C#, JS: 38–45%.
  • AI-generated code adds 10,000+ new security findings per month
    • 10x jump from Dec 2024 to June 2025.
  • Refactoring rate collapsed from 25% to under 10%
    • Code duplication 4x'd
    • Source: GitClear, 211M lines analyzed.

Fact 04 – Engineering Leaders' Budget

  • ~50% of engineering leaders set aside 1–3% of total budget for AI tools
  • Current spend:
    • $101–500 per developer/year on AI dev tools (38.4% of leaders)
    • $1,000/dev/year is the emerging 2026 target
  • 85.7% of leaders are reserving 2026 budget for
    • AI tools "beyond code authoring"
    • Code Review, Governance, Security, Planning, etc.
    • 15–20% of AI tooling budget is being earmarked for adjacent use cases
  • 86% of leaders feel uncertain which AI tools deliver the most ROI

Diagnostics

Your State of AI Engineering

  1. "Universal AI Coding Adoption"
  2. "Code Review Bottleneck"
  3. "Quality & Security Risk"
  4. "Engineering Leaders Budget"

"Universal AI Coding Adoption"

  • Which AI coding tools are your devs using – sanctioned or otherwise?
  • What percentage of your code is AI-generated?
  • How has PR volume per developer changed in the last 12–18 months?
  • Are some teams further along than others?
    • Which ones moved first, and why?

"Code Review Bottleneck"

  • Typical PR cycle at your org – from open to merge.
    • Where do PRs get stuck the longest?
    • What's your average time-to-first-review? Time-to-merge?
  • How many reviewers does a typical PR need?
    • How often does it need a senior/staff/principal dev?
    • What % of your engineering time goes into review vs. building?
  • Has PR size grown in the last 12–18 months?
    • Have your DORA metrics improved since AI adoption?

If a PR sits 2 days waiting on review, what does that cost you in throughput?
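
One back-of-envelope way to answer, via Little's Law (queue size = arrival rate × wait time), with purely illustrative numbers – a hypothetical team opening 10 PRs/day, not a figure from the sources above:

\[
L = \lambda W = 10\ \tfrac{\text{PRs}}{\text{day}} \times 2\ \text{days} = 20\ \text{PRs parked in review at any moment}
\]

That is roughly two full days of the team's output sitting idle in the queue at all times.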

"Quality & Security Risk"

  • Have you seen incidents or near-misses traced back to AI-generated code?
  • How do you catch security issues today – pre-merge, post-merge, or both?
    • What does your bug escape rate look like – pre-AI vs. now?
    • How much time does your AppSec team spend on findings?
    • Has rework or revert volume changed in the last 12–18 months?
  • Is anyone tracking duplication or refactoring discipline?

When AI-generated code introduces vulnerabilities, who catches it – and how late in the cycle?

"Engineering Leaders Budget"

  • How are you thinking about AI tooling spend in 2026 vs. 2025?
    • Beyond code authoring, what other use cases are you exploring?
  • What's your per-dev annual spend on AI tools today?
  • How are you measuring ROI on the AI tools you've already deployed?

Qodo Architecture

sequenceDiagram
  box
    actor Developer
    participant Git.Platform as Git Platform
    participant Planning.Tool as JIRA / Linear / AzDO
  end
  box Purple
    participant Qodo.Code.Review as Code Review
    participant Qodo.Rules.Engine@{ "type" : "collections" } as Rules Engine
    participant Qodo.Context.Engine@{ "type" : "collections" } as Context Engine
  end
  Developer->>Git.Platform: Commits feature branch
  Developer->>Git.Platform: Creates new PR
  Git.Platform->>Qodo.Code.Review: PR ready-for-review
  Qodo.Code.Review-->>Git.Platform: Fetch code
  Note over Git.Platform,Qodo.Code.Review: Shallow clone of the feature branch.
  Qodo.Code.Review-->>Planning.Tool: Fetch issue/ticket for this feature.
  Note over Planning.Tool,Qodo.Code.Review: Extract acceptance criteria from specifications
  Qodo.Code.Review-->>Qodo.Rules.Engine: Fetch rules and review-guidelines
  Note over Qodo.Rules.Engine,Qodo.Code.Review: Pulls team, project, org defined review rules
  Qodo.Code.Review-->>Qodo.Context.Engine: Fetch additional context
  Note over Qodo.Context.Engine,Qodo.Code.Review: Additional context from related projects and PR history
  loop
    Qodo.Context.Engine-->Git.Platform: Continuous indexing of repos and PRs - background process
  end
  Qodo.Code.Review-->>Git.Platform: Action Required, Review Recommended
  Note over Qodo.Code.Review,Git.Platform: Publishes review as PR comment
  Git.Platform-->>Developer: Review available notification
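
A minimal, self-contained sketch of the same flow in Python. Every name here (PullRequest, shallow_clone, etc.) is hypothetical and stands in for one arrow in the diagram; none of it is Qodo's actual API:

  from dataclasses import dataclass

  # Hypothetical stand-ins for the systems in the diagram; not Qodo's real API.

  @dataclass
  class PullRequest:
      feature_branch: str
      ticket_id: str

  def shallow_clone(branch: str) -> str:
      return f"<diff of {branch}>"                # Git Platform: fetch code

  def fetch_acceptance_criteria(ticket: str) -> list[str]:
      return [f"criterion from {ticket}"]         # JIRA / Linear / AzDO

  def fetch_rules() -> list[str]:
      return ["team rule", "project rule", "org rule"]   # Rules Engine

  def fetch_context(pr: PullRequest) -> list[str]:
      return ["related projects", "PR history"]   # Context Engine (indexed in background)

  def handle_pr_ready_for_review(pr: PullRequest) -> str:
      code = shallow_clone(pr.feature_branch)
      criteria = fetch_acceptance_criteria(pr.ticket_id)
      rules = fetch_rules()
      context = fetch_context(pr)
      # The resulting review is published back to the Git platform as a PR comment.
      return (f"Review of {code}: {len(criteria)} acceptance criteria, "
              f"{len(rules)} rules, {len(context)} context sources")

  print(handle_pr_ready_for_review(PullRequest("feature/login", "PROJ-123")))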

Deployment Options

  1. Multi-tenant SaaS
    • Hosted in 🇺🇸
  2. Dedicated, single-tenant SaaS
    • Hosted in 🇪🇺, or any region of your choice
  3. On-Prem / Cloud-Prem
    • Hosted on your Kubernetes cluster
    • Optionally, Bring-Your-Own-Keys
  4. Air-Gapped
    • Your GPUs and Data-Center

Benchmarks

Open-source, Peer-reviewed

Code Review Benchmarks

Benchmarks – Overview

  • The largest open-source code-review benchmark
    • Transparent, peer-reviewed inputs & methodology
  • Conducted from mid-Jan to early-Feb 2026
    • All tools had the latest models
    • All tools had default configurations
    • Zero methodology bias

Benchmarks – Dataset & Contestants

  • Top open-source repos
  • 100 real, merged PRs
  • 580 human-verified issues
  • Repos:
    • cal.com (TypeScript)
    • Ghost (JavaScript)
    • dify (Python, Go)
    • firefox-ios (Swift)
    • prefect (Python)
    • tauri (Rust)
    • aspnetcore (C#)
    • redis (C)
  • Contestants:
    • Qodo
    • Augment
    • Copilot
    • Cursor
    • Greptile
    • Codex
    • CodeRabbit
    • Sentry

Benchmarks – Yardstick

  • Precision – "When I flag something, am I right?"
  • Recall – "Did I catch everything?"
  • F1 Score – harmonic mean of Precision & Recall
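
For reference, the standard definitions behind these metrics, in terms of true positives (TP), false positives (FP), and false negatives (FN):

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]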

Benchmarks – Results

github.com/agentic-review-benchmarks/benchmark-pr-mapping

Agent             | Precision (%) | Recall (%) | F1 (%)
Qodo - Exhaustive | 63.8          | 56.7       | 60.1
Qodo - Precise    | 74.5          | 44.2       | 55.4
Augment           | 70.6          | 32.1       | 44.1
Copilot           | 50.1          | 37.4       | 42.8
Cursor            | 78.5          | 26.2       | 39.3
Greptile          | 68.5          | 27.2       | 39.0
Codex             | 83.0          | 24.3       | 37.6
CodeRabbit        | 53.7          | 19.0       | 28.0
Sentry            | 85.3          | 13.8       | 23.7
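
As a sanity check, the F1 column can be recomputed from the published precision/recall pairs. A minimal Python sketch using the numbers in the table above (expect ±0.1 drift, since the published precision and recall are themselves rounded):

  # Recompute F1 (harmonic mean of precision and recall) from the table above.
  results = {
      "Qodo - Exhaustive": (63.8, 56.7),
      "Qodo - Precise": (74.5, 44.2),
      "Augment": (70.6, 32.1),
      "Copilot": (50.1, 37.4),
      "Cursor": (78.5, 26.2),
      "Greptile": (68.5, 27.2),
      "Codex": (83.0, 24.3),
      "CodeRabbit": (53.7, 19.0),
      "Sentry": (85.3, 13.8),
  }

  for agent, (precision, recall) in results.items():
      f1 = 2 * precision * recall / (precision + recall)
      print(f"{agent:<18} F1 = {f1:.1f}")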

Build vs. Buy

Build Options

  • Build Option 1 – Platform team builds and maintains a centralized review agent for the entire org
  • Build Option 2 – Each team builds and maintains its own custom review agent

 

  • Qodo – Benchmark-proven, enterprise-ready, SOTA review agent

Core Review Capabilities

Dimension | Build: Central Review Agent | Build: Custom per Team | Buy: Qodo
Precision & Recall | Unknown | Unknown, inconsistent | Benchmark-proven, SOTA
Context Engine | Must build | Doesn't exist | Benchmark-proven, SOTA
Rules Engine | Must build | Doesn't exist | Included
Temporal learning (PR history) | Doesn't exist | Doesn't exist | Included

Operations & Governance

Dimension | Build: Central Review Agent | Build: Custom per Team | Buy: Qodo
Metrics | Separate system to build | Doesn't exist | Included
LLM cost monitoring | Separate system to build | Doesn't exist | Included
Enterprise plumbing (SOC 2, SSO, etc.) | Must build | Multiplied risk per team | Included
Engineering-leader visibility | Eventually | Zero | Day one

Risk

Dimension | Build: Central Review Agent | Build: Custom per Team | Buy: Qodo
Maintenance burden | You, centralized | You, fragmented | Qodo
Opportunity cost | Engineers off product roadmap | Engineers off product roadmap | Focus on product roadmap
Failure mode | Slow, political, single point of failure | Fragmented, duplicative, inconsistent | Qodo

Cost

Dimension | Build: Central Review Agent | Build: Custom per Team | Buy: Qodo
LLM API spend | High, but at least centralized | Multiplies across teams | Included; lower per-PR

Speed and Coverage

Dimension | Build: Central Review Agent | Build: Custom per Team | Buy: Qodo
Time to value | Months | Weeks, never org-wide | Days
Org-wide consistency | High | None | High
Project-level customization | Slow, gated by central team | Native | Custom rules