
Team Productivity Metrics in the AI Era

5 min read
Eng Manager


Lines of code and PR count are worse than useless now. Focus on outcomes.



TL;DR

  • Old metrics (PRs, story points, lines of code) are worse than useless with AI. They'll inflate without meaning.
  • Shift to outcome metrics: cycle time, reliability, customer impact. AI should improve these, not just activity.
  • Qualitative signals matter more: code review quality, cross-team help, "who do people ask when stuck?"

You used to gauge productivity by PR volume, story points, or velocity. AI just made those metrics laughable. An engineer can now generate 10 PRs in a day — most of them trivial or wrong. Great numbers. Terrible signal.

Why Activity Metrics Fail

PRs and commits: AI generates both. A junior with Cursor can out-PR a senior. That doesn't mean they're more productive.

Story points: Teams game these anyway. AI lets you "complete" more points by automating boilerplate. Velocity goes up; value delivered may not.

Lines of code: The classic bad metric. More code often means more complexity, not more value. AI will happily write thousands of lines. You don't want to incentivize that.

What to Measure Instead

1. Cycle time (idea → shipped)

  • How long from "we should build X" to "X is live"?
  • AI should compress this. If it doesn't, something's wrong: maybe overload, maybe poor tool use. A rough way to track it is sketched below.
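
A minimal sketch of what tracking this could look like, assuming you can export an "idea committed" timestamp and a "went live" timestamp for each shipped item from your issue tracker and deploy log (the records and field names below are illustrative, not from any specific tool):

```python
from datetime import datetime
from statistics import median

# Hypothetical shipped items: when the team committed to the idea and when
# it went live. In practice, pull these from your tracker and deploy log.
shipped_items = [
    {"idea_at": datetime(2024, 3, 1), "live_at": datetime(2024, 3, 12)},
    {"idea_at": datetime(2024, 3, 4), "live_at": datetime(2024, 3, 20)},
    {"idea_at": datetime(2024, 3, 10), "live_at": datetime(2024, 3, 15)},
]

# Cycle time per item, in days.
cycle_times_days = [
    (item["live_at"] - item["idea_at"]).days for item in shipped_items
]

# Median is less distorted by one long-running project than the mean.
print(f"Median cycle time: {median(cycle_times_days)} days")
```

The exact numbers matter less than the trend: if AI adoption is working, this should drift down over a few months.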

2. Reliability and quality

  • Incident rate, rollback rate, bug escape rate.
  • AI can introduce bugs. Are you catching them? Are they increasing? A simple monthly roll-up, sketched below, makes the trend visible.
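
One way to watch this is a per-month roll-up of deploys, rollbacks, incidents, and bugs that escaped to production. A sketch, assuming you can count these from your CI/CD system and incident tracker (all names and figures here are made up):

```python
# Hypothetical monthly counts; in practice, export these from your
# deploy pipeline and incident tracker.
months = [
    {"month": "2024-01", "deploys": 42, "rollbacks": 2, "incidents": 1, "escaped_bugs": 5},
    {"month": "2024-02", "deploys": 55, "rollbacks": 5, "incidents": 3, "escaped_bugs": 9},
]

for m in months:
    # Rollback rate normalizes for the fact that AI-assisted teams deploy more.
    rollback_rate = m["rollbacks"] / m["deploys"]
    print(
        f"{m['month']}: rollback rate {rollback_rate:.1%}, "
        f"{m['incidents']} incidents, {m['escaped_bugs']} bugs escaped to prod"
    )
```

If deploy volume climbs while rollback rate and escaped bugs climb with it, the extra "productivity" is being paid back with interest.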

3. Customer and business impact

  • Features shipped that moved a metric. Fixes that reduced pain.
  • Harder to measure, but this is what actually matters.

4. Cross-team and organizational health

  • Who gets pulled into incidents? Who unblocks others?
  • AI doesn't help with coordination. If your "productivity" goes up but cross-team friction spikes, you've lost.

Qualitative Signals You Can't Ignore

  • Code review depth. Are reviews substantive or rubber-stamped? AI-generated code needs more scrutiny, not less.
  • "Who do people ask?" The people others go to for help are force multipliers. That value often doesn't show in metrics.
  • Burnout and attrition. If velocity goes up but people are miserable, you're borrowing from the future.

Avoiding the Metric Trap

Don't replace bad metrics with new bad metrics. "AI-assisted PRs" or "prompts run per day" are noise.

  • Measure outcomes, not activity. What shipped? What broke? What helped the business?
  • Use metrics to ask questions, not to judge. "Cycle time went up — why?" is useful. "Your cycle time is low — you're great" can incentivize cutting corners.
  • Combine quantitative with qualitative. Numbers tell part of the story. 1:1s and retros tell the rest.


Quick Check

What remains human when AI automates more of this role?

Do This Next

  1. Audit your current metrics — Which ones would inflate or distort with AI? Deprecate or downweight them.
  2. Pick one outcome metric — Cycle time, incident rate, or customer impact. Start tracking it. Share with the team.
  3. Add one qualitative check — e.g., "In retros, we ask: Did AI help or hinder this sprint?" Aggregate and look for patterns.