AI Test Generation: Honest Assessment
AI generates volume. You generate insight. Don't confuse coverage with quality.
TL;DR
- AI can crank out test cases fast. They're often shallow: happy path, obvious edge cases, generic scenarios.
- AI misses domain logic, business rules, and "what would a user actually do wrong?" You catch those.
- Use AI for first drafts and regression expansion. You own the strategy and the tricky cases.
If you've run an AI test generator, you've seen it: hundreds of test cases, many of them useless. A few are golden. Your job is knowing the difference.
What AI Produces
- Happy path coverage. "User logs in, navigates to X, does Y." Fine for smoke tests. Doesn't find bugs.
- Obvious edge cases. Empty input, null, max length. AI has seen these in training data. Useful but not sufficient.
- Vague scenarios. "Test the checkout flow." What flow? Which payment? Which user type? AI needs heavy prompting.
- Brittle automation. AI-generated Selenium or Playwright scripts that break on the first UI change. High maintenance (see the sketch after this list).
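To make the brittleness concrete, here's a minimal sketch using Playwright's Python API. The URL, selectors, and button label are hypothetical placeholders, not anyone's real app:

```python
# Contrasting locator styles (Playwright, Python).
# URL, selectors, and the button label are hypothetical placeholders.
from playwright.sync_api import Page

def login_brittle(page: Page) -> None:
    # Typical AI-generated locator: a deep CSS path copied from the DOM.
    # One wrapper div or renamed class and this selector breaks.
    page.goto("https://example.test/login")
    page.locator("div.main > div:nth-child(2) > form > button.btn-primary").click()

def login_resilient(page: Page) -> None:
    # Targets the accessible role and visible label, which usually
    # survive markup refactors.
    page.goto("https://example.test/login")
    page.get_by_role("button", name="Sign in").click()
```

Same click, very different shelf life: the first version encodes the page's layout, the second encodes the user's intent.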
What AI Misses
- Domain logic. "In our app, discounts don't stack. If you have BOGO and a coupon, the system picks one." AI doesn't know your product (a concrete sketch follows this list).
- Business rules. Compliance, eligibility, entitlement. Buried in docs or tribal knowledge. AI can't infer what it has never seen.
- Integration quirks. "When the payment gateway times out, we show a generic error but don't charge." AI tests the happy path; you test the failure modes.
- User behavior. Real users do weird things. AI optimizes for "sensible" inputs. Bugs often live in the weird.
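Here's the kind of domain-logic test AI won't write unprompted, sketched in pytest style. `apply_discounts` is a hypothetical stand-in for your pricing code; the rule is the non-stacking example above:

```python
# A domain-logic test AI won't volunteer: discounts don't stack.
# `apply_discounts` is a hypothetical stand-in for real pricing code.
def apply_discounts(subtotal: float, promos: list[str]) -> float:
    # Domain rule: discounts never stack; apply the single best promo.
    best = 0.0
    for promo in promos:
        if promo == "BOGO":        # buy one get one ~ 50% off
            best = max(best, subtotal * 0.5)
        elif promo == "COUPON10":  # flat 10% coupon
            best = max(best, subtotal * 0.1)
    return subtotal - best

def test_discounts_do_not_stack():
    # With BOGO (50%) and a 10% coupon, the system must pick one.
    # Stacking both would undercharge; the correct total is 50.00.
    assert apply_discounts(100.0, ["BOGO", "COUPON10"]) == 50.0
```

An AI generator will happily test that a coupon reduces the price. It won't know that two promos together must not.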
How to Use AI-Generated Tests
Accept: Use AI for brainstorming. "Give me 20 test cases for a login form." Take the 5 that make sense. Discard the rest.
Adapt: AI output is a starting point. Add your domain knowledge. "Also test: locked account after 5 fails, SSO redirect, session expiry at 15 min."
Maintain: AI-generated automation is fragile. Prefer low-level API tests over UI tests for AI output, or use AI for the test cases (scenarios) and write the code yourself (see the API-level sketch below).
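A minimal sketch of that advice: the lockout rule from "Adapt" tested at the HTTP layer with `requests`. The endpoint, payloads, and the 423 status code are assumptions; adapt them to your auth API:

```python
# API-level test of the lockout rule mentioned above.
# Endpoint, payloads, and the 423 status code are assumptions.
import requests

BASE = "https://example.test/api"  # hypothetical base URL

def test_account_locks_after_five_failed_logins():
    bad = {"user": "qa_user", "password": "wrong-password"}
    for _ in range(5):
        requests.post(f"{BASE}/login", json=bad, timeout=5)

    # After five failures, even the correct password must be rejected.
    good = {"user": "qa_user", "password": "correct-password"}
    resp = requests.post(f"{BASE}/login", json=good, timeout=5)
    assert resp.status_code == 423  # HTTP 423 Locked; your API may differ
```

Tests like this survive UI redesigns and run in milliseconds, which is why they're a better home for generated cases than screen-level scripts.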
AI Disruption Risk for QA Engineers
AI automates routine work. Strategy, judgment, and human touch remain essential. Moderate risk for those who own the outcomes.
Quick Check
What remains human when AI automates more of the QA role? If you need a hint, reread "What AI Misses."
Do This Next
- Run an AI test generator on one feature you know well. Count: how many output cases would actually find a bug? How many are noise? That ratio is your reality check.
- Create a prompt template for AI test gen: include your app's domain, key business rules, and a "don't include" list (e.g., "don't test obvious null/empty—we have those"). Reuse it (a starting template follows).
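A starting point. Everything in brackets is a placeholder, and the sample rules simply echo this article's examples; swap in your own:

```text
You are generating test cases for [APP NAME], a [one-line description].

Domain rules that must hold (examples; replace with yours):
- Discounts never stack; the system picks the single best promo.
- Accounts lock after 5 failed login attempts.
- On payment-gateway timeout, show a generic error and do not charge.

Focus on: failure modes, integration timeouts, permission boundaries,
and realistic user mistakes.

Do NOT include:
- Obvious null/empty/max-length checks (already covered).
- Generic happy-path navigation tests.

Output: numbered scenarios with preconditions, steps, and expected result.
```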