
AI-Generated Test Suites

5 min read

Test Auto: AI generates from specs. Specs are often wrong or incomplete. You validate both.

Perf Eng: Generated suites cover the spec. They don't cover what the spec forgot.


TL;DR

  • AI can turn a spec (user story, API doc, PRD) into test cases and even executable tests. Fast.
  • Research: GenIA-E2ETest shows ~77% completeness/correctness, ~82% execution precision, ~10% manual modification on average. Playwright MCP explores apps from high-level intent—minimal manual scripting.
  • Specs are the bottleneck. Garbage spec = garbage tests. Use AI to expand coverage from good specs.

"Generate tests from this spec" sounds like a silver bullet. It works when the spec is solid; when the spec is vague, the tests are too. Even the strong results come with caveats: TestForge-style agentic frameworks report ~84.3% pass rate, ~44.4% line coverage, and ~$0.63/file, and human review is still needed for critical flows.

What AI Does Well

  • Structured spec → tests. Given a clear spec with inputs, outputs, and edge cases, AI produces decent coverage. API specs, OpenAPI docs, behavior descriptions.
  • Test scaffolding. Page objects, fixtures, setup/teardown. AI writes boilerplate. You add the nuanced assertions.
  • Expansion. "We have 10 test cases. Add 10 more that cover edge cases." AI suggests; you filter.
  • Maintenance. Spec changed? AI can propose test updates. You verify they're correct.
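The "expansion" idea above can be sketched as a small helper that walks a structured spec and enumerates test-case descriptors for AI (or a human) to flesh out. The `Spec` shape, field names, and `expandSpec` function are illustrative assumptions, not a real API.

```typescript
// Hypothetical structured spec -> test-case descriptors.
// All names (Spec, TestCase, expandSpec) are illustrative assumptions.
interface SpecInput {
  name: string;
  validExamples: string[];
  invalidExamples: string[];
}

interface Spec {
  feature: string;
  inputs: SpecInput[];
  errorCases: string[];
}

interface TestCase {
  title: string;
  kind: "happy-path" | "invalid-input" | "error-case";
}

function expandSpec(spec: Spec): TestCase[] {
  const cases: TestCase[] = [];
  for (const input of spec.inputs) {
    for (const v of input.validExamples) {
      cases.push({ title: `${spec.feature}: accepts ${input.name}=${v}`, kind: "happy-path" });
    }
    for (const v of input.invalidExamples) {
      cases.push({ title: `${spec.feature}: rejects ${input.name}=${v}`, kind: "invalid-input" });
    }
  }
  for (const err of spec.errorCases) {
    cases.push({ title: `${spec.feature}: handles ${err}`, kind: "error-case" });
  }
  return cases;
}

// A checkout spec with explicit edge cases yields concrete test titles.
const checkout: Spec = {
  feature: "checkout",
  inputs: [{ name: "coupon", validExamples: ["SAVE10"], invalidExamples: ["EXPIRED"] }],
  errorCases: ["payment gateway timeout"],
};
console.log(expandSpec(checkout).map((c) => c.title));
```

The point: every edge case the spec states explicitly becomes a test; anything the spec forgot never enters the loop.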

What AI Struggles With

  • Vague specs. "User can checkout." AI generates generic tests that might miss guest vs. logged-in flows, payment methods, tax, shipping rules. The spec didn't say, so the tests don't check.
  • Implicit requirements. "Everyone knows we don't allow backorders." The spec doesn't. AI doesn't.
  • Integration context. Tests run in isolation. AI doesn't know: "this fails if the cart service is down" or "we need test data in state X."
  • Framework and pattern fit. AI might generate tests that don't match your page object model, your assertion style, or your CI setup. Refactoring required.

How to Use It

Input quality matters.

  • Feed AI detailed specs: acceptance criteria, examples, error cases.
  • Include your test framework and patterns in the prompt. "Generate Playwright tests using our BasePage pattern."
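One way to bake framework and pattern context into every request is a small prompt builder. This is a minimal sketch; the `TestGenRequest` fields and the `BasePage` reference are assumptions to adapt to your own conventions.

```typescript
// Illustrative prompt builder: embeds framework, pattern, and spec detail
// so generated tests match house conventions. All names are assumptions.
interface TestGenRequest {
  framework: string;           // e.g. "Playwright"
  pattern: string;             // e.g. "our BasePage pattern"
  acceptanceCriteria: string[];
  errorCases: string[];
}

function buildPrompt(req: TestGenRequest): string {
  return [
    `Generate ${req.framework} tests using ${req.pattern}.`,
    `Acceptance criteria:`,
    ...req.acceptanceCriteria.map((c) => `- ${c}`),
    `Error cases to cover:`,
    ...req.errorCases.map((c) => `- ${c}`),
  ].join("\n");
}

const prompt = buildPrompt({
  framework: "Playwright",
  pattern: "our BasePage pattern",
  acceptanceCriteria: ["guest checkout succeeds with a valid card"],
  errorCases: ["declined card shows a retry message"],
});
console.log(prompt);
```

Templating the prompt keeps the spec detail and the framework constraints together, so no request goes out without both.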

Output is a draft.

  • Review every generated test. Adjust assertions, add waits, fix selectors.
  • Run in CI. Fix flake. AI doesn't know your app's timing.
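A common flake fix is replacing AI-generated fixed sleeps with condition-based waits. Here's a generic polling helper as a sketch; the timeout and interval defaults are assumptions, and Playwright itself ships built-in auto-waiting you should prefer where it applies.

```typescript
// Illustrative polling helper: waits for a condition instead of sleeping
// a fixed duration. Default timeouts are assumptions; tune per app.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}

// Example: wait until a flag flips (simulating the app reaching a ready state).
(async () => {
  let ready = false;
  setTimeout(() => { ready = true; }, 50);
  await waitFor(() => ready);
  console.log("ready");
})();
```

The helper fails loudly on timeout instead of silently racing the app, which is exactly the timing knowledge AI can't infer from a spec.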

Iterate.

  • Start with one feature. Generate. Refine. Use as a template for prompting other features.
  • Build a library of good prompts. "Generate tests for [type] with [constraints]." Reuse.
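The prompt library can be as simple as a map from test type to a template function. Keys and wording below are assumptions; grow the map as prompts prove out in review.

```typescript
// Illustrative prompt library: reusable "[type] with [constraints]" templates.
// Keys and phrasing are assumptions, not a prescribed format.
const promptLibrary: Record<string, (constraints: string) => string> = {
  api: (c) => `Generate API tests from the attached OpenAPI spec. Constraints: ${c}`,
  e2e: (c) => `Generate end-to-end Playwright tests from the attached user story. Constraints: ${c}`,
};

const reused = promptLibrary["e2e"]("use our BasePage pattern, no hard-coded waits");
console.log(reused);
```

Each entry encodes a prompt that already survived review once, so the next feature starts from a known-good template instead of a blank page.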

Without AI: manual test authoring from specs. Slow coverage expansion. Days to build a suite.

Quick Check

Why does AI-generated test quality depend on spec quality?

Do This Next

  1. Take one well-specified feature (good AC, examples, edge cases). Run it through an AI test generator. Count: how many tests are useful as-is? How many need heavy editing? That's your baseline.
  2. Create a spec template for features that will get AI-generated tests. Include: inputs, outputs, error cases, and "out of scope." Better input = better output.
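The spec template in step 2 can be captured as a typed structure so missing sections fail fast. A minimal sketch, assuming these field names; adapt them to your tracker.

```typescript
// Illustrative spec template for features headed to AI test generation.
// Field names are assumptions; the key idea is that every section is required.
interface FeatureSpec {
  feature: string;
  inputs: string[];
  outputs: string[];
  errorCases: string[];
  outOfScope: string[]; // stops AI from inventing tests for things you won't build
}

const example: FeatureSpec = {
  feature: "guest checkout",
  inputs: ["cart contents", "shipping address", "card details"],
  outputs: ["order confirmation", "receipt email"],
  errorCases: ["declined card", "address validation failure"],
  outOfScope: ["backorders", "gift wrapping"],
};

console.log(JSON.stringify(example, null, 2));
```

The "out of scope" field does double duty: it surfaces implicit assumptions ("we don't allow backorders") that would otherwise never reach the AI.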