
AI-Generated Test Suites

5 min read

Test Auto: AI generates from specs. Specs are often wrong or incomplete. You validate both.

Perf Eng: Generated suites cover the spec. They don't cover what the spec forgot.

TL;DR

  • AI can turn a spec (user story, API doc, PRD) into test cases and even executable tests. Fast.
  • Specs are the bottleneck. Garbage spec = garbage tests. AI amplifies whatever you give it.
  • Use AI to expand coverage from good specs. Don't use it to create a suite from a one-line ticket.

"Generate tests from this spec" sounds like a silver bullet. It works when the spec is solid. When the spec is vague, the tests are too.

What AI Does Well

  • Structured spec → tests. Given a clear spec with inputs, outputs, and edge cases, AI produces decent coverage. API specs, OpenAPI docs, behavior descriptions.
  • Test scaffolding. Page objects, fixtures, setup/teardown. AI writes boilerplate. You add the nuanced assertions (see the sketch after this list).
  • Expansion. "We have 10 test cases. Add 10 more that cover edge cases." AI suggests; you filter.
  • Maintenance. Spec changed? AI can propose test updates. You verify they're correct.
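
As a rough sketch of what that scaffolding looks like, here is the shape AI typically drafts from a clear spec, assuming Playwright with a simple page object. The CheckoutPage class, route, and locators are hypothetical, not a real project's API.

```typescript
import { test, expect, type Page } from '@playwright/test';

// Hypothetical page object: the class name, route, and locators are
// illustrative. AI drafts this shape; you refine the details.
class CheckoutPage {
  constructor(private readonly page: Page) {}

  async goto() {
    await this.page.goto('/checkout');
  }

  async submitOrder() {
    await this.page.getByRole('button', { name: 'Place order' }).click();
  }

  orderConfirmation() {
    return this.page.getByRole('heading', { name: 'Order confirmed' });
  }
}

test('completes checkout with a valid cart', async ({ page }) => {
  const checkout = new CheckoutPage(page);
  await checkout.goto();
  await checkout.submitOrder();
  // The nuanced assertions (totals, tax, shipping) are yours to add.
  await expect(checkout.orderConfirmation()).toBeVisible();
});
```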

What AI Struggles With

  • Vague specs. "User can checkout." AI generates generic tests (see the example after this list). They might miss: guest vs. logged-in, payment methods, tax, shipping rules. The spec didn't say.
  • Implicit requirements. "Everyone knows we don't allow backorders." The spec doesn't. AI doesn't.
  • Integration context. Tests run in isolation. AI doesn't know: "this fails if the cart service is down" or "we need test data in state X."
  • Framework and pattern fit. AI might generate tests that don't match your page object model, your assertion style, or your CI setup. Refactoring required.
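
To make the vague-spec failure concrete, this is the kind of generic test a one-line spec produces (Playwright again; the route, button label, and confirmation text are hypothetical). It passes, and it still misses everything the spec never said.

```typescript
import { test, expect } from '@playwright/test';

// Generated from the spec "User can checkout" -- happy path only.
test('user can checkout', async ({ page }) => {
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});

// Not covered, because the spec never mentioned it:
// - guest vs. logged-in checkout
// - alternative payment methods
// - tax and shipping rules
// - the implicit "no backorders" rule
```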

How to Use It

Input quality matters.

  • Feed AI detailed specs: acceptance criteria, examples, error cases.
  • Include your test framework and patterns in the prompt. "Generate Playwright tests using our BasePage pattern."
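
One way to do that is to paste the pattern itself into the prompt. The sketch below is a hypothetical minimal BasePage; your real one will differ, but including it tells the AI to extend your structure instead of inventing its own.

```typescript
import { type Page, type Locator } from '@playwright/test';

// Hypothetical minimal BasePage. Paste your real base class into the prompt
// so generated page objects and tests extend it rather than reinventing it.
export abstract class BasePage {
  protected constructor(
    protected readonly page: Page,
    private readonly path: string,
  ) {}

  async goto(): Promise<void> {
    await this.page.goto(this.path);
  }

  locator(selector: string): Locator {
    return this.page.locator(selector);
  }
}
```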

Output is a draft.

  • Review every generated test. Adjust assertions, add waits, fix selectors.
  • Run in CI. Fix flake. AI doesn't know your app's timing.
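
A typical review edit, assuming Playwright: the raw draft leans on a fixed sleep and a brittle CSS selector; the reviewed version uses a web-first assertion that retries until the page settles. The selector and expected value here are hypothetical.

```typescript
import { test, expect } from '@playwright/test';

test('shows the order total', async ({ page }) => {
  await page.goto('/checkout');

  // Raw AI draft: fixed sleep plus brittle selector, a flake factory.
  // await page.waitForTimeout(3000);
  // expect(await page.$eval('.total', (el) => el.textContent)).toBe('$42.00');

  // Reviewed version: auto-waiting assertion on a stable test id.
  await expect(page.getByTestId('order-total')).toHaveText('$42.00');
});
```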

Iterate.

  • Start with one feature. Generate. Refine. Use as a template for prompting other features.
  • Build a library of good prompts. "Generate tests for [type] with [constraints]." Reuse.
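
A prompt library can be as simple as a parameterized template. This sketch is illustrative; the wording, field names, and constraints are assumptions, not part of any particular tool.

```typescript
// Reusable prompt template: fill in the feature type, spec, and constraints,
// then reuse it across features. Wording is illustrative.
interface TestPromptInput {
  featureType: string;   // e.g. "REST endpoint", "checkout flow"
  spec: string;          // acceptance criteria, examples, error cases
  constraints: string[]; // e.g. "Playwright", "extend our BasePage", "no hard sleeps"
}

export function buildTestPrompt({ featureType, spec, constraints }: TestPromptInput): string {
  return [
    `Generate tests for this ${featureType}.`,
    `Spec:\n${spec}`,
    `Constraints:\n${constraints.map((c) => `- ${c}`).join('\n')}`,
    'Treat the output as a draft; a human will review every test.',
  ].join('\n\n');
}
```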


Quick Check

What remains human when AI automates more of this role?

Do This Next

  1. Take one well-specified feature (good AC, examples, edge cases). Run it through an AI test generator. Count: how many tests are useful as-is? How many need heavy editing? That's your baseline.
  2. Create a spec template for features that will get AI-generated tests. Include: inputs, outputs, error cases, and "out of scope." Better input = better output.
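
A minimal version of that spec template, built only from the fields named above; adapt the headings to whatever your tracker uses.

```
Feature: <name>
Inputs: <fields, types, valid ranges>
Outputs: <expected results, status codes, UI states>
Error cases: <invalid input, timeouts, permission failures>
Out of scope: <what this feature explicitly does not handle>
Examples: <one concrete input/output pair per acceptance criterion>
```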