Skip to main content

AI-Assisted Data Modeling

5 min read
Data ArchData Eng

Data Arch

AI generates options. You validate against business rules and existing systems.

Data Eng

Use AI for first-pass schemas. You handle partitioning, indexing, and evolution.

AI-Assisted Data Modeling

TL;DR

  • AI can propose schemas from descriptions, suggest normal forms, and draft DDL.
  • AI doesn't know your existing data landscape, business rules, or migration constraints.
  • Use AI to explore the design space. You lock in the model.

"Design a schema for an e-commerce orders system." AI will produce something plausible — maybe even good. But it won't know that your orders already exist in a legacy system, that "customer" means something specific in your org, or that you're constrained by a 10-year-old data warehouse you can't replace. You do.

What AI Does Well

First drafts from requirements:

  • Describe the domain. AI proposes tables, keys, relationships. Iterate.
  • Faster than staring at a blank ERD. Use it as a starting point.

Normalization and denormalization trade-offs:

  • "Should this be normalized?" — AI lists pros and cons. You decide based on query patterns.
  • "We need faster reads." — AI suggests denormalization strategies. You validate against write complexity.

DDL and migration scripts:

  • "Generate CREATE TABLE for this model." — AI produces SQL. You adjust types, constraints, and indexes for your engine.

Documentation:

  • "Document this schema." — AI drafts column descriptions and relationships. You correct for accuracy.

What AI Gets Wrong

Domain semantics:

  • "Customer" might mean "account" or "contact" or "buyer" in your world. AI assumes textbook definitions.
  • Edge cases: refunds, multi-currency, time zones. AI often oversimplifies.

Existing systems:

  • AI designs in a vacuum. You have legacy tables, naming conventions, and systems that consume this data. Compatibility matters.

Performance implications:

  • Partitioning, indexing, sharding — AI can suggest. It doesn't know your query mix or scale. You need to tune.

Evolution and migration:

  • "How do we add this column without downtime?" — AI can propose. Real migration paths depend on your tooling and tolerance for risk.

The Workflow

  1. You provide context — Domain, existing systems, constraints, scale.
  2. AI proposes a model — Tables, keys, relationships.
  3. You critique — What's missing? What conflicts with reality?
  4. You finalize — Schema is yours. AI was a collaborator.

AI Disruption Risk for Data Architects

Moderate Risk

SafeCritical

AI proposes schemas fast. Domain semantics, existing systems, and migration constraints need human validation. Moderate risk for implementers; low for those who own the model.

Stare at blank ERD. Manually draft tables, keys, DDL. Hours per model.

Click "AI-Assisted Schema Design" to see the difference →

-- AI output: generic e-commerce customer
CREATE TABLE customer (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255) UNIQUE
);

-- Your version: org semantics + existing conventions
CREATE TABLE buyer (  -- "customer" = "buyer" in your domain
buyer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
display_name VARCHAR(100),  -- not "name" — matches legacy
email_canonical VARCHAR(320) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- AI drafts; you apply domain rules and naming conventions.

Quick Check

What does AI often get wrong when generating schemas?

Do This Next

  1. Describe one domain you're modeling (or remodeling). Ask AI for a schema. Compare to what you'd design. Where does AI diverge? Why?
  2. Build a schema review checklist — Existing systems, naming, partitioning, indexes. Run every AI-generated schema through it.
  3. Document one migration — Adding a column, splitting a table. Use AI to draft the steps. You verify and document the rollback.