When Data Scientists Become Prompt Engineers

5 min read · Data Sci

Code for production. Prompt for exploration. Know which is which.

TL;DR

  • LLMs can generate code, explain results, and draft analyses. For exploration and prototyping, that's fast.
  • For production, reproducibility, and audit trails — you still need real code.
  • The shift: less "write Python from scratch," more "direct the AI, validate the output, own the pipeline."

It's uncomfortable. You spent years learning stats, Python, SQL. Now you can get a working analysis by chatting with a model. Some of that work is commoditized. The parts that remain valuable: asking the right question, validating the answer, and putting it in a form the business can trust.

What LLMs Are Good At (For Data Work)

Code generation:

  • "Write a pandas pipeline to aggregate X by Y" — Works. Saves time.
  • "Debug this error" — Often works. Copy-paste, run, iterate.

Exploratory analysis:

  • "What might be causing this correlation?" — LLM suggests hypotheses. You test them.
  • "Explain this chart" — LLM drafts narrative. You fact-check.

Documentation and reporting:

  • "Summarize these findings for stakeholders" — First draft in seconds. You edit.
  • "Create a README for this analysis" — Boilerplate done. You customize.

Learning and reference:

  • "How does X work in library Y?" — Faster than docs sometimes. Verify before using.

What You Still Own

Problem framing:

  • "What should we predict?" "What's the right metric?" — LLMs can suggest. You decide. The business context is yours.

Validation:

  • LLMs hallucinate. Code has bugs. Analyses have flaws. You're the skeptic.
  • Run the code. Check the numbers. Sanity-check the conclusions. See the sketch below.
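
One lightweight way to be the skeptic: pin sanity checks around whatever the model produced. A sketch, reusing the hypothetical aggregation from the earlier example:

    import pandas as pd

    def sanity_check(summary: pd.DataFrame, raw: pd.DataFrame) -> None:
        # Illustrative checks; the column names are assumptions.
        assert not summary.empty, "empty result: a filter or join is probably wrong"
        assert summary["region"].is_unique, "duplicate groups: the groupby is off"
        assert summary["total_revenue"].ge(0).all(), "negative revenue: check signs and joins"
        # Reconcile the total against a number you already trust.
        assert abs(summary["total_revenue"].sum() - raw["revenue"].sum()) < 1e-6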

Reproducibility:

  • A prompt session isn't a pipeline. Production needs versioned code, dependencies, and audit trails.
  • LLM-generated code → you refactor, test, commit. See the sketch below.
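
"Refactor, test, commit" can start as small as pulling the generated snippet into a named function and pinning its behavior with a test. A minimal pytest-style sketch; the function name and columns are assumptions:

    import pandas as pd

    def revenue_by_region(df: pd.DataFrame) -> pd.DataFrame:
        # Refactored from the generated snippet into a testable unit.
        return (
            df.groupby("region", as_index=False)
              .agg(total_revenue=("revenue", "sum"))
        )

    def test_revenue_by_region():
        df = pd.DataFrame({
            "region": ["north", "north", "south"],
            "revenue": [10.0, 5.0, 7.0],
        })
        out = revenue_by_region(df)
        assert out.set_index("region")["total_revenue"].to_dict() == {
            "north": 15.0, "south": 7.0,
        }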

Interpretation and storytelling:

  • "What does this mean for the business?" — LLMs can draft. The nuance, the caveats, the "here's what we don't know" — that's you.

The New Workflow

  1. Prompt for exploration — Fast iteration. Get to "does this make sense?" quickly.
  2. Code for production — Refactor LLM output. Add tests. Version. Deploy.
  3. Validate everything — Don't trust. Verify. Especially when the answer looks right.

The contrast in practice: the traditional workflow means writing Python from scratch, debugging, documenting, and building a reproducible pipeline. Days per analysis. The LLM-assisted workflow compresses the writing; the validating is still on you.

Quick Check

An LLM generates analysis code that produces a surprising result. What do you do?

Do This Next

  1. Run one analysis twice — Once with pure coding. Once with LLM assistance. Compare time and quality. Where does the LLM help? Where does it hurt?
  2. Define your "prompt vs. code" boundary — What's exploration (prompt OK)? What's production (code required)? Write it down.
  3. Build a validation checklist — For any LLM-assisted analysis: What do you always verify? Add it to your process.