
Fine-Tuning vs. Training in the Foundation Model Era


TL;DR

  • Most problems: prompt or fine-tune. Training from scratch is for edge cases.
  • Fine-tuning is faster and cheaper. You need enough quality data and a clear task.
  • Training from scratch: when the foundation doesn't fit (domain, modality, constraints) and you have the resources.

The old playbook was "collect data, train a model." The new one is "try the foundation model first, then fine-tune if needed, train only if necessary." Most teams never get past step 2.

The Decision Tree

Prompt first:

  • Can you get 80% there with a good prompt? Try it. No training. No infra. Iterate fast.
  • Use for: classification, extraction, summarization, Q&A. Often enough.
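
A minimal sketch of the prompt-only path, assuming the OpenAI Python client and a made-up ticket-labeling task (the model name and labels are illustrative; any hosted or local chat model works the same way):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    LABELS = ["billing", "bug", "feature_request"]  # hypothetical label set

    def classify(ticket: str) -> str:
        # No training, no infra: the task definition lives entirely in the prompt.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; swap for whatever you use
            messages=[
                {"role": "system", "content":
                    "Classify the support ticket. "
                    f"Reply with exactly one of: {', '.join(LABELS)}."},
                {"role": "user", "content": ticket},
            ],
        )
        return response.choices[0].message.content.strip()

    print(classify("I was charged twice this month."))  # -> billing (ideally)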

Fine-tune when:

  • Prompting isn't good enough. You have 100s–1000s of labeled examples. The task is consistent.
  • Fine-tuning adapts the foundation to your domain. Fewer examples than training from scratch. Faster iteration.

Train from scratch when:

  • No foundation model fits (exotic modality, strict latency budgets, on-device deployment, regulatory constraints).
  • You have large-scale data and the budget for compute and expertise.
  • Rare. Most teams shouldn't start here.
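
The whole tree fits in a few lines. A toy sketch; the thresholds (80% prompt accuracy, 100 examples) are this post's illustrative defaults, not canonical numbers:

    def choose_approach(prompt_accuracy: float,
                        labeled_examples: int,
                        foundation_fits: bool,
                        large_data_and_budget: bool) -> str:
        # Illustrative thresholds; set them to your own quality bar.
        if foundation_fits and prompt_accuracy >= 0.80:
            return "prompt"              # good enough with zero training
        if foundation_fits and labeled_examples >= 100:
            return "fine-tune"           # adapt the foundation to your task
        if not foundation_fits and large_data_and_budget:
            return "train from scratch"  # the rare case
        return "improve the prompt or collect more data"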

Fine-Tuning: What You Need

Data:

  • Quality > quantity. 500 good examples beat 50K noisy ones.
  • Format: input-output pairs. Consistent task definition.
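
A sketch of what "input-output pairs" usually looks like on disk: one JSON object per line, same task definition throughout (field names vary by provider; these are illustrative):

    import json

    # Hypothetical examples: one consistent task, one consistent label vocabulary.
    examples = [
        {"input": "I was charged twice this month.", "output": "billing"},
        {"input": "The export button crashes the app.", "output": "bug"},
        {"input": "Please add dark mode.", "output": "feature_request"},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")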

Compute:

  • LoRA and QLoRA cut cost by training small adapter weights instead of the full model (see the sketch after this list). Cheaper than training from scratch. Still not free.
  • Cloud or on-prem. You provision.
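
A minimal LoRA setup with Hugging Face's peft; the base checkpoint and hyperparameters are illustrative, and the training loop itself (standard Trainer or your own) is omitted:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Illustrative checkpoint; use any open-weight model you can host.
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

    # LoRA trains small low-rank adapter matrices instead of the full model,
    # which is what keeps memory and cost down.
    config = LoraConfig(
        r=8,                                  # adapter rank
        lora_alpha=16,                        # adapter scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of the base weights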

Evaluation:

  • How do you know it worked? Holdout set, production metrics. Define before you tune.
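
A sketch of "define before you tune": freeze a holdout file and score every variant with the same function (classify is the prompt-only function above; classify_finetuned is a hypothetical stand-in for your tuned model):

    import json

    def accuracy(predict, holdout) -> float:
        # Same metric, same data, for every variant you compare.
        correct = sum(predict(ex["input"]) == ex["output"] for ex in holdout)
        return correct / len(holdout)

    with open("holdout.jsonl") as f:
        holdout = [json.loads(line) for line in f]

    print("prompt-only:", accuracy(classify, holdout))
    print("fine-tuned: ", accuracy(classify_finetuned, holdout))  # hypothetical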

The Pitfalls

  • Overfitting — Small data + aggressive tuning = a model that memorizes. Use a validation set and early stopping (sketch after this list).
  • Catastrophic forgetting — Fine-tuning can hurt general ability. Monitor both task performance and base capabilities.
  • Vendor lock-in — Fine-tuning via API ties you to a provider. Fine-tune open weights if portability matters.
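
For the overfitting row, transformers has early stopping built in. A sketch; model, train_ds, and val_ds are assumed to come from your own setup:

    from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

    args = TrainingArguments(
        output_dir="out",
        eval_strategy="epoch",           # score the validation split every epoch
        save_strategy="epoch",
        load_best_model_at_end=True,     # required for early stopping
        metric_for_best_model="eval_loss",
    )

    trainer = Trainer(
        model=model,             # assumed: your (LoRA-wrapped) model
        args=args,
        train_dataset=train_ds,  # assumed: tokenized training split
        eval_dataset=val_ds,     # assumed: tokenized validation split
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )
    trainer.train()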

The old workflow: collect data, train from scratch, spend months of compute, get one model per task. Foundation-model development replaces most of that with prompting and fine-tuning.

Quick Check

You need a model for a domain-specific classification task. What's the right order? Prompt first; fine-tune if prompting falls short and you have labeled examples; train from scratch only if no foundation model fits.

Do This Next

  1. Map your use cases — For each: prompt-only, fine-tune, or train? Document the decision and the threshold.
  2. Run one fine-tuning experiment — Pick a task. Get 100+ examples. Fine-tune a small model. Compare to prompt-only. Document the lift.
  3. Define your "train from scratch" criteria — When would you actually do it? List the conditions. Writing them down makes the default (don't) explicit.