Fine-tuning adapts a pre-trained model to a specific domain, style, or task by continuing training on a smaller curated dataset. It is the middle ground between using a model as-is (prompting) and training from scratch: much cheaper than pre-training, but more expensive to set up and to update than prompting or RAG.

Key Points

  • Full Fine-Tuning: update all model weights — expensive, requires significant GPU memory
  • PEFT (Parameter-Efficient Fine-Tuning): update a small fraction of parameters, freeze the rest
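  • LoRA (Low-Rank Adaptation): add small trainable low-rank matrices alongside the frozen weights, typically in the attention layers; the most widely used PEFT method
  • QLoRA: quantise the base model to 4-bit, then apply LoRA on top; this makes it feasible to fine-tune a model of around 70B parameters on a single high-memory GPU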
  • Instruction Fine-Tuning: train on (prompt, ideal response) pairs to follow instructions
  • RLHF (Reinforcement Learning from Human Feedback): reward model + PPO to align with human preferences
  • Domain Adaptation: fine-tune on medical, legal, or financial text to improve domain knowledge
  • When to use RAG vs fine-tuning: RAG for dynamic/updatable knowledge; fine-tuning for style/behaviour
  • Catastrophic forgetting: fine-tuning can degrade general capabilities; mitigate with a low learning rate, regularisation, or by mixing general-purpose data into the training set
  • Data quality > quantity: 1,000 high-quality examples often beat 100K low-quality ones
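
The LoRA bullet above can be made concrete with a small sketch. It assumes the usual formulation W' = W + (alpha / r) · B · A, where W is the frozen pre-trained weight and only the two thin matrices A and B are trained. The function names (matmul, lora_param_counts, lora_effective_weight) and the 4096 x 4096 layer size are illustrative choices, not taken from any particular library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for one weight matrix: (full fine-tuning, LoRA)."""
    full = d_out * d_in            # every entry of W is updated
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora


def lora_effective_weight(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * B @ A. W stays frozen; only A and B train."""
    scale = alpha / rank
    delta = matmul(B, A)  # (d_out x rank) @ (rank x d_in) -> (d_out x d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Parameter savings for one hypothetical 4096 x 4096 projection at rank 8.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ({lora / full:.2%} of full)")
# prints: full: 16,777,216  LoRA: 65,536  (0.39% of full)

# B is conventionally initialised to zero, so training starts from the
# unchanged pre-trained behaviour.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]        # rank 1, so A is 1 x 2
B = [[0.0], [0.0]]       # 2 x 1, all zeros at initialisation
print(lora_effective_weight(W, A, B, alpha=16, rank=1) == W)  # prints: True
```

Because the update is just an additive low-rank term, it can be merged into W after training, so the adapted model need not be slower at inference than the original.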

Approach                    Cost          Updatable   Best For
Prompting                   Very low      Yes         General tasks, quick iteration
RAG                         Low-medium    Yes         Dynamic knowledge, factual grounding
Fine-tuning (LoRA)          Medium        No          Style, format, domain vocabulary
Full fine-tuning            High          No          Deep domain adaptation
Pre-training from scratch   Very high     No          Truly proprietary architecture
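
A rough back-of-the-envelope calculation shows why full fine-tuning and QLoRA sit so far apart on cost. The sketch below assumes about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (2 for fp16 weights, 2 for gradients, 12 for fp32 master weights and the two optimiser moments) and about 0.5 bytes per parameter for a 4-bit quantised base model. It deliberately ignores activations, quantisation constants, and the small LoRA adapters, so the figures are lower bounds rather than measurements; the function name model_state_gb is hypothetical.

```python
def model_state_gb(n_params, bytes_per_param):
    """Approximate memory for model state in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 70_000_000_000  # a 70B-parameter model

# Assumption: fp16 weights (2) + fp16 gradients (2) + fp32 master weights
# and Adam moments (12) = 16 bytes per parameter.
full_ft_gb = model_state_gb(N_PARAMS, 16)

# Assumption: 4-bit weights = 0.5 bytes per parameter, frozen, so no
# gradients or optimiser state are kept for the base model.
qlora_base_gb = model_state_gb(N_PARAMS, 0.5)

print(f"full fine-tuning: ~{full_ft_gb:.0f} GB")   # prints: full fine-tuning: ~1120 GB
print(f"QLoRA base model: ~{qlora_base_gb:.0f} GB")  # prints: QLoRA base model: ~35 GB
```

Under these assumptions the frozen 4-bit base model fits on a single 48 GB or 80 GB accelerator, whereas full fine-tuning of the same model has to be sharded across many GPUs.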

Real-World Example

Meta's LLaMA models are released as open weights, which lets organisations fine-tune them for their own domain, often with LoRA. Bloomberg trained BloombergGPT from scratch on a corpus of roughly 700B tokens, about half of it financial data, but many companies can get comparable results on their own tasks by fine-tuning an open-weight model such as LLaMA on their data at a fraction of the cost.