Fine-Tuning
Full fine-tuning, LoRA, PEFT, instruction tuning, RLHF alignment
Fine-tuning adapts a pre-trained model to a specific domain, style, or task by continuing training on a smaller curated dataset. It sits between using a model as-is (prompting) and training from scratch: far cheaper than pre-training, but usually more expensive than RAG.
Key Points
- Full Fine-Tuning: update all model weights; expensive, because GPU memory must hold the weights, their gradients, and the optimiser state for every parameter
- PEFT (Parameter-Efficient Fine-Tuning): update a small fraction of parameters, freeze the rest
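- LoRA (Low-Rank Adaptation): freeze the pre-trained weights and learn a low-rank update (two small matrices whose product is added to a weight matrix), typically in the attention projections; the most widely used PEFT method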
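- QLoRA: quantise the frozen base model to 4-bit and train LoRA adapters on top, cutting memory enough to fine-tune a 65-70B model on a single large GPU (the QLoRA paper used one 48 GB GPU for a 65B model)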
- Instruction Fine-Tuning: train on (prompt, ideal response) pairs to follow instructions
- RLHF (Reinforcement Learning from Human Feedback): train a reward model on human rankings of responses, then optimise the model against that reward with PPO to align it with human preferences
- Domain Adaptation: fine-tune on medical, legal, or financial text to improve domain knowledge
- When to use RAG vs fine-tuning: RAG for dynamic/updatable knowledge; fine-tuning for style/behaviour
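- Catastrophic forgetting: fine-tuning can degrade general capabilities; mitigate with regularisation, low learning rates, mixing in general-purpose data, or PEFT methods that keep the base weights frozen
- Data quality > quantity: 1,000 high-quality examples often beat 100,000 low-quality ones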
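To make the memory gap between full fine-tuning and QLoRA concrete, here is a back-of-the-envelope sketch. It assumes the commonly cited figure of about 16 bytes per parameter for mixed-precision training with Adam (fp16 weights and gradients, fp32 master weights, two fp32 Adam moments) and ignores activations, batch size, and framework overhead, so the numbers are rough lower bounds, not measurements.

```python
# Rough GPU-memory arithmetic; all byte counts are simplifying assumptions.

# Assumed mixed-precision Adam breakdown (bytes per parameter):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16
FULL_FT_BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def full_finetune_gb(n_params: int) -> float:
    """Lower-bound memory (GB) to fully fine-tune n_params, excluding activations."""
    return n_params * FULL_FT_BYTES_PER_PARAM / 1e9


def quantised_weights_gb(n_params: int, bits: int = 4) -> float:
    """Memory (GB) needed only to store frozen base weights at the given bit width."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    for n in (7_000_000_000, 65_000_000_000):
        print(f"{n / 1e9:.0f}B params: full FT ~{full_finetune_gb(n):.0f} GB, "
              f"4-bit base weights ~{quantised_weights_gb(n):.1f} GB")
```

Under these assumptions a 7B model needs roughly 112 GB for full fine-tuning, while its 4-bit weights take about 3.5 GB. The 4-bit figure covers only the frozen base weights; LoRA adapters, their optimiser state, quantisation constants, and activations come on top, which is consistent with a 65B model (about 32.5 GB of 4-bit weights) needing a 48 GB card.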
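The LoRA bullet can be illustrated with a dependency-free sketch of the formulation in the LoRA paper: a frozen weight W (d_out x d_in) is augmented by a trainable product B·A, with A of shape r x d_in and B of shape d_out x r, scaled by alpha / r. B starts at zero, so the adapted layer initially behaves exactly like the pre-trained one. The helper names are illustrative, not any library's API, and real implementations use tensor libraries rather than Python lists.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def init_lora(d_out, d_in, r, seed=0):
    """A gets small random values, B starts at zero, so B @ A == 0 at init."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def trainable_params(d_out, d_in, r):
    """LoRA trains r * (d_in + d_out) values instead of d_out * d_in."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    A, B = init_lora(d_out=2, d_in=2, r=1)
    print(lora_forward(W, A, B, [1.0, 1.0], alpha=16, r=1))  # [3.0, 7.0], same as W x
    full, lora = 4096 * 4096, trainable_params(4096, 4096, r=8)
    print(full, lora, f"{100 * lora / full:.2f}%")
```

For a single 4096 x 4096 projection at rank 8, LoRA trains 65,536 values instead of 16,777,216 (about 0.39%), which is why adapter checkpoints are small and cheap to swap per task.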
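For instruction fine-tuning, (prompt, ideal response) pairs are commonly turned into one token sequence in which only the response positions contribute to the loss, so the model learns to answer rather than to echo prompts. This sketch assumes a toy whitespace tokenizer that builds its vocabulary on the fly (a hypothetical stand-in for a real tokenizer and chat template); the -100 label follows the default `ignore_index` of PyTorch's `CrossEntropyLoss`, a convention many training scripts use for masked positions.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def tokenize(text, vocab):
    """Toy whitespace tokenizer (hypothetical); assigns new ids on first sight."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def build_example(prompt, response, vocab):
    """Concatenate prompt + response; mask prompt positions out of the loss."""
    prompt_ids = tokenize(prompt, vocab)
    response_ids = tokenize(response, vocab)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


if __name__ == "__main__":
    vocab = {}
    ids, labels = build_example("Translate to French: cat", "chat", vocab)
    print(ids)     # [0, 1, 2, 3, 4]
    print(labels)  # [-100, -100, -100, -100, 4]
```

Labels line up position-for-position with `input_ids` here because causal-LM models in Hugging Face Transformers shift labels internally when computing the loss.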
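The two RLHF ingredients reduce to small formulas. The reward model is commonly trained with a pairwise (Bradley-Terry) loss, -log sigmoid(r_chosen - r_rejected), which pushes the score of the human-preferred response above the rejected one. During PPO, the reward is then typically penalised by beta * (log pi(y|x) - log pi_ref(y|x)) to keep the policy close to the supervised starting model, as in the InstructGPT recipe. The sketch uses plain floats in place of model outputs, and beta = 0.1 is an illustrative value, not a recommendation.

```python
import math


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp(-margin)
    return -margin + math.log1p(math.exp(margin))


def kl_penalised_reward(reward: float, logp_policy: float, logp_ref: float,
                        beta: float = 0.1) -> float:
    """Reward passed to PPO: RM score minus beta * (log pi - log pi_ref) of the sample."""
    return reward - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    print(reward_model_loss(0.0, 0.0))   # log(2): the pair is not yet separated
    print(reward_model_loss(2.0, -1.0))  # small loss: chosen scored above rejected
    print(kl_penalised_reward(1.5, logp_policy=-3.0, logp_ref=-4.0))
```

The KL term lowers the reward whenever the policy assigns a sample more probability than the reference model does, which discourages the policy from drifting toward outputs that merely exploit the reward model.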
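One simple form of the regularisation suggested for catastrophic forgetting is a penalty that pulls fine-tuned weights back toward their pre-trained values, (lambda / 2) * ||theta - theta_0||^2, added to the task loss. This is the idea behind L2-SP; EWC is a related variant that weights each parameter by an importance estimate. The sketch below is a toy SGD step on flat lists of floats; the function names and the `lam` value are illustrative assumptions.

```python
def regularised_loss(task_loss, theta, theta0, lam):
    """Task loss plus (lam / 2) * squared distance from the pre-trained weights."""
    drift = sum((t - t0) ** 2 for t, t0 in zip(theta, theta0))
    return task_loss + 0.5 * lam * drift


def sgd_step(theta, task_grad, theta0, lam, lr):
    """One SGD step; the penalty adds lam * (theta - theta0) to each gradient."""
    return [t - lr * (g + lam * (t - t0))
            for t, g, t0 in zip(theta, task_grad, theta0)]


if __name__ == "__main__":
    theta0 = [1.0, -2.0]  # pre-trained weights
    theta = [1.5, -2.5]   # weights after some fine-tuning
    print(regularised_loss(0.3, theta, theta0, lam=1.0))
    # With zero task gradient, the step moves weights back toward theta0
    print(sgd_step(theta, [0.0, 0.0], theta0, lam=1.0, lr=0.5))  # [1.25, -2.25]
```

Larger `lam` trades task fit for retention of the original behaviour; PEFT methods such as LoRA get a similar effect structurally by never changing the base weights at all.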
| Approach | Cost | Knowledge updatable without retraining | Best For |
|---|---|---|---|
| Prompting | Very low | Yes | General tasks, quick iteration |
| RAG | Low-medium | Yes | Dynamic knowledge, factual grounding |
| Fine-tuning (LoRA) | Medium | No | Style, format, domain vocabulary |
| Full fine-tuning | High | No | Deep domain adaptation |
| Pre-training from scratch | Very high | No | Proprietary architecture or full control of the training corpus |
Real-World Example
Meta's LLaMA models are released as open weights, which lets organisations fine-tune them (often with LoRA) for their own domains. Bloomberg instead trained the 50B-parameter BloombergGPT from scratch on a corpus of roughly 700B tokens, about half of it financial data; many companies now aim for comparable domain performance by fine-tuning open-weight models such as LLaMA on their own data at a fraction of the cost.