Fine-Tuning | Generative AI | AI / ML

Fine-tuning adapts a pre-trained model to a specific domain, style, or task by continuing training on a smaller curated dataset. It is the middle ground between using a model as-is (prompting) and training from scratch — much cheaper than pre-training but more expensive than RAG.

Key Points

Full Fine-Tuning: update all model weights — expensive, requires significant GPU memory
PEFT (Parameter-Efficient Fine-Tuning): update a small fraction of parameters, freeze the rest
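LoRA (Low-Rank Adaptation): freeze the base weights and train small low-rank matrices added to selected layers (typically the attention projections); the most widely used PEFT method
QLoRA: quantise the base model to 4-bit and train LoRA adapters on top; makes it feasible to fine-tune a 65-70B model on a single high-memory GPU (48 GB or more)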
Instruction Fine-Tuning: train on (prompt, ideal response) pairs to follow instructions
RLHF (Reinforcement Learning from Human Feedback): reward model + PPO to align with human preferences
Domain Adaptation: fine-tune on medical, legal, or financial text to improve domain knowledge
When to use RAG vs fine-tuning: RAG for dynamic/updatable knowledge; fine-tuning for style/behaviour
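Catastrophic forgetting: fine-tuning can degrade general capabilities; mitigate with regularisation (e.g. a low learning rate and few epochs), by mixing in general-purpose data, or by using PEFT so the base weights stay frozen
Data quality > quantity: 1,000 high-quality examples often beat 100K low-quality ones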
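
To make the LoRA point above concrete, here is a minimal sketch in plain Python (no ML framework), assuming a single linear layer with weights stored as nested lists. The names `matmul`, `lora_param_counts`, and `LoRALinear` are hypothetical and chosen for illustration only. The initialisation (small random A, zero B) and the alpha / rank scaling follow the common LoRA convention, while the specific values are assumptions. It shows why the adapter is cheap to train and why a freshly initialised adapter leaves the base model's output unchanged.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows: (n, k) @ (k, m) -> (n, m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for one weight matrix: (full fine-tuning, LoRA)."""
    full = d_in * d_out            # every entry of W is updated
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank)
    return full, lora


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        # A starts small and random, B starts at zero, so B @ A is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """x has shape (batch, d_in); returns shape (batch, d_out)."""
        w_t = [list(col) for col in zip(*self.weight)]  # (d_in, d_out)
        a_t = [list(col) for col in zip(*self.A)]       # (d_in, rank)
        b_t = [list(col) for col in zip(*self.B)]       # (rank, d_out)
        base = matmul(x, w_t)                           # frozen path
        delta = matmul(matmul(x, a_t), b_t)             # low-rank path
        return [
            [b + self.scale * d for b, d in zip(base_row, delta_row)]
            for base_row, delta_row in zip(base, delta)
        ]
```

For a 4096 x 4096 projection with rank 8, `lora_param_counts(4096, 4096, 8)` gives 16,777,216 parameters for full fine-tuning against 65,536 for LoRA, roughly 0.4% of the original matrix. In practice a library such as Hugging Face PEFT handles this wiring; the sketch only illustrates the idea.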

Approach	Cost	Updatable	Best For
Prompting	Very low	Yes	General tasks, quick iteration
RAG	Low-medium	Yes	Dynamic knowledge, factual grounding
Fine-tuning (LoRA)	Medium	No	Style, format, domain vocabulary
Full fine-tuning	High	No	Deep domain adaptation
Pre-training from scratch	Very high	No	Truly proprietary architecture
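
The cost column above can be made more tangible with a back-of-the-envelope memory estimate. The sketch below is a rough illustration, not a measurement: the bytes-per-parameter figures are common rules of thumb assumed here, the names `BYTES_PER_PARAM` and `estimate_gb` are hypothetical, and activations, adapter weights, and quantisation overhead are all ignored.

```python
# Rule-of-thumb bytes per parameter (assumptions, not measurements).
BYTES_PER_PARAM = {
    # fp16 weights + fp16 gradients + fp32 master weights + two fp32 Adam moments
    "full_mixed_precision_adam": 2 + 2 + 4 + 4 + 4,
    # frozen fp16 base weights only (adapter state is comparatively tiny)
    "lora_fp16_base": 2,
    # frozen 4-bit base weights only
    "qlora_4bit_base": 0.5,
}


def estimate_gb(n_params, method):
    """Approximate memory in decimal GB for weights and optimiser state."""
    return n_params * BYTES_PER_PARAM[method] / 1e9
```

Under these assumptions a 70B-parameter model needs on the order of 1,120 GB for full fine-tuning, 140 GB just to hold fp16 base weights for LoRA, and about 35 GB for 4-bit base weights with QLoRA, which is why the last option can fit on a single high-memory GPU.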

Real-World Example

Meta's LLaMA models are released as open weights, which lets organisations fine-tune them for their own domain, often with LoRA. Bloomberg trained BloombergGPT, a 50B-parameter model, from scratch on a corpus of roughly 700B tokens, about half of it financial data. Most companies do not need that investment: many can get competitive domain performance by fine-tuning LLaMA on their own data at a fraction of the cost.
