The Problem with Full Fine-tuning
Full fine-tuning a 7B model requires:
- Memory: 28GB+ VRAM for fp32 model weights, plus 56GB+ for gradients and optimizer states
- Storage: each checkpoint is 14GB (fp16 weights)
- Hardware: 4x A100 80GB at minimum
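A quick back-of-envelope sketch of where these numbers come from (assuming fp32 weights, gradients, and Adam moment buffers; activation memory is excluded, so this is a lower bound):

```python
# Rough VRAM estimate for full fine-tuning a 7B model with Adam.
# Assumes fp32 weights, fp32 gradients, and two fp32 Adam moment
# buffers per parameter; activations are ignored.
P = 7e9  # parameters

weights = P * 4        # 28 GB
grads = P * 4          # 28 GB
adam_m_v = P * 4 * 2   # 56 GB (first and second moments)

total = weights + grads + adam_m_v
print(f"total: {total / 1e9:.0f} GB+ before activations")  # 112 GB+
```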
LoRA: Low-Rank Adaptation
Instead of updating all the weights, LoRA adds small trainable matrices:
Original: W (d × k)
LoRA: W' = W + BA, where B (d × r), A (r × k), and r ≪ d, k
Example: d=4096, k=4096, r=8
- Original params: 16.8M
- LoRA params: 65K (~256x smaller!)
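A minimal NumPy sketch of the idea, using the same `W`, `B`, `A`, and shapes as above (this shows the unscaled update; real LoRA also multiplies `BA` by `alpha / r`):

```python
import numpy as np

d, k, r = 4096, 4096, 8

W = np.random.randn(d, k)         # frozen pretrained weight
B = np.zeros((d, r))              # LoRA matrix, initialized to zero
A = np.random.randn(r, k) * 0.01  # LoRA matrix, small random init

x = np.random.randn(k)

# Adapted forward pass: W + BA is never materialized; the low-rank
# path B @ (A @ x) is computed separately and added.
y = W @ x + B @ (A @ x)

print(W.size)           # 16,777,216 params if fully fine-tuned
print(A.size + B.size)  # 65,536 trainable params with LoRA (256x fewer)
```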
QLoRA: Quantized LoRA
It combines:
- 4-bit quantization: model weights are stored in 4-bit (NF4) instead of 16-bit
- LoRA adapters: kept trainable in 16-bit
- Double quantization: the quantization constants are themselves quantized, saving roughly 0.4 bits per parameter
Result: you can fine-tune a 65B model on a single 48GB GPU!
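The arithmetic behind that claim, as a rough sketch (LoRA adapters, activations, and optimizer state are extra, but they are small compared to the weights):

```python
# Why 65B fits on one 48 GB GPU: 4-bit weights cost half a byte each.
P = 65e9
fp16_gb = P * 2 / 1e9    # 130 GB -- impossible on a single card
nf4_gb = P * 0.5 / 1e9   # ~32.5 GB -- leaves headroom on a 48 GB card
print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {nf4_gb:.1f} GB")
```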
Implementation with PEFT
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load the base model in 4-bit (QLoRA-style: NF4 + double quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Configure LoRA
lora_config = LoraConfig(
    r=16,                                 # Rank
    lora_alpha=32,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints the trainable parameter count: a few million, roughly 0.1% of 8B
```
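After training, only the adapter needs saving. A sketch using PEFT's `save_pretrained` and `PeftModel.from_pretrained` (the directory name `llama3-lora-adapter` is just an example):

```python
from peft import PeftModel

# Save only the LoRA adapter weights -- megabytes, not gigabytes
model.save_pretrained("llama3-lora-adapter")

# Later: reload the base model and attach the adapter
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", device_map="auto"
)
model = PeftModel.from_pretrained(base, "llama3-lora-adapter")

# Optionally fold the adapter into the base weights for deployment
model = model.merge_and_unload()
```

Merging removes the adapter indirection at inference time; skip it if you want to keep swapping adapters on the same base model.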
Hyperparameter Tuning
| Parameter | Description | Typical Range |
|---|---|---|
| r (rank) | LoRA rank (adapter capacity) | 8-64 |
| lora_alpha | Scaling factor (update scaled by alpha/r) | 2x rank |
| target_modules | Which layers to adapt | q_proj, k_proj, v_proj, o_proj (+ MLP projections) |
| lora_dropout | Regularization | 0.05-0.1 |
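If you are unsure which module names a given architecture exposes, you can inspect its linear layers directly; a small sketch (module names vary by model family):

```python
import torch.nn as nn

# Collect the distinct names of all nn.Linear submodules to see what
# target_modules can be set to for this architecture.
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
}
print(sorted(linear_names))
# On Llama-style models this typically includes q_proj, k_proj, v_proj,
# o_proj, gate_proj, up_proj, down_proj (and lm_head, which is usually
# left out of target_modules).
```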
Pro Tips
- Higher r means more capacity, but also more parameters and slower training
- Target all linear layers for best results
- Use gradient checkpointing to reduce memory further (see the sketch after this list)
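For gradient checkpointing, a minimal sketch using the standard transformers methods (note that `prepare_model_for_kbit_training` in the snippet above already enables this by default):

```python
# Trade compute for memory: recompute activations in the backward pass
model.gradient_checkpointing_enable()

# With a frozen base model, ensure gradients can flow through the
# checkpointed blocks into the LoRA adapters
model.enable_input_require_grads()
```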
💡 Start with r=16, alpha=32. Increase r if the model underfits.
