End-to-End Training Pipeline
Step 1: Environment Setup
pip install transformers peft trl bitsandbytes accelerate datasets
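Before loading anything, a quick sanity check (a minimal sketch; exact version requirements depend on your setup) confirms the GPU and installed libraries are visible:
import torch
import transformers, peft, trl

print(torch.cuda.is_available())   # should print True on a CUDA machine
print("transformers", transformers.__version__)
print("peft", peft.__version__)
print("trl", trl.__version__)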
Step 2: Load Model and Tokenizer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
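As a sanity check that 4-bit loading worked, you can print the model's memory footprint (get_memory_footprint is a standard transformers model method; the exact number varies by setup):
# Roughly 5-6 GB for an 8B model in 4-bit, versus ~16 GB in fp16
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")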
Step 3: Configure LoRA
from peft import LoraConfig, prepare_model_for_kbit_training

# Enables gradient checkpointing and casts norm/output layers for stable 4-bit training
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,           # rank of the low-rank update matrices
    lora_alpha=32,  # scaling factor (alpha / r = 2 here)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
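To see how few parameters LoRA actually trains, you can apply the adapter yourself and inspect the counts; if you do this, drop peft_config from the SFTTrainer call in Step 4 so the adapter is not applied twice:
from peft import get_peft_model

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# Expect on the order of ~13M trainable params out of ~8B (well under 1%)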
Step 4: Training with SFTTrainer
from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size = 4 x 4 = 16
    learning_rate=2e-4,
    bf16=True,  # match bnb_4bit_compute_dtype; use fp16=True on GPUs without bf16 support
    logging_steps=10,
    save_strategy="epoch",
    max_seq_length=2048,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,
    tokenizer=tokenizer,  # renamed to processing_class in recent TRL releases
)
trainer.train()
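After training, only the adapter weights need to be persisted; the base model stays untouched (the output path here is illustrative):
# Saves just the LoRA adapter (tens of MB), not the full 8B base model
trainer.save_model("./output/final_adapter")
tokenizer.save_pretrained("./output/final_adapter")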
Monitoring Training
Loss Curve
- Smoothly decreasing = healthy training
- Spiking or oscillating = learning rate is likely too high
- Flat from the start = underfitting, a learning rate that is too low, or low-quality data (you can also plot the curve yourself, as sketched below)
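For a quick look without any dashboard, the logged losses can be read back from the trainer's built-in history; a minimal sketch assuming training has already run:
import matplotlib.pyplot as plt

# log_history holds one dict per logging step; training entries have a "loss" key
history = [log for log in trainer.state.log_history if "loss" in log]
steps = [log["step"] for log in history]
losses = [log["loss"] for log in history]

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")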
W&B Integration
import wandb

wandb.init(project="llm-finetuning")
# SFTTrainer logs to W&B automatically when report_to includes "wandb"
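To make the integration explicit instead of relying on defaults, set report_to on the config from Step 4 (run_name is optional; the value here is just an example):
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="./output",
    report_to="wandb",           # route Trainer logs to Weights & Biases
    run_name="llama3-lora-sft",  # example run name shown in the W&B UI
    # ... plus the other arguments from Step 4
)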
Pro Tips
- Gradient checkpointing: trades a little compute for memory, allowing larger batch sizes
- Mixed precision (bf16): faster training and lower memory use on GPUs that support it
- Learning rate: 1e-4 to 5e-4 is the usual range for LoRA (see the sketch below)
💡 Start with 1-3 epochs; training longer often leads to overfitting.
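Putting the first two tips into configuration, a sketch of the corresponding SFTConfig flags (these are standard transformers TrainingArguments fields):
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="./output",
    gradient_checkpointing=True,  # recompute activations to cut memory use
    gradient_checkpointing_kwargs={"use_reentrant": False},  # safer with PEFT
    bf16=True,                    # mixed precision on Ampere or newer GPUs
    learning_rate=2e-4,           # inside the 1e-4 to 5e-4 LoRA range
)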
