Post-Training
Post-training refers to the process of refining and optimizing large language models after their initial pre-training phase. This involves techniques like fine-tuning, alignment, and safety enhancements to make models more useful, accurate, and safe for specific applications.
Companies urgently need post-training expertise because raw foundation models are rarely production-ready: they require alignment with human values, reduction of harmful outputs, and customization for specific domains. Competitive pressure to ship reliable, enterprise-ready models, together with the race on AI safety, has made this skill critical for reducing hallucinations and ensuring responsible AI deployment.
🎓 Courses
Finetuning Large Language Models
When and how to fine-tune LLMs — data prep, training, evaluation. Free, focused, practical.
Reinforcement Learning from Human Feedback
Hands-on RLHF — reward-model training, PPO, and evaluation, taught in partnership with Google Cloud.
Pretraining LLMs
Understand what happens before post-training — data curation, tokenization, pre-training objectives.
Hugging Face PEFT/LoRA
Official docs for Parameter-Efficient Fine-Tuning — LoRA, QLoRA, prefix tuning. The tools you'll use.
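The idea underlying LoRA is easy to state: freeze the pretrained weight W and learn a small low-rank update BA, scaled by alpha/r. A minimal NumPy sketch of that math (illustrative shapes and names, not the peft API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (d_out x d_in) -- untouched during fine-tuning.
d_in, d_out, r, alpha = 64, 64, 8, 16
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapter
# contributes nothing at initialization (B @ A == 0) and training
# begins exactly from the pretrained model's behavior.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x):
    """Forward pass with the scaled low-rank update added on the fly."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# At init, output equals the frozen model's output.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (2 × r × d parameters instead of d × d) receive gradients, which is why LoRA fine-tuning fits on much smaller GPUs than full fine-tuning.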
📖 Books
Hands-On Large Language Models
Jay Alammar, Maarten Grootendorst · 2024
Visual approach to fine-tuning, RLHF, DPO, and alignment with clear illustrations and code.
LLM Engineer's Handbook
Paul Iusztin, Maxime Labonne · 2024
End-to-end LLM engineering including LoRA, QLoRA, DPO, and deployment.
Deep Learning
Ian Goodfellow, Yoshua Bengio, Aaron Courville · 2016
Free. The foundational ML textbook — understand optimization and training dynamics before fine-tuning.
🛠️ Tutorials & Guides
Maxime Labonne's LLM Course
28K+ stars. Comprehensive roadmap covering fine-tuning, DPO, merging, and quantization, with notebooks for each topic.
Hugging Face Alignment Handbook
Official recipes for SFT, DPO, RLHF — the exact scripts used to train Zephyr.
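Behind the handbook's DPO recipes is a compact loss: push the policy's log-prob ratio on the chosen response above its ratio on the rejected one, relative to a frozen reference model. A minimal per-pair sketch in plain Python (illustrative function and argument names, not the handbook's API):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin compares how much the policy prefers the chosen response
    over the rejected one, measured against the reference model.
    beta controls how strongly the policy may deviate from the reference.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference margin the loss is log(2); it shrinks as the
# policy favors the chosen response relative to the reference.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # log(2) ~ 0.693
```

No reward model and no sampling loop are needed, which is why DPO recipes are so much simpler to run than full RLHF.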
RLHF Illustrated (Chip Huyen)
Clear visual walkthrough of RLHF — reward modeling, PPO, and practical challenges.
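The PPO step in RLHF centers on a clipped surrogate objective. A minimal sketch for a single action with scalar inputs (real implementations operate on batched tensors):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one action.

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage.
    Clipping the ratio to [1 - eps, 1 + eps] keeps each policy update
    conservative, and taking the min gives a pessimistic bound.
    """
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, with eps=0.2 a ratio of 2.0 and a positive advantage is clipped to 1.2, capping the incentive to push the policy further in one update.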
Unsloth Documentation
Fine-tune 2x faster with 80% less memory — the fastest way to do LoRA/QLoRA.
Learning resources last updated: March 30, 2026