AdvancedИнженерия
OpenAI Fine-tuning: From Data to DPO
Official OpenAI fine-tuning examples: data preparation, chat model fine-tuning, Direct Preference Optimization, model distillation, and Reinforcement Fine-Tuning. Real code from the OpenAI team.
3modules
5lessons
175 mintotal time
ML engineers and developers customizing OpenAI modelsaudience
Module 1
Data Preparation and Basic Fine-Tuning
Loading, validating, and counting tokens for a chat model fine-tuning dataset; running a complete fine-tuning cycle on an ingredient extraction example.
Module 2
DPO and Preferences
Direct Preference Optimization technique: when and why SFT is not enough, the preferred/rejected dataset format, and a full DPO job cycle via the API.
Module 3
Distillation and Reinforcement Fine-Tuning
Distilling a large model's knowledge into a smaller one via the Store API and Structured Outputs; Reinforcement Fine-Tuning with verifiable graders on medical data.