OpenAI Fine-tuning: From Data to DPO · Lesson 2

Fine-Tuning Chat Models: End-to-End

End-to-end walkthrough: preparing JSONL, uploading files via the Files API, launching a fine-tuning job, monitoring progress, and running inference on the fine-tuned model.


Practical exercise

What to do after this lesson

Create a dataset of 50+ examples for a named entity extraction task in your domain, launch a fine-tuning job on gpt-4o-mini, and compare results against the base model on 20 test examples.

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

You are launching a fine-tuning job via the OpenAI API.
Prepare train.jsonl and val.jsonl (minimum 50 examples in train),
upload them via the Files API,
create a job with model gpt-4o-mini-2024-07-18,
wait for the succeeded status,
and run inference on 5 test examples to evaluate quality.

Common mistakes

What people get wrong

Dataset too small (fewer than 10 examples); running inference before the job reaches succeeded status; using outdated model IDs.

Preparing Training Data

import json

import pandas as pd
from openai import OpenAI

client = OpenAI()

recipe_df = pd.read_csv("data/cookbook_recipes_nlg_10k.csv")
system_msg = "You are a helpful recipe assistant. Extract generic ingredients."


def prepare_example(row):
    # One chat-format example: system prompt, user input, target assistant reply.
    return {
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": f"Title: {row['title']}\n\nIngredients: {row['ingredients']}\n\nGeneric ingredients: "},
            {"role": "assistant", "content": row["NER"]},
        ]
    }


# .loc slices are label-based and inclusive: rows 0-100 for training, 101-200 for validation.
training_data = recipe_df.loc[0:100].apply(prepare_example, axis=1).tolist()
validation_data = recipe_df.loc[101:200].apply(prepare_example, axis=1).tolist()


def write_jsonl(data, path):
    # JSONL: one JSON object per line.
    with open(path, "w") as f:
        for d in data:
            f.write(json.dumps(d) + "\n")


write_jsonl(training_data, "train.jsonl")
write_jsonl(validation_data, "val.jsonl")
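Before uploading, it can help to sanity-check the examples locally, since a malformed file is only rejected after the upload. The following is a minimal sketch, not part of the OpenAI SDK: `validate_chat_examples`, `load_jsonl`, `ALLOWED_ROLES`, and `MIN_EXAMPLES` are hypothetical names, the role set is limited to the three roles used in this lesson, and the threshold of 10 is taken from the "Common mistakes" note.

```python
import json

# Assumption: only the three roles used in this lesson are expected.
ALLOWED_ROLES = {"system", "user", "assistant"}
# Assumption: threshold taken from the "fewer than 10 examples" note in this lesson.
MIN_EXAMPLES = 10


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def validate_chat_examples(examples):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(examples) < MIN_EXAMPLES:
        problems.append(f"only {len(examples)} examples; need at least {MIN_EXAMPLES}")
    for i, example in enumerate(examples):
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"example {i}: missing or empty 'messages' list")
            continue
        for j, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"example {i}, message {j}: not a JSON object")
                continue
            if msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"example {i}, message {j}: unexpected role {msg.get('role')!r}")
            content = msg.get("content")
            if not isinstance(content, str) or not content.strip():
                problems.append(f"example {i}, message {j}: content must be a non-empty string")
        has_assistant = any(
            isinstance(m, dict) and m.get("role") == "assistant" for m in messages
        )
        if not has_assistant:
            problems.append(f"example {i}: no assistant message to learn from")
    return problems
```

A typical use would be `validate_chat_examples(load_jsonl("train.jsonl"))`, uploading only when the returned list is empty.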

Uploading Files and Launching the Job

def upload_file(path):
    # Upload a JSONL file for fine-tuning and return its file ID.
    with open(path, "rb") as f:
        return client.files.create(file=f, purpose="fine-tune").id


train_id = upload_file("train.jsonl")
val_id = upload_file("val.jsonl")

job = client.fine_tuning.jobs.create(
    training_file=train_id,
    validation_file=val_id,
    model="gpt-4o-mini-2024-07-18",
    suffix="recipe-ner",
)
print("Job ID:", job.id, "| Status:", job.status)

Monitoring and Inference

import time

# Poll until the job reaches a terminal status.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {job.status}")
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

# fine_tuned_model is only populated when the job succeeds.
if job.status != "succeeded":
    raise RuntimeError(f"Fine-tuning job ended with status {job.status!r}")
ft_model = job.fine_tuned_model

# Inference
resp = client.chat.completions.create(
    model=ft_model,
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": "Title: Pasta\nIngredients: [...]\nGeneric ingredients: "},
    ],
    temperature=0,
    max_tokens=200,
)
print(resp.choices[0].message.content)
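To compare the fine-tuned model against the base model on held-out examples, the replies need a score. One possible approach, sketched here under assumptions, is a set-based F1 over the extracted entities. The helper names (`parse_entities`, `entity_f1`, `mean_f1`) are hypothetical, and the sketch assumes replies are either a JSON list of strings or a comma-separated string; adapt the parsing to whatever format your labels use.

```python
import json


def parse_entities(text):
    """Parse a reply into a set of lowercase entity strings.

    Assumption: the reply is a JSON list, or else a comma-separated string.
    """
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        items = text.split(",")
    if not isinstance(items, list):
        items = [items]
    return {str(item).strip().lower() for item in items if str(item).strip()}


def entity_f1(predicted, expected):
    """Set-based F1 between predicted and expected entity strings."""
    pred, gold = parse_entities(predicted), parse_entities(expected)
    if not pred and not gold:
        return 1.0  # nothing to extract and nothing extracted
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs):
    """Average F1 over (predicted, expected) pairs."""
    scores = [entity_f1(p, g) for p, g in pairs]
    return sum(scores) / len(scores)
```

Collecting the replies of both models on the same test examples and passing each set through `mean_f1` gives two numbers that can be compared directly.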

Start with 50–100 examples. Quality tends to improve by a similar amount each time the dataset doubles rather than with each added example, so grow the dataset iteratively and re-evaluate after each round until you reach your target accuracy.
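One way to plan the iterations is to double the training set each round and stop at the number of labeled examples available. This is a small sketch under that assumption; `doubling_schedule` is a hypothetical helper, and doubling is a planning choice made here for illustration, not a rule from the API.

```python
def doubling_schedule(start, max_size):
    """Return training-set sizes start, 2*start, 4*start, ..., ending at max_size."""
    if start <= 0 or max_size <= 0:
        raise ValueError("start and max_size must be positive")
    sizes = []
    size = start
    while size < max_size:
        sizes.append(size)
        size *= 2
    sizes.append(max_size)  # final round uses everything available
    return sizes
```

For example, with 500 labeled examples and a starting size of 50, the rounds would use 50, 100, 200, 400, and finally all 500 examples, with an evaluation after each fine-tuning run.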
