OpenAI Audio & Vision: Whisper and GPT-4V · Lesson 2

Whisper Prompting: Better Transcription Output

Use the optional prompt parameter of the Whisper API to steer output style and to get names, brands, and technical terms spelled correctly. You will also see how GPT can generate fictitious prompts for Whisper.

25 min read · 3 questions in quiz · Ready prompt included

Practical exercise

What to do after this lesson

Record or download audio that includes several unusual names or product names. Transcribe without a prompt, then with a glossary prompt. Compare spelling accuracy between the two results.


Ready-to-use prompt

Template for this lesson

Copy and adapt to your context: replace the audio file path and the glossary terms with your own.

from openai import OpenAI
client = OpenAI()

def transcribe(path: str, prompt: str = "") -> str:
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            file=f, model="whisper-1", prompt=prompt
        ).text

def build_glossary_prompt(*terms: str) -> str:
    # Whisper treats the prompt as preceding transcript text, not as an
    # instruction, so listing the correctly spelled terms is what steers it.
    joined = ", ".join(terms)
    return f"Glossary of proper nouns and product names: {joined}."

# Usage
no_prompt = transcribe("audio.wav")
with_glossary = transcribe(
    "audio.wav",
    prompt=build_glossary_prompt("QuirkQuid", "GPT-4o", "Aimee", "Shawn"),
)
print("Without prompt:", no_prompt[:200])
print("With glossary: ", with_glossary[:200])
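The template prints only the first 200 characters of each transcript, which makes a spelling comparison hard to judge by eye. Here is a minimal sketch for scoring it, assuming exact, case-sensitive spelling is what counts. The hypothetical helpers term_hits and spelling_accuracy report which glossary terms appear verbatim in a transcript and what fraction of the glossary that represents.

```python
import re

def term_hits(transcript: str, terms: list[str]) -> dict[str, bool]:
    """Map each glossary term to whether it appears with its exact spelling."""
    hits = {}
    for term in terms:
        # (?<!\w) and (?!\w) act as word boundaries that also work for terms
        # starting or ending in punctuation; re.escape keeps "-" or "." literal.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        hits[term] = re.search(pattern, transcript) is not None
    return hits

def spelling_accuracy(transcript: str, terms: list[str]) -> float:
    """Fraction of glossary terms spelled exactly right (case-sensitive)."""
    if not terms:
        return 0.0
    return sum(term_hits(transcript, terms).values()) / len(terms)

if __name__ == "__main__":
    terms = ["QuirkQuid", "GPT-4o", "Aimee", "Shawn"]
    # Made-up sample sentences, not real Whisper output
    print(spelling_accuracy("Amy and Sean tried Quirk Quid with GPT 4o.", terms))  # 0.0
    print(spelling_accuracy("Aimee and Shawn tried QuirkQuid with GPT-4o.", terms))  # 1.0
```

With the template's variables, compare spelling_accuracy(no_prompt, terms) against spelling_accuracy(with_glossary, terms), using the same terms you passed to build_glossary_prompt. Exact matching is deliberately strict. If a spoken variant such as "GPT 4o" should also count, add it to the list or normalise both texts first.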

Basic transcription with a prompt

from openai import OpenAI

client = OpenAI()

def transcribe(audio_filepath: str, prompt: str = "") -> str:
    # Open the file in a context manager so the handle is closed afterwards
    with open(audio_filepath, "rb") as f:
        transcript = client.audio.transcriptions.create(
            file=f,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript.text

# Baseline: no prompt
baseline = transcribe("meeting.wav", prompt="")

# Glossary prompt for correct spelling of names and products
result = transcribe(
    "meeting.wav",
    prompt="Attendees: Aimee, Shawn. Product: QuirkQuid Quill",
)
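When two full transcripts sit side by side, it is easy to miss exactly what the glossary prompt changed. The minimal sketch below uses Python's standard difflib module to list the word spans that differ. The helper name word_changes is hypothetical. Splitting on whitespace is a simplifying assumption, so punctuation stays attached to words.

```python
import difflib

def word_changes(baseline: str, prompted: str) -> list[tuple[str, str]]:
    """Return (baseline_span, prompted_span) pairs for every differing span.

    Texts are split on whitespace, so punctuation stays attached to words.
    An empty string on one side means words were only inserted or deleted.
    """
    a, b = baseline.split(), prompted.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

if __name__ == "__main__":
    # Made-up sample transcripts, not real Whisper output
    before = "Amy and Sean demoed the Quirk Quid quill."
    after = "Aimee and Shawn demoed the QuirkQuid Quill."
    for old, new in word_changes(before, after):
        print(f"{old!r} -> {new!r}")
```

Calling word_changes(baseline, result) on the two transcripts above would return a pair like ("Amy", "Aimee") wherever the prompt corrected a spelling. An empty list means the prompt made no difference to the wording.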

Longer prompts are more reliable than short ones

audio = "meeting.wav"  # example path; use your own recording

# Short prompt: a few words steer the style less reliably
transcribe(audio, prompt="president biden.")

# Long prompt: several lowercase sentences establish a stable pattern
long_prompt = (
    "i want to share some thoughts on this topic. "
    "multiple sentences help establish a clear pattern. "
    "the more text you provide, the more reliably the model follows."
)
transcribe(audio, prompt=long_prompt)
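A long style prompt is easiest to keep consistent when you assemble it from sample sentences in the target style. The minimal sketch below assumes all-lowercase output is the goal. The hypothetical helper build_style_prompt lowercases and joins the sentences, and it rejects prompts shorter than a word threshold. The default of 20 words is an arbitrary assumption, not a documented Whisper limit.

```python
def build_style_prompt(sentences: list[str], min_words: int = 20) -> str:
    """Lowercase and join sample sentences into a single Whisper style prompt.

    min_words is an arbitrary threshold (an assumption, not a Whisper limit)
    that rejects prompts too short to establish a clear pattern.
    """
    cleaned = [s.strip().lower() for s in sentences if s.strip()]
    prompt = " ".join(cleaned)
    word_count = len(prompt.split())
    if word_count < min_words:
        raise ValueError(
            f"style prompt has only {word_count} words; add more sample sentences"
        )
    return prompt

if __name__ == "__main__":
    long_prompt = build_style_prompt([
        "I want to share some thoughts on this topic.",
        "Multiple sentences help establish a clear pattern.",
        "The more text you provide, the more reliably the model follows.",
    ])
    print(long_prompt)
```

This reproduces the long_prompt string from the example above. Lowercasing in code rather than by hand ensures that no stray capital letter sends Whisper a mixed signal. A two-word prompt like "president biden." raises a ValueError instead of passing silently.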

Generating fictitious prompts with GPT

def fictitious_prompt_from_instruction(instruction: str) -> str:
    # Ask GPT to write sample transcript text that already has the desired style;
    # Whisper imitates the style of its prompt rather than following instructions.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a transcript generator. Create one long paragraph "
                    "of fictional conversation. Never diarize speakers."
                ),
            },
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content

# Example: generate a prompt for ellipsis-style transcription
prompt = fictitious_prompt_from_instruction(
    "Instead of periods, end every sentence with ellipses."
)
transcribe("meeting.wav", prompt=prompt)
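GPT may not follow the instruction perfectly, and Whisper imitates whatever style the prompt actually has. For that reason, it can be worth checking the generated text before you use it. The rough sketch below handles the ellipsis example. The hypothetical helper uses_ellipsis_style returns True only if the text contains an ellipsis and no single periods. It is a heuristic: abbreviations such as "Mr." or decimals such as "3.5" also count as single periods.

```python
import re

def uses_ellipsis_style(text: str) -> bool:
    """Heuristic: True if the text uses ellipses and contains no single period.

    The Unicode ellipsis character is normalised to three periods first.
    """
    text = text.replace("\u2026", "...")
    has_ellipsis = "..." in text
    # A single period is one with no period immediately before or after it
    has_single_period = re.search(r"(?<!\.)\.(?!\.)", text) is not None
    return has_ellipsis and not has_single_period

if __name__ == "__main__":
    sample_ok = "so we drove up the coast... the weather was great..."
    sample_mixed = "so we drove up the coast. the weather was great..."
    print(uses_ellipsis_style(sample_ok))     # True
    print(uses_ellipsis_style(sample_mixed))  # False
```

Run the check on prompt before calling transcribe. Because the generator uses temperature=0, re-running it with the same instruction will likely return the same text. When the check fails, rewording the instruction is usually the more useful fix.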

Whisper prompts are most useful for brand and name glossaries, punctuation normalisation, and style consistency (for example, all lowercase or all uppercase).
