Whisper Prompting: Better Transcription Output
Use the optional Whisper API prompt parameter to control output style and the correct spelling of names, brands, and terms. Explore how GPT can generate fictitious prompts for Whisper.
The prompt parameter in Whisper API
Whisper accepts an optional prompt — a short text that sets the transcription style and vocabulary. The model does not follow instructions in the prompt: it imitates its style.
Limits: Whisper considers only the final 224 tokens of the prompt and ignores anything earlier; instructions like "use Markdown" are ignored.
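Because an over-long prompt is cut without warning, it can help to trim it yourself and keep the most important terms at the end. The sketch below is only an approximation: it counts whitespace-separated words rather than real Whisper tokens, and both the helper name `trim_prompt_tail` and the 150-word budget (a conservative stand-in for 224 tokens of English text) are assumptions, not part of the API.

```python
def trim_prompt_tail(prompt: str, max_words: int = 150) -> str:
    """Keep only the last `max_words` whitespace-separated words of a prompt.

    Word count is a rough proxy for token count (assumption), so the default
    budget is deliberately below 224.
    """
    if max_words <= 0:
        return ""
    words = prompt.split()
    if len(words) <= max_words:
        return prompt.strip()
    # Keep the tail of the prompt and drop the beginning.
    return " ".join(words[-max_words:])


# Made-up 400-word prompt to show the trimming
long_text = " ".join(f"word{i}" for i in range(400))
trimmed = trim_prompt_tail(long_text)
print(len(trimmed.split()), trimmed.split()[0], trimmed.split()[-1])
```

For an exact count you would need the model's own tokenizer; the word budget here is just a safe margin.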
Basic transcription with a prompt
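from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe(audio_filepath: str, prompt: str = "") -> str:
    """Transcribe an audio file, optionally steering style and spelling with a prompt."""
    # Open the file in a context manager so the handle is closed after the request.
    with open(audio_filepath, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript.text


# No prompt: baseline
baseline = transcribe("meeting.wav", prompt="")

# Glossary prompt for correct spelling
result = transcribe("meeting.wav", prompt="Attendees: Aimee, Shawn. Product: QuirkQuid Quill")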
Longer prompts are more reliable than short ones
# Short prompt: less reliable
transcribe("meeting.wav", prompt="president biden.")

# Long prompt: several sentences in the target style establish a stable pattern
long_prompt = (
    "i want to share some thoughts on this topic. "
    "multiple sentences help establish a clear pattern. "
    "the more text you provide, the more reliably the model follows."
)
transcribe("meeting.wav", prompt=long_prompt)
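To compare how well a short and a long prompt hold a lowercase style, one option is to measure the share of lowercase letters in each transcript. This is a minimal sketch; the helper name `lowercase_ratio` is hypothetical, and the two strings below are made-up stand-ins for real transcription output.

```python
def lowercase_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in `text` that are lowercase."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.islower() for ch in letters) / len(letters)


# Placeholder strings, not actual Whisper output
print(lowercase_ratio("President Biden spoke today."))
print(lowercase_ratio("president biden spoke today."))
```

A ratio closer to 1.0 means the transcript followed the lowercase pattern more consistently.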
Generating fictitious prompts with GPT
def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Ask a chat model for a fictional transcript written in the requested style."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a transcript generator. Create one long paragraph "
                    "of fictional conversation. Never diarize speakers."
                ),
            },
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content


# Example: prompt for ellipsis-style transcription
prompt = fictitious_prompt_from_instruction(
    "Instead of periods, end every sentence with ellipses."
)
transcribe("meeting.wav", prompt=prompt)
Whisper prompts are most useful for: brand/name glossaries, punctuation normalisation, style consistency (lowercase/uppercase).
Record or download audio that includes several unusual names or product names. Transcribe without a prompt, then with a glossary prompt. Compare spelling accuracy between the two results.
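The comparison in this exercise can be automated with a small check that reports which glossary terms appear with their exact spelling. This is a sketch under assumptions: `glossary_hits` is a hypothetical helper, and the two transcripts are invented placeholders rather than real Whisper output.

```python
import re


def glossary_hits(transcript: str, terms: list[str]) -> dict[str, bool]:
    """Map each term to True if it occurs in the transcript with exact spelling and case."""
    hits = {}
    for term in terms:
        # Lookarounds reject matches embedded in a longer word, and still work
        # for terms that contain spaces or punctuation.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        hits[term] = re.search(pattern, transcript) is not None
    return hits


terms = ["Aimee", "Shawn", "QuirkQuid Quill"]
# Placeholder transcripts for illustration only
without_prompt = "Amy and Sean demoed the Quirk Quid Quill."
with_prompt = "Aimee and Shawn demoed the QuirkQuid Quill."

print(glossary_hits(without_prompt, terms))
print(glossary_hits(with_prompt, terms))
```

Matching is case-sensitive on purpose, since capitalization is part of the correct spelling of a name or brand.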
Copy and adapt to your context: replace the file name and the glossary terms with your own.
from openai import OpenAI

client = OpenAI()


def transcribe(path: str, prompt: str = "") -> str:
    """Transcribe the audio file at `path`, optionally with a prompt."""
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            file=f, model="whisper-1", prompt=prompt
        ).text


def build_glossary_prompt(*terms: str) -> str:
    """Build a prompt that simply lists the terms with their correct spelling."""
    joined = ", ".join(terms)
    # No "please spell these correctly" sentence: Whisper imitates the prompt, it does not obey it.
    return f"Glossary of proper nouns and product names: {joined}."


# Usage
no_prompt = transcribe("audio.wav")
with_glossary = transcribe(
    "audio.wav",
    prompt=build_glossary_prompt("QuirkQuid", "GPT-4o", "Aimee", "Shawn"),
)
print("Without prompt:", no_prompt[:200])
print("With glossary: ", with_glossary[:200])

Two common mistakes: writing instructions in the prompt (for example, "capitalize proper nouns"), because Whisper ignores instructions and only imitates style; and using a prompt longer than 224 tokens, because only the final 224 tokens are considered and the rest is silently discarded.
Several sentences in the prompt create a much more stable style pattern than a single word. GPT-4o-mini is a great generator of fictitious transcripts for unconventional styles.
Good fits: transcribing interviews with unusual names, product demo recordings with brand names, and podcasts with recurring domain terms.
Poor fit: cases where you need to completely change the language or transcription style, because the prompt does not override the model's audio comprehension.