Intermediate

Engineering

OpenAI Audio & Vision: Whisper and GPT-4V

Official OpenAI examples for audio and vision: transcription with Whisper, prompting for better results, Realtime API, call summarization, and image tagging with GPT-4V.

3 modules

5 lessons

125 min total time

Audience: Developers working with audio and images via OpenAI

Module 1

Transcription with Whisper

Audio pre-processing and transcript post-processing: silence trimming, segmentation, punctuation and domain-vocabulary correction.

Whisper: Audio Pre- and Post-Processing

Improve Whisper transcription quality by trimming leading silence, splitting long files into segments, and running GPT post-processing to add punctuation, fix financial terminology, and remove non-ASCII artefacts.

25 min
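As a rough illustration of the pre- and post-processing steps named in this lesson, the sketch below works on a plain list of integer PCM samples using only the standard library. The helper names (`trim_leading_silence`, `split_segments`, `strip_non_ascii`) and the amplitude threshold are assumptions made for this example and are not taken from the lesson itself; the GPT-based punctuation and terminology pass is left out because it needs an API call.

```python
def trim_leading_silence(samples, threshold=500):
    """Drop samples before the first one whose absolute amplitude exceeds threshold.

    `threshold` is an assumed value for 16-bit PCM; tune it for real audio.
    """
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            return samples[i:]
    return []  # the whole clip is below the threshold


def split_segments(samples, segment_len):
    """Split samples into consecutive chunks of at most segment_len items."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]


def strip_non_ascii(text):
    """Remove characters outside the ASCII range from a transcript."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Example pipeline on toy data: trim, then cut into segments of two samples.
clip = [0, 10, -20, 900, 5, -1000]
segments = split_segments(trim_leading_silence(clip), 2)
```

In practice each segment would be written back to an audio file and transcribed separately, with the cleaned transcripts joined afterwards.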

Module 2

Prompting and Transcription Methods

Steering Whisper output through the prompt parameter, choosing the right transcription method (batch, streaming, or Realtime), and comparing their trade-offs.

Whisper Prompting: Better Transcription Output

Use the optional Whisper API prompt parameter to control output style and the correct spelling of names, brands, and terms. Explore how GPT can generate fictitious prompts for Whisper.

25 min
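A minimal sketch of how the prompt parameter might be used, assuming the official `openai` Python package and the `whisper-1` model. The helper `build_whisper_prompt` and its word budget are hypothetical: the OpenAI speech-to-text guide notes that only the tail of a long prompt is considered, so the helper keeps the last terms when the list is too long, using a word count as a crude stand-in for a token count.

```python
def build_whisper_prompt(terms, max_words=150):
    """Join spelling hints into one prompt string, keeping the last terms if too long.

    `max_words` is an assumed budget, not an official limit.
    """
    kept = []
    words_used = 0
    # Walk backwards so the terms at the end of the list survive truncation.
    for term in reversed(terms):
        n = len(term.split())
        if words_used + n > max_words:
            break
        kept.append(term)
        words_used += n
    return ", ".join(reversed(kept))


def transcribe_with_prompt(path, prompt):
    """Transcribe one audio file, passing the prompt as a style and spelling hint."""
    from openai import OpenAI  # imported lazily; requires the `openai` package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=prompt,
        )
    return result.text
```

A call such as `transcribe_with_prompt("call.mp3", build_whisper_prompt(["Acme Corp", "EBITDA"]))` would then bias the transcript toward those spellings; the file name and terms here are placeholders.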

Module 3

Realtime API and Vision