IntermediateИнженерия
OpenAI Audio & Vision: Whisper and GPT-4V
Official OpenAI examples for audio and vision: transcription with Whisper, prompting for better results, Realtime API, call summarization, and image tagging with GPT-4V.
3modules
5lessons
125 mintotal time
Developers working with audio and images via OpenAIaudience
Module 1
Transcription with Whisper
Audio pre-processing and transcript post-processing: silence trimming, segmentation, punctuation and domain-vocabulary correction.
Module 2
Prompting and Transcription Methods
Fine-tuning Whisper through the prompt parameter, choosing the right transcription method (batch vs streaming vs Realtime), and comparing trade-offs.
Module 3
Realtime API and Vision
Summarizing long voice conversations via the Realtime API and tagging images with GPT-4V.