OpenAI Cookbook: Official Recipes · Lesson 5
GPT Vision for Video Understanding
Extract frames from a video using OpenCV, send them to GPT-4.1-mini as base64 images, get a scene description, and generate a David Attenborough-style voiceover via the TTS API.