OpenAI Agents & Orchestration · Lesson 2

Coding Agent: GPT-4.1 + Agents SDK

Building a coding agent with GPT-4.1 via the OpenAI Agents SDK: web_search + shell tools, ShellExecutor with an isolated workspace, scaffold → feedback → iteration loop.

30 min read · 3 quiz questions · Ready prompt included

GPT-4.1 for Coding

GPT-4.1 is, at the time of writing, OpenAI's strongest model for writing and editing code. Paired with the apply_patch and shell tools from the Responses API, it can read, edit, and run code across an entire codebase.

from agents import Agent, Runner, WebSearchTool, ShellTool
import os

assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY first"

Agent Setup

from pathlib import Path

workspace_dir = Path("coding-agent-workspace").resolve()
workspace_dir.mkdir(exist_ok=True)

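coding_agent = Agent(
    name="CodingAgent",
    model="gpt-4.1",
    instructions=(
        "You are an expert coding agent. Scaffold apps from user prompts, "
        "edit files with apply_patch, and run shell commands in the workspace. "
        # Interpolate the real path so the model sees it, not the variable name.
        f"Keep every command inside {workspace_dir}; never touch paths outside it."
    ),
    tools=[
        WebSearchTool(),
        # NOTE: cwd= and require_approval= are the arguments used in this lesson.
        # Verify them against your installed SDK version, whose ShellTool
        # signature may differ.
        ShellTool(cwd=str(workspace_dir), require_approval=True),
    ],
)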

ShellExecutor with Isolation

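import asyncio
from pathlib import Path

from agents import ShellCommandRequest, ShellResult

class ShellExecutor:
    """Runs agent-requested shell commands with cwd set to the workspace.

    Setting cwd alone is not a sandbox: a command can still reach other paths.
    """

    def __init__(self, workspace: Path, require_approval: bool = True):
        self.workspace = workspace
        self.require_approval = require_approval

    async def __call__(self, req: ShellCommandRequest) -> ShellResult:
        # NOTE: req.command and the ShellResult fields follow this lesson's
        # simplified shape; check them against your installed SDK version.
        if self.require_approval:
            print(f"Agent wants to run: {req.command}")
            # input() blocks, so run it in a worker thread to keep the event loop free.
            answer = await asyncio.to_thread(input, "Approve? (y/n): ")
            if answer.strip().lower() != "y":
                return ShellResult(stdout="", stderr="Rejected by user", exit_code=1)
        proc = await asyncio.create_subprocess_shell(
            req.command,
            cwd=str(self.workspace),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await proc.communicate()
        # errors="replace" keeps non-UTF-8 output from raising during decoding.
        return ShellResult(stdout=stdout.decode(errors="replace"),
                           stderr=stderr.decode(errors="replace"),
                           exit_code=proc.returncode)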
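
The executor above waits on proc.communicate() with no time limit, so a command that hangs would stall the agent. The sketch below shows one way to add a timeout using only the standard library. The helper name run_with_timeout is hypothetical, it uses create_subprocess_exec with an argument list instead of a shell string, and the exit code 124 is a convention borrowed from the GNU timeout utility, not something the SDK prescribes.

```python
import asyncio
import sys


async def run_with_timeout(argv: list[str], timeout_s: float) -> tuple[int, str, str]:
    """Run argv as a subprocess and kill it if it exceeds timeout_s seconds."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so it does not linger as a zombie
        return 124, "", f"Timed out after {timeout_s} s"
    return proc.returncode, stdout.decode(errors="replace"), stderr.decode(errors="replace")


async def demo() -> list[tuple[int, str, str]]:
    fast = [sys.executable, "-c", "print('ok')"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    # Both children run concurrently; the slow one is killed after one second.
    return await asyncio.gather(
        run_with_timeout(fast, timeout_s=10),
        run_with_timeout(slow, timeout_s=1),
    )


if __name__ == "__main__":
    for code, out, err in asyncio.run(demo()):
        print(code, out.strip(), err)
```

Because the subprocess calls are awaited, the event loop stays free while commands run, which is also why the two demo commands can overlap instead of running one after the other.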

Agent Loop

The Agents SDK automatically manages the loop: model call → tool execution → next call — until a final response is received.
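
To make the loop concrete, here is a minimal sketch of the same control flow with no SDK involved. Everything in it (ModelTurn, run_loop, scripted_model, fake_tools) is a hypothetical stand-in written for illustration; it is not how the SDK is implemented, only the shape of the cycle it runs for you.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelTurn:
    """One model response: either a tool call or a final answer."""
    tool_name: Optional[str] = None
    tool_arg: str = ""
    final_output: Optional[str] = None


def run_loop(
    model: Callable[[list[str]], ModelTurn],
    tools: dict[str, Callable[[str], str]],
    user_input: str,
    max_turns: int = 10,
) -> str:
    history = [f"user: {user_input}"]
    for _ in range(max_turns):
        turn = model(history)                      # model call
        if turn.final_output is not None:
            return turn.final_output               # final response ends the loop
        result = tools[turn.tool_name](turn.tool_arg)  # tool execution
        history.append(f"tool {turn.tool_name}({turn.tool_arg}) -> {result}")
    raise RuntimeError("max_turns exceeded without a final response")


def scripted_model(history: list[str]) -> ModelTurn:
    # Stand-in for the real model: call the tool once, then finish.
    if len(history) == 1:
        return ModelTurn(tool_name="shell", tool_arg="ls")
    return ModelTurn(final_output=f"Done after {len(history) - 1} tool call(s)")


fake_tools = {"shell": lambda cmd: f"ran {cmd}"}
print(run_loop(scripted_model, fake_tools, "list the files"))
# prints: Done after 1 tool call(s)
```

The max_turns cap matters in practice: without it, a model that keeps requesting tools would loop forever.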

Practical exercise

What to do after this lesson

Run the coding agent with an instruction to create a simple Flask app with a single /health endpoint. Make sure the agent uses ShellExecutor in an isolated directory and that you can approve or reject each command.

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

import asyncio
from pathlib import Path
from agents import Agent, Runner, WebSearchTool

workspace_dir = Path("agent-workspace").resolve()
workspace_dir.mkdir(exist_ok=True)

agent = Agent(
    name="DevAgent",
    model="gpt-4.1",
    instructions="Scaffold a minimal Flask app with /health endpoint in the workspace.",
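    # WebSearchTool alone cannot write files; add the shell tool from
    # "Agent Setup" so the agent can create the app inside workspace_dir.
    tools=[WebSearchTool()],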
)

async def main():
    # Runner.run is a classmethod: pass the agent and the user input directly.
    result = await Runner.run(
        agent,
        'Create a Flask app with a /health endpoint returning JSON {"status": "ok"}',
    )
    print(result.final_output)

asyncio.run(main())

Common mistakes

What people get wrong

Running shell commands without an isolated workspace, which lets the agent modify files outside the project. Skipping require_approval in production, which is a critical security risk.
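
As an illustration of the first mistake, the sketch below checks whether a file path requested by the agent stays inside the workspace. The function name is_inside_workspace is hypothetical. It only validates paths you pass to it: a shell command string can still reference any location, so this complements a real sandbox and does not replace one.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: Path) -> bool:
    """True if candidate, resolved relative to workspace, stays inside it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before comparing.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


workspace = Path("coding-agent-workspace")
print(is_inside_workspace("app/main.py", workspace))     # True
print(is_inside_workspace("../secrets.env", workspace))  # False
print(is_inside_workspace("/etc/passwd", workspace))     # False
```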

Pro tips

What works but no one documents

The Agents SDK manages the agentic loop automatically (tool calls → responses → repeat). require_approval=True in ShellExecutor lets you control each command during development.
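
Approving every single command by hand gets tedious quickly. One possible refinement, sketched below under stated assumptions, is to ask only for commands that are not on a short allowlist. The names AUTO_APPROVED, SHELL_METACHARACTERS, and needs_approval are hypothetical, and the allowlist contents are illustrative, not a vetted security policy.

```python
import shlex

# Hypothetical allowlist: programs treated as safe to run without asking.
AUTO_APPROVED = {"ls", "pwd", "echo"}

# Any of these characters can chain, redirect, or substitute commands,
# so their presence always forces a manual review.
SHELL_METACHARACTERS = set(";|&<>`$(){}\n\\")


def needs_approval(command: str) -> bool:
    """Return True when a human should confirm the command before it runs."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, unbalanced quotes
        return True
    # Empty commands and unknown programs both go to the human.
    return not tokens or tokens[0] not in AUTO_APPROVED


print(needs_approval("ls -la"))           # False
print(needs_approval("rm -rf build"))     # True
print(needs_approval("ls && rm -rf /"))   # True
```

The check fails closed: anything it cannot parse or does not recognize is sent for approval.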

When to use

Code generation automation, scaffolding new projects, iterative code refinement through user feedback.

When not to use

Tasks without filesystem access. Production without full isolation (Docker/VM sandbox) — unrestricted shell is dangerous.
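
For the Docker option mentioned above, a rough sketch of wrapping each command in a throwaway container might look like this. The function name docker_argv, the image python:3.12-slim, and the resource limits are assumptions chosen for illustration; the function only builds the argument list and does not start a container itself.

```python
from pathlib import Path


def docker_argv(command: str, workspace: Path, image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argument list that confines command to the workspace."""
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "--network", "none",              # no network access from inside
        "--memory", "512m",               # cap memory use
        "--pids-limit", "128",            # cap the number of processes
        "-v", f"{workspace.resolve()}:/workspace",  # only the workspace is mounted
        "-w", "/workspace",               # start in the mounted directory
        image,
        "sh", "-c", command,
    ]


argv = docker_argv("python -m pytest -q", Path("agent-workspace"))
print(" ".join(argv))
```

The resulting list can be passed to asyncio.create_subprocess_exec(*argv). Note that --network none also blocks package installs, so dependencies would need to be baked into the image.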

Official sources

OpenAI Cookbook

Quiz: 3 questions

1. GPT-4.1 received a shell tool without workspace isolation. What is the risk in production?

2. The Agents SDK manages the agentic loop automatically. What happens after the model calls a tool?

3. Why is ShellExecutor implemented with asyncio.create_subprocess_shell instead of subprocess.run?
