Coding Agent: GPT-4.1 + Agents SDK
Building a coding agent with GPT-4.1 via the OpenAI Agents SDK: web_search + shell tools, ShellExecutor with an isolated workspace, scaffold → feedback → iteration loop.
GPT-4.1 for Coding
OpenAI positions GPT-4.1 as a model built for writing and editing code. Combined with the apply_patch and shell tools from the Responses API, it can read, change, and run code across an entire codebase rather than a single file.
from agents import Agent, Runner, WebSearchTool, ShellTool
import os
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY first"
Agent Setup
from pathlib import Path
workspace_dir = Path("coding-agent-workspace").resolve()
workspace_dir.mkdir(exist_ok=True)
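coding_agent = Agent(
    name="CodingAgent",
    model="gpt-4.1",
    instructions=(
        "You are an expert coding agent. Scaffold apps from user prompts, "
        "edit files with apply_patch, and run shell commands in the workspace. "
        "Always sandbox commands; never run outside workspace_dir."
    ),
    tools=[
        WebSearchTool(),
        # ShellTool takes an executor callable rather than cwd/approval flags.
        # ShellExecutor is the class from the "ShellExecutor with Isolation"
        # section; define it before building the agent.
        ShellTool(executor=ShellExecutor(workspace_dir, require_approval=True)),
    ],
)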
ShellExecutor with Isolation
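import asyncio
from agents import ShellCallOutcome, ShellCommandOutput, ShellCommandRequest, ShellResult

class ShellExecutor:
    def __init__(self, workspace: Path, require_approval: bool = True):
        self.workspace = workspace
        self.require_approval = require_approval

    async def __call__(self, req: ShellCommandRequest) -> ShellResult:
        # One request can carry several commands. Field names follow the SDK's
        # shell tool types; check them against your installed version.
        commands = req.data.action.commands
        if self.require_approval:
            print(f"Agent wants to run: {commands}")
            # input() blocks the event loop, which is acceptable in a local dev session.
            if input("Approve? (y/n): ").strip().lower() != "y":
                outcome = ShellCallOutcome(type="exit", exit_code=1)
                return ShellResult(output=[
                    ShellCommandOutput(stderr="Rejected by user", outcome=outcome)
                ])
        outputs = []
        for command in commands:
            proc = await asyncio.create_subprocess_shell(
                command,
                cwd=str(self.workspace),  # sets the starting directory only; not a sandbox
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            stdout, stderr = await proc.communicate()
            outputs.append(ShellCommandOutput(
                command=command,
                stdout=stdout.decode(errors="replace"),
                stderr=stderr.decode(errors="replace"),
                outcome=ShellCallOutcome(type="exit", exit_code=proc.returncode),
            ))
        return ShellResult(output=outputs)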
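Setting cwd only chooses where a command starts; it does not stop a command from naming paths elsewhere on disk. As a minimal sketch of one extra guard, the helper below checks whether a path resolves inside the workspace. The name is_inside_workspace is hypothetical and not part of the Agents SDK.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: Path) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace`.

    Hypothetical helper for illustration; not part of the Agents SDK.
    Relative paths are interpreted relative to the workspace root.
    """
    root = workspace.resolve()
    # Joining with an absolute `candidate` discards `root`, so absolute
    # paths are checked as given; ".." segments are collapsed by resolve().
    target = (root / candidate).resolve()
    return target == root or root in target.parents


workspace = Path("coding-agent-workspace")
print(is_inside_workspace("app/main.py", workspace))     # True
print(is_inside_workspace("../secrets.txt", workspace))  # False
print(is_inside_workspace("/etc/passwd", workspace))     # False
```

A check like this helps when the executor handles file paths itself. It does not make an arbitrary shell string safe, because a command can build paths at runtime.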
Agent Loop
The Agents SDK runs the loop for you: it calls the model, executes any tool calls the model requests, feeds the results into the next model call, and repeats until the model returns a final response.
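To make the shape of that loop concrete, here is a simplified stand-in written with plain Python only. It is not the SDK's implementation: ToolCall, FinalAnswer, run_loop, and the scripted model are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


def run_loop(model, tools, user_input, max_turns=10):
    """Call the model, run requested tools, and repeat until a final answer."""
    history = [("user", user_input)]
    for _ in range(max_turns):
        step = model(history)
        if isinstance(step, FinalAnswer):
            return step.text
        # The model asked for a tool: run it and feed the result back in.
        result = tools[step.name](step.argument)
        history.append(("tool", f"{step.name} -> {result}"))
    raise RuntimeError("max_turns exceeded without a final answer")


def scripted_model(history):
    """Stand-in for a real model: one tool call, then a final answer."""
    tool_results = [text for role, text in history if role == "tool"]
    if not tool_results:
        return ToolCall(name="shell", argument="ls")
    return FinalAnswer(text=f"Done after seeing: {tool_results[-1]}")


fake_tools = {"shell": lambda command: f"ran {command!r}"}
print(run_loop(scripted_model, fake_tools, "List the workspace"))
# Done after seeing: shell -> ran 'ls'
```

The turn limit is the part worth copying into any hand-written loop: without it, a model that keeps requesting tools never terminates.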
Run the coding agent with an instruction to create a simple Flask app with a single /health endpoint. Make sure the agent uses ShellExecutor in an isolated directory and that you can approve or reject each command.
import asyncio
from pathlib import Path
from agents import Agent, Runner, ShellTool, WebSearchTool

workspace_dir = Path("agent-workspace").resolve()
workspace_dir.mkdir(exist_ok=True)

agent = Agent(
    name="DevAgent",
    model="gpt-4.1",
    instructions="Scaffold a minimal Flask app with /health endpoint in the workspace.",
    tools=[
        WebSearchTool(),
        # ShellExecutor is the class from the "ShellExecutor with Isolation" section.
        ShellTool(executor=ShellExecutor(workspace_dir, require_approval=True)),
    ],
)

async def main():
    # Runner.run is a classmethod: pass the agent and the input directly.
    result = await Runner.run(
        agent,
        'Create a Flask app with a /health endpoint returning JSON {"status": "ok"}',
    )
    print(result.final_output)

asyncio.run(main())

Common mistakes: running shell commands without an isolated workspace, which lets the agent modify files outside the project, and skipping require_approval in production, which is a critical security risk.
The Agents SDK manages the agentic loop automatically (tool calls → responses → repeat). require_approval=True in ShellExecutor lets you control each command during development.
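Approving every command by hand gets tedious, so one possible refinement is to prompt only for commands that fall outside a small allowlist. The sketch below is an assumption about how such a gate could look, not something the SDK provides; SAFE_PREFIXES, needs_prompt, and approve are hypothetical names, and the prompt function is injectable so the gate can be exercised without a terminal.

```python
from typing import Callable, Iterable

# Hypothetical allowlist of read-only commands; adjust to your project.
SAFE_PREFIXES = ("ls", "pwd", "git status")

# Characters that can chain, redirect, or substitute commands.
SHELL_METACHARACTERS = (";", "&", "|", ">", "<", "`", "$(")


def needs_prompt(command: str, safe_prefixes: Iterable[str] = SAFE_PREFIXES) -> bool:
    """Return True unless the command is a plain allowlisted command."""
    stripped = command.strip()
    if any(token in stripped for token in SHELL_METACHARACTERS):
        return True
    return not any(
        stripped == prefix or stripped.startswith(prefix + " ")
        for prefix in safe_prefixes
    )


def approve(command: str, ask: Callable[[str], str] = input) -> bool:
    """Auto-approve allowlisted commands; ask the user about everything else."""
    if not needs_prompt(command):
        return True
    answer = ask(f"Agent wants to run: {command}\nApprove? (y/n): ")
    return answer.strip().lower() == "y"


# Simulated answers stand in for a human at the keyboard.
print(approve("ls -la", ask=lambda _: "n"))             # True (allowlisted)
print(approve("rm -rf build", ask=lambda _: "n"))       # False (rejected)
print(approve("ls; rm -rf build", ask=lambda _: "n"))   # False (chained)
print(approve("pip install flask", ask=lambda _: "y"))  # True (approved)
```

An allowlist reduces prompts but is not a security boundary; it complements isolation and does not replace it.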
Good fits: automating code generation, scaffolding new projects, and iteratively refining code from user feedback.
Poor fits: tasks that need no filesystem access, and production deployments without full isolation (a Docker or VM sandbox), where an unrestricted shell is dangerous.
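For the Docker option, one way to add isolation is to have the executor run each command inside a throwaway container instead of directly on the host. The sketch below only builds the argument list. The function name docker_argv, the /workspace mount point, and the python:3.12-slim image are assumptions chosen for illustration.

```python
from pathlib import Path


def docker_argv(command: str, workspace: Path, image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` invocation that confines `command` to the workspace.

    Hypothetical helper: the image and mount point are illustrative choices.
    """
    return [
        "docker", "run",
        "--rm",                                    # remove the container afterwards
        "--network", "none",                       # no network access from inside
        "-v", f"{workspace.resolve()}:/workspace", # only the workspace is mounted
        "-w", "/workspace",                        # start in the mounted directory
        image,
        "sh", "-c", command,
    ]


argv = docker_argv("python -m pytest -q", Path("coding-agent-workspace"))
print(" ".join(argv))
```

The resulting list can be passed to asyncio.create_subprocess_exec(*argv, ...) in place of the create_subprocess_shell call. Note that --network none also blocks package installs, so dependencies would need to be baked into the image.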