How Agents Think & Act
Moving beyond human-designed workflows to agents that learn, reason, and self-correct through Reinforcement Learning.
Participants: Sam Charrington (Host, TWiML AI Podcast) and Josh Tobin (Technical Staff, OpenAI)
Introduction
Sam Charrington: Welcome back to the show, Josh. It’s been five years since we last spoke. Since then, you’ve co-founded Gantry and recently rejoined OpenAI to lead the Agents Research team. What have you been up to?
Josh Tobin: It’s great to be back. After leaving OpenAI in 2019 to build Gantry, a machine learning infrastructure startup, I saw the industry shift. We used to think every company would train its own models. But foundation models like GPT-4 proved so capable that it’s now more efficient for businesses to build on top of them. I rejoined OpenAI in September to lead the team building agentic products like Operator, Deep Research, and the Codex CLI.
⚡ The Paradigm Shift
Human-Designed Systems
- Humans manually break down the workflow into fixed steps.
- Rule-based logic struggles with messy, real-world inputs.
- Compounding errors make early mistakes increasingly costly.
Learned RL Behaviors
- End-to-end training teaches agents through success and failure.
- Self-correction helps them reroute when something goes wrong.
- Reasoning models apply different levels of effort per task step.
Deep Research
Goes broad and deep across the web to synthesize complex reports, find rare facts, and explore codebases when needed.
Operator
A computer-use agent that navigates a browser, clicks through web pages, and performs real-world tasks like reservations.
Codex CLI
An open-source local agent that explores your file system, writes patches, and acts like a superhuman intern for developers.
The Era of “Vibe Coding”
Software engineering is shifting. Instead of manually writing every line, people will spend more energy on architecture, trade-offs, and validating agent outputs. The agent writes the code, while the human steers the intent.
The Evolution of Agents: From Rules to Reasoning
Sam Charrington: We’ve seen many “agent” demos, but they often struggle with reliability. How is the current generation of agents different from the workflows people were building in 2023 and 2024?
Josh Tobin: In the past, humans designed systems that broke problems into static steps and assigned them to an LLM. The problem is that the real world is messy. If you have a 10-step process and the model is 90% accurate at each step, your end-to-end accuracy is only about 35%. Small errors compound.
The missing ingredient was direct, end-to-end training. We are now training agents using reinforcement learning (RL) to solve these workflows. By doing this, the agent actually experiences failure during training. It learns what it looks like to fail at a web search and learns to reroute itself—to “think,” go back, and try a different search term.
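The compounding-error arithmetic is easy to check. A quick illustrative sketch (not OpenAI code): with a fixed pipeline, the whole task succeeds only if every step does.

```python
def end_to_end_accuracy(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step of a fixed multi-step workflow succeeds."""
    # Steps are assumed independent, so the probabilities multiply.
    return per_step_accuracy ** num_steps

# At 90% per-step accuracy over 10 steps: 0.9**10 ≈ 0.349
print(round(end_to_end_accuracy(0.9, 10), 3))  # 0.349
```

This is exactly why end-to-end RL training helps: the agent learns to detect and recover from a failed step rather than letting the error propagate.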
Key Agentic Products
Sam Charrington: OpenAI has launched several agentic surfaces recently. Can you walk us through Deep Research, Operator, and the Codex CLI?
Josh Tobin:
- Deep Research: This is designed to go broad and deep. It’s not just for market research; users are pushing it to find rare facts buried in fan pages or GitHub repos. It uses RL to navigate trajectories, synthesize information, and even ask follow-up questions to clarify the user’s intent.
- Operator: This is our “computer use” agent. It operates a virtual browser to perform real-world tasks like booking reservations. It’s still in an early stage—a technology preview of sorts—but it demonstrates the intelligence required to navigate complex UI.
- Codex CLI: This is an open-source local coding agent. It’s “contextless,” meaning it doesn’t need a heavy pre-indexed map of your codebase. It uses standard terminal tools like `grep` and `ls` to explore your files like a “superhuman intern.”
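The “contextless” approach can be sketched with a toy explorer that shells out to those same terminal tools on demand instead of pre-indexing anything. This is a hypothetical illustration, not the Codex CLI’s actual implementation:

```python
import subprocess

def list_files(path: str = ".") -> list[str]:
    # Run `ls` and return the entries, as an agent tool call might.
    out = subprocess.run(["ls", path], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def search(pattern: str, path: str = ".") -> list[str]:
    # Recursive `grep` with line numbers. grep exits with status 1 when
    # there are no matches, so we don't treat a nonzero exit as an error.
    out = subprocess.run(["grep", "-rn", pattern, path],
                        capture_output=True, text=True)
    return out.stdout.splitlines()
```

An agent can chain these calls as needed (list, then search, then read a file), paying the exploration cost only for the parts of the codebase the task touches.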
The Future of Software Engineering: “Vibe Coding”
Sam Charrington: There’s a lot of talk about “Vibe Coding.” How is the relationship between engineers and code changing?
Josh Tobin: We are in the early phases of a dramatic shift. I don’t think writing code goes away, but manually writing every line will become rare. Most code will be written by AI. The engineer’s job will move up the stack to focus on architecture, trade-offs, edge cases, and validating the AI’s work.
Sam Charrington: Does that turn engineers into Product Managers?
Josh Tobin: It accelerates a shift toward “design engineers” and technical PMs. You spend less mental energy on the syntax of a framework and more on why you’re building something and how you know it’s working well.
Trust, Security, and Tools
Sam Charrington: As agents start using credit cards or accessing private files, trust becomes the primary bottleneck. How do we solve that?
Josh Tobin: We need better ways to specify levels of trust. For high-risk actions like using a credit card, the system should have strict guidelines—for example, “Always ask for permission before a transaction.” We have to build this trust iteratively. It’s a mix of model alignment (ensuring the model follows guidelines) and product design (ensuring the user has visibility into what the agent is doing).
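That “always ask for permission” guideline can be made concrete with a small approval gate around high-risk actions. All names here (`HIGH_RISK`, `run_action`, `approve`) are hypothetical illustrations, not an OpenAI API:

```python
from typing import Callable

# Hypothetical action types that always require explicit user consent.
HIGH_RISK = {"charge_card", "delete_file", "send_email"}

def run_action(name: str, action: Callable[[], str],
               approve: Callable[[str], bool]) -> str:
    """Execute an agent action, pausing for user approval when it is high-risk."""
    if name in HIGH_RISK and not approve(name):
        return f"blocked: user declined '{name}'"
    return action()
```

In a real product the `approve` callback would surface a confirmation prompt in the UI, which also provides the visibility into the agent’s actions that Tobin describes.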
Sam Charrington: And what about the Model Context Protocol (MCP)?
Josh Tobin: Exposing tools to models is the formula for useful agents. You need a reasoning model, the right tools, and task-specific RL training to teach the model how to use those tools effectively. MCP and similar protocols are critical for that ecosystem.
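A model can only call a tool it has a description of. MCP-style protocols express tools as named, schema-typed declarations; the following is an illustrative descriptor in that spirit, not the exact MCP wire format:

```python
# An illustrative tool declaration: a name, a description the model can
# read, and a JSON-Schema-style contract for the tool's inputs.
search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top result snippets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text"},
            "max_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}
```

Given a catalog of such declarations, the reasoning model decides which tool to invoke and with what arguments; RL training then sharpens when and how it uses them.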
Closing Thoughts
Sam Charrington: It’s an exciting time. Any advice for people trying to learn these new skills?
Josh Tobin: The best way to learn anything now—including programming—is using these tools. Even if the AI writes the code, I still think it’s important to learn the fundamentals of programming. Just like a machine learning researcher should understand back-propagation even if they use libraries, an engineer needs to understand code so they can “spelunk” down the stack when things go wrong.