Interview: How OpenAI Builds AI Agents That Think and Act

Dr. Adrian Cole

March 9, 2026


Participants: Sam Charrington (Host, TWiML AI Podcast) and Josh Tobin (Technical Staff, OpenAI)

Introduction

Sam Charrington: Welcome back to the show, Josh. It’s been five years since we last spoke. Since then, you’ve co-founded Gantry and recently rejoined OpenAI to lead the Agents Research team. What have you been up to?

Josh Tobin: It’s great to be back. After leaving OpenAI in 2019 to build Gantry, a machine learning infrastructure startup, I saw the industry shift. We used to think every company would train its own models. But foundation models like GPT-4 proved so capable that it’s now more efficient for businesses to build on top of them. I rejoined OpenAI in September to lead the team building agentic products like Operator, Deep Research, and the Codex CLI.

OpenAI Insights: Josh Tobin

How Agents Think & Act

Moving beyond human-designed workflows to agents that learn, reason, and self-correct through Reinforcement Learning.

⚡ The Paradigm Shift

The Old Way (2023)

Human-Designed Systems

  • Humans manually break down the workflow into fixed steps.
  • Rule-based logic struggles with messy, real-world inputs.
  • Compounding errors make early mistakes increasingly costly.

The OpenAI Way

Learned RL Behaviors

  • End-to-end training teaches agents through success and failure.
  • Self-correction helps them reroute when something goes wrong.
  • Reasoning models apply different levels of effort per task step.

Operator

A computer-use agent that navigates a browser, clicks through web pages, and performs real-world tasks like reservations.

Codex CLI

An open-source local agent that explores your file system, writes patches, and acts like a superhuman intern for developers.

The Era of “Vibe Coding”

Software engineering is shifting. Instead of manually writing every line, people will spend more energy on architecture, trade-offs, and validating agent outputs. The agent writes the code, while the human steers the intent.

“99% of code will be written by AI systems soon.”

The Evolution of Agents: From Rules to Reasoning

Sam Charrington: We’ve seen many “agent” demos, but they often struggle with reliability. How is the current generation of agents different from the workflows people were building in 2023 and 2024?

Josh Tobin: In the past, humans designed systems that broke problems into static steps and assigned them to an LLM. The problem is that the real world is messy. If you have a 10-step process and the model is 90% accurate at each step, your end-to-end accuracy drops to about 35% (0.9¹⁰ ≈ 0.35). Small errors compound.
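The compounding math here is easy to check directly:

```python
# Compounding error in a fixed pipeline: if each of n steps succeeds
# independently with probability p, the whole chain succeeds with p**n.
# Plugging in the 10-step, 90%-per-step figure from the discussion:
p, n = 0.9, 10
end_to_end = p ** n
print(f"{end_to_end:.3f}")  # 0.349
```

So even a seemingly strong per-step model finishes a fixed 10-step workflow correctly only about a third of the time.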

The missing ingredient was direct, end-to-end training. We are now training agents using reinforcement learning (RL) to solve these workflows. By doing this, the agent actually experiences failure during training. It learns what it looks like to fail at a web search and learns to reroute itself—to “think,” go back, and try a different search term.
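A toy probability model (not how the training itself works) shows why self-correction restores end-to-end reliability. Assume a step's failure is detectable and each retry is independent:

```python
# Toy model of self-correction: a step that fails can be detected and
# retried up to `retries` times, so its effective success probability
# rises from p to 1 - (1 - p) ** (retries + 1).
def end_to_end_accuracy(p: float, steps: int, retries: int) -> float:
    per_step = 1 - (1 - p) ** (retries + 1)
    return per_step ** steps

print(f"{end_to_end_accuracy(0.9, 10, 0):.3f}")  # 0.349 (no rerouting)
print(f"{end_to_end_accuracy(0.9, 10, 2):.3f}")  # 0.990 (two retries per step)
```

Just two chances to reroute per step lifts the same 10-step workflow from roughly 35% to 99% reliability, which is the intuition behind training agents to recognize and recover from their own failures.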

Key Agentic Products

Sam Charrington: OpenAI has launched several agentic surfaces recently. Can you walk us through Deep Research, Operator, and the Codex CLI?

Josh Tobin:

  • Deep Research: This is designed to go broad and deep. It’s not just for market research; users are pushing it to find rare facts buried in fan pages or GitHub repos. It uses RL to navigate trajectories, synthesize information, and even ask follow-up questions to clarify the user’s intent.
  • Operator: This is our “computer use” agent. It operates a virtual browser to perform real-world tasks like booking reservations. It’s still in an early stage—a technology preview of sorts—but it demonstrates the intelligence required to navigate complex UI.
  • Codex CLI: This is an open-source local coding agent. It’s “contextless,” meaning it doesn’t need a heavy pre-indexed map of your codebase. It uses standard terminal tools like grep and ls to explore your files like a “superhuman intern.”
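A sketch of that “contextless” exploration style, driving the same standard tools from Python; the tiny repository layout here is invented purely for illustration:

```python
import pathlib
import subprocess
import tempfile

# Build a tiny fake repo to explore.
repo = pathlib.Path(tempfile.mkdtemp())
(repo / "src").mkdir()
(repo / "src" / "auth.py").write_text("def handle_login(user):\n    pass\n")
(repo / "src" / "app.py").write_text("def main():\n    pass\n")

# Step 1: 'ls' gives a coarse map of the tree -- no pre-built index needed.
listing = subprocess.run(
    ["ls", str(repo / "src")], capture_output=True, text=True
).stdout
print(listing)

# Step 2: 'grep -rn' locates a symbol on demand, exactly as a human would.
hits = subprocess.run(
    ["grep", "-rn", "handle_login", str(repo / "src")],
    capture_output=True, text=True,
).stdout
print(hits)
```

The point is that nothing is cached or indexed ahead of time; each command is issued on demand, so the agent works in any repository it can see.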


The Future of Software Engineering: “Vibe Coding”

Sam Charrington: There’s a lot of talk about “Vibe Coding.” How is the relationship between engineers and code changing?

Josh Tobin: We are in the early phases of a dramatic shift. I don’t think writing code goes away, but manually writing every line will become rare. Most code will be written by AI. The engineer’s job will move up the stack to focus on architecture, trade-offs, edge cases, and validating the AI’s work.

Sam Charrington: Does that turn engineers into Product Managers?

Josh Tobin: It accelerates a shift toward “design engineers” and technical PMs. You spend less mental energy on the syntax of a framework and more on why you’re building something and how you know it’s working well.

Trust, Security, and Tools

Sam Charrington: As agents start using credit cards or accessing private files, trust becomes the primary bottleneck. How do we solve that?

Josh Tobin: We need better ways to specify levels of trust. For high-risk actions like using a credit card, the system should have strict guidelines—for example, “Always ask for permission before a transaction.” We have to build this trust iteratively. It’s a mix of model alignment (ensuring the model follows guidelines) and product design (ensuring the user has visibility into what the agent is doing).
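One way to picture the product-design half of this is an approval gate in front of risky actions. The names and categories below are hypothetical, not an OpenAI API:

```python
# Hypothetical trust-level gate: high-risk actions pause for explicit
# user approval, everything else runs straight through.
HIGH_RISK = {"charge_card", "delete_file", "send_email"}

def execute(action, args, ask_user):
    """Run an agent action, pausing for user approval on high-risk ones."""
    if action in HIGH_RISK:
        if not ask_user(f"Allow '{action}' with {args}?"):
            return "blocked: user declined"
    return f"executed: {action}"

# Usage: a callback that declines everything blocks the transaction.
print(execute("charge_card", {"amount": 120}, ask_user=lambda _: False))
# A low-risk action like a search needs no approval.
print(execute("web_search", {"q": "flights"}, ask_user=lambda _: False))
```

The gate is deliberately in the product layer: even a perfectly aligned model benefits from a visible checkpoint the user controls.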

Sam Charrington: And what about the Model Context Protocol (MCP)?

Josh Tobin: Exposing tools to models is the formula for useful agents. You need a reasoning model, the right tools, and task-specific RL training to teach the model how to use those tools effectively. MCP and similar protocols are critical for that ecosystem.
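Tool-exposure protocols like MCP generally describe each tool with a JSON-Schema-style declaration the model can read. A generic sketch, with illustrative names rather than any specific protocol's exact format:

```python
# A hypothetical tool declaration in the JSON-schema style that MCP-like
# protocols use to advertise tools to a model.
search_tool = {
    "name": "web_search",
    "description": "Search the web and return top result snippets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
            "max_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

def validate_call(tool, arguments):
    """Check that a model's tool call supplies all required fields."""
    missing = [k for k in tool["input_schema"]["required"] if k not in arguments]
    return not missing

print(validate_call(search_tool, {"query": "agent benchmarks"}))  # True
print(validate_call(search_tool, {}))                             # False
```

Because the schema travels with the tool, the same declaration works for any model that speaks the protocol, which is what makes such ecosystems composable.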

Closing Thoughts

Sam Charrington: It’s an exciting time. Any advice for people trying to learn these new skills?

Josh Tobin: The best way to learn anything now—including programming—is using these tools. Even if the AI writes the code, I still think it’s important to learn the fundamentals of programming. Just like a machine learning researcher should understand back-propagation even if they use libraries, an engineer needs to understand code so they can “spelunk” down the stack when things go wrong.
