In a landmark discussion, Lex Fridman sits down with two of the most influential voices in modern machine learning: Sebastian Raschka, LLM Researcher and author, and Nathan Lambert, Post-training Lead at AI2. Together, they dissect the rapid evolution of artificial intelligence from the “DeepSeek Moment” of 2025 to the agentic reality of 2026. – AI State of the Union 2026.
The “DeepSeek Moment” and the Global Shift
Lex: We often look at AI through the lens of specific breakthroughs. Looking back from 2026, what was the “DeepSeek Moment”?
Nathan: It happened in January 2025. The Chinese company DeepSeek released R1, which surprised everyone by matching state-of-the-art performance with significantly less compute and cost. It shifted the narrative from “who has the most GPUs” to “who has the most efficient training algorithms.”
Sebastian: I agree. It won the hearts of the open-source community. Today, in 2026, we see that ideas aren’t proprietary anymore—researchers move between labs constantly. The real differentiator now is the sheer budget for hardware and the culture of the organizations building them.
THE STATE OF AI IN 2026
Deep Dive: LLMs, Scaling Laws, RLVR, and the global race for AGI with Sebastian Raschka & Nathan Lambert.
The Global Landscape
The “DeepSeek Moment”
- January 2025 marked a shift: Chinese models proving SOTA performance with significantly less compute.
- Winning is temporary. Access to ideas is fluid; the differentiator is now budget and hardware.
Model Superstars
- Claude Opus 4.5: dominating coding and voice with extended reasoning features.
- Gemini 3: leveraging massive structural advantages and huge context windows.
- GPT-5 / 5.2: pushing thinking modes and complex agentic routing.
Education Roadmap
The best way to understand is to build it yourself.
The Technical Breakthroughs
Architecture Tweaks
- MoE (Mixture of Experts): expands knowledge without proportional compute per token.
- MLA (Multi-head Latent Attention): optimizes KV cache size for larger context windows.
The New Training Paradigm
- RLVR: Reinforcement Learning with Verifiable Rewards using executable answers.
- Test-Time Compute: using more compute during runtime reasoning to solve harder problems.
The Programming Evolution
- Traditional (Cursor Style): high-control pair programming where humans remain the primary architects.
- Agentic (Claude Code Style): programming with English where the AI manages files and commands.
The Architecture of 2026: Beyond the Transformer?
While the industry remains rooted in the Transformer, 2026 is defined by specialized “tweaks” that have unlocked massive context windows and unprecedented reasoning capabilities.
“We’ve moved beyond simple RLHF. The big breakthrough is RLVR—Reinforcement Learning with Verifiable Rewards.”
— Nathan Lambert
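The core idea behind RLVR can be made concrete with a toy sketch: instead of a learned preference model scoring an answer, the reward comes from actually executing the model's output and checking it against ground truth. The function and variable names below are hypothetical illustrations, not from any RL library or from the labs discussed here.

```python
# Toy "verifiable reward": execute a candidate program emitted by a
# model and grade it against known test cases. Reward is binary.
def verifiable_reward(candidate_code: str, test_cases) -> float:
    """Return 1.0 if the candidate's solve() passes every test, else 0.0."""
    namespace = {}
    try:
        exec(candidate_code, namespace)          # run the model's answer
        solve = namespace["solve"]
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0                       # wrong output: no reward
    except Exception:
        return 0.0                               # crashes earn no reward
    return 1.0

tests = [((2, 3), 5), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
print(verifiable_reward(good, tests), verifiable_reward(bad, tests))  # 1.0 0.0
```

Because the signal is computed, not judged, it cannot be flattered or gamed the way a human-preference reward model can, which is why executable domains like code and math led this training paradigm.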
Lex: Sebastian, you’ve written about building LLMs from scratch. How much has the architecture actually changed since GPT-2?
Sebastian: Fundamentally, not as much as you’d think. It’s still the Transformer. But we’ve added “tweaks” that have massive scaling impacts:
- MoE (Mixture of Experts): This allows us to make models larger without increasing the cost of every single forward pass.
- MLA (Multi-head Latent Attention): This was huge for 2025 and 2026. It optimizes the KV Cache, making it cheaper to handle massive context windows.
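The MoE trick above can be sketched in a few lines: a router scores each token against every expert, but only the top-k experts are actually evaluated, so total parameters grow with the number of experts while compute per token stays roughly constant. This is a minimal illustration with made-up dimensions, not any specific model's implementation.

```python
# Minimal Mixture-of-Experts routing sketch for a single token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is reduced to one weight matrix for illustration.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router                       # router score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over selected experts
    # Only k of the n_experts matrices are ever multiplied for this token;
    # the remaining experts contribute parameters but no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

Here 4 experts exist but only 2 run per token; scale those numbers up and you get the "more knowledge without more compute per forward pass" effect Sebastian describes.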
Programming with English
The transition from writing syntax to “Programming with English” has fundamentally altered the day-to-day life of developers. The focus has shifted from the how to the what.
Lex: Nathan, you seem more bullish on the agentic side.
Nathan: Absolutely. Tools like Claude Code represent “Programming with English.” You don’t micromanage the lines; you guide the design at a macro level. The AI manages the repo, runs the CLI, and handles Git. It’s a different skill set—research taste and system design are now more important than syntax.
Sebastian: I’m still a bit of a control freak; I like to see the diffs. But I use it to automate the mundane tasks—fixing broken links, boilerplate, and refactoring. We have to find a “Goldilocks zone” where we use AI to be productive but still invest in our own mental frameworks.
The Shift in Technical Paradigms
| Feature | Pre-2025 Era | 2026 Era |
| --- | --- | --- |
| Primary Goal | Parameter Count / GPU Hoarding | Algorithmic Efficiency / Compute per Token |
| Training | Human-led RLHF | RLVR (Verifiable Rewards) |
| Coding | Manual Syntax & Copilots | Agentic “English” Programming |
| Open Source | Following the Giants | Frontier-level Open Weights (DeepSeek, Qwen) |
AGI and the Legacy of Compute
Lex: Let’s talk about the “Singularity” or AGI. Where are we?
Nathan: There’s a document—AI 2027—that predicted a “Superhuman Coder” by next year. I think it’s a bit aggressive, because AI capability is “jagged”: superhuman on some tasks, surprisingly weak on others. But 2031 seems like a reasonable mean prediction for a fully autonomous AI researcher.
Lex: 100 years from now, what will historians say?
Sebastian: They won’t remember the name “Transformer” or specific GPU models. They will look at this as the era where Compute became the primary engine of civilization, much like the steam engine was for the Industrial Revolution.
Nathan: I hope they see it as the time we democratized knowledge. Making the sum of human wisdom accessible to everyone, everywhere, for the cost of a few tokens.