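If the cloud providers running the AI agent industry’s workloads had been using Murakkab all year, and if its benchmark gains held at that scale, they would have used roughly a quarter of the energy, at under a quarter of the cost, for the same output. That number is an extrapolation, not a measurement: it comes from the benchmark results in the first published paper on the system, which tested specific workflows in a research setting.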
Researchers from MIT CSAIL and Microsoft Azure Research have developed Murakkab, a resource-efficient serving system for agentic AI workloads that reduces GPU usage by up to 2.8 times, energy consumption by 3.7 times, and cost by 4.3 times compared to how multi-step AI agents are currently deployed on cloud platforms, all while keeping the system within the accuracy and latency targets users specify. The research was published June 25 by MIT and represents one of the first systems specifically designed to address the infrastructure inefficiency that has quietly grown alongside the agentic AI boom.
The inefficiency Murakkab targets is not a problem most AI users see: it lives entirely inside cloud data center infrastructure. But for the cloud providers and enterprises now running thousands of concurrent agentic workflows, it can add up to a meaningful share of what it costs to operate frontier AI services.
Key Developments
- MIT CSAIL and Microsoft Azure Research published Murakkab on June 25, 2026 — a system that optimizes how multi-step AI agent workflows are served on cloud hardware.
- Tested on video Q&A and code generation workflows, Murakkab used only 35% of the computation, 27% of the energy, and under 25% of the cost of standard deployment methods while meeting user requirements.
- In one test case, Murakkab reduced energy consumption by more than an order of magnitude with only a 2% drop in accuracy.
- The system introduces a declarative abstraction that decouples workflow design from execution, enabling cross-layer optimization that current cloud schedulers cannot achieve.
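The multiplier figures and the percentage figures above describe the same results: a reduction by a factor of k leaves 1/k of the baseline. This short Python check converts the reported factors, which it simply restates from the article.

```python
# Reduction factors reported for Murakkab relative to the baseline deployment.
REDUCTION_FACTORS = {"GPU usage": 2.8, "energy": 3.7, "cost": 4.3}


def remaining_share(factor: float) -> float:
    """Fraction of the baseline still consumed after a factor-fold reduction."""
    return 1.0 / factor


for metric, factor in REDUCTION_FACTORS.items():
    print(f"{metric}: {factor}x lower -> {remaining_share(factor):.0%} of baseline")
```

The results come out at roughly 36%, 27%, and 23%. The reported 35% for GPU usage corresponds to a factor of about 2.86, so the small gap is rounding in the 2.8x figure.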
What Happened
According to the MIT News release, Murakkab was developed by Gohar Irfan Chaudhry from MIT CSAIL alongside Esha Choukse, Haoran Qiu, Íñigo Goiri, and Rodrigo Fonseca from Microsoft Azure Research, and Ricardo Bianchini from Microsoft Azure. The system targets what the researchers describe as a fundamental opacity problem in how agentic workflows are currently deployed. The workflow components (different AI models handling different steps, tool calls, retrieval operations, and code execution) are fragmented across different layers, and the cloud infrastructure running them cannot see inside the workflow well enough to optimize across all of it at once. Each component is scheduled and allocated hardware independently, so the cloud continually leaves efficiency on the table.
Murakkab introduces a declarative abstraction that separates workflow specification from execution configuration: the developer describes what the workflow needs to accomplish and what trade-offs they are willing to accept (accuracy vs. speed, speed vs. cost), and Murakkab handles the mapping from that specification to the actual hardware allocation, model selection, and scheduling. An adaptive runtime monitors the workflow during execution and reconfigures it dynamically as conditions change. According to the published arXiv paper, the system was tested on diverse agentic workflows including video question-and-answer tasks and code generation pipelines, and consistently met service-level objectives while using only a fraction of the resources consumed by baseline methods.
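The split between specification and execution, plus the adaptive runtime, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Objectives`, `Config`, `AdaptiveRuntime`, and the profile numbers are illustrative names and values, not Murakkab’s API or data, and the assumption that load slows every configuration by the same factor is a deliberate simplification. The developer declares only targets; the runtime picks the lowest-energy configuration that meets them and re-picks when observed latency drifts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    """What the developer declares: end-to-end targets, not models or hardware."""
    min_accuracy: float   # e.g. 0.80 = at least 80% task accuracy
    max_latency_s: float  # end-to-end latency budget in seconds


@dataclass(frozen=True)
class Config:
    """One hypothetical way to execute the workflow, with profiled costs."""
    name: str
    accuracy: float
    base_latency_s: float  # latency measured on an unloaded cluster
    energy_j: float        # energy per request, in joules


class AdaptiveRuntime:
    """Maps declared objectives to the lowest-energy config that meets them,
    and re-selects when observed latency shows conditions have changed."""

    def __init__(self, objectives: Objectives, configs: list[Config]) -> None:
        self.objectives = objectives
        self.configs = sorted(configs, key=lambda c: c.energy_j)  # cheapest first
        self.current = self.choose(slowdown=1.0)

    def choose(self, slowdown: float) -> Config:
        # Assumption: load slows every config by the same factor.
        for c in self.configs:
            if (c.accuracy >= self.objectives.min_accuracy
                    and c.base_latency_s * slowdown <= self.objectives.max_latency_s):
                return c
        # No config meets both targets: prefer the fastest one that is still
        # accurate enough, or the fastest overall if none is.
        accurate = [c for c in self.configs if c.accuracy >= self.objectives.min_accuracy]
        return min(accurate or self.configs, key=lambda c: c.base_latency_s)

    def observe(self, observed_latency_s: float) -> Config:
        """Record a measured latency for the current config and reconfigure."""
        slowdown = observed_latency_s / self.current.base_latency_s
        self.current = self.choose(slowdown)
        return self.current


# Hypothetical execution options the runtime may choose between.
CONFIGS = [
    Config("small-model-1gpu", accuracy=0.78, base_latency_s=2.0, energy_j=150),
    Config("medium-model-1gpu", accuracy=0.84, base_latency_s=3.0, energy_j=300),
    Config("medium-model-2gpu", accuracy=0.84, base_latency_s=1.8, energy_j=420),
    Config("large-model-4gpu", accuracy=0.90, base_latency_s=2.5, energy_j=1200),
]

runtime = AdaptiveRuntime(Objectives(min_accuracy=0.80, max_latency_s=4.0), CONFIGS)
print(runtime.current.name)                              # medium-model-1gpu
print(runtime.observe(observed_latency_s=4.5).name)      # medium-model-2gpu
print(runtime.observe(observed_latency_s=1.8).name)      # medium-model-1gpu
```

With these made-up profiles, the runtime starts on `medium-model-1gpu`; a measured 4.5 s latency (a 1.5x slowdown) pushes it to `medium-model-2gpu`, trading energy for headroom, and it returns to the cheaper option once latency recovers. The developer-facing `Objectives` never changes, which is the point of keeping specification and execution separate.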
The Mechanism: Why Agentic Workflows Are Uniquely Wasteful
The inefficiency Murakkab addresses is a structural consequence of how agentic systems are built. A typical agentic workflow chains together multiple models — a planner, a code-writer, a retriever, a tool-caller, an evaluator — each with its own latency profile and resource footprint, connected by conditional logic that determines which step runs next based on intermediate outputs. When a cloud provider receives that workflow as a service request, it sees a sequence of opaque black-box calls rather than a unified system, which means it cannot make intelligent decisions about whether to run step three on a smaller, cheaper model, whether to batch steps four and six onto shared GPU memory, or whether the accuracy requirements for step two could be relaxed slightly to free up compute for the latency-critical step five.
Traditional cloud schedulers treat each model call in an agentic workflow as a separate, independent request. That works fine when requests are truly independent. It fails systematically when those requests are steps in a sequential workflow where the output quality of step two constrains the input requirements of step three: a scheduler that sees only individual calls has no way to reason about that constraint. Murakkab’s declarative abstraction exposes the workflow structure to the serving layer, enabling cross-step optimization that a per-call scheduler cannot perform. As a result, the system can find configurations that a developer manually tuning individual components would be unlikely to find, such as the video-frame-selector model configuration the researchers describe discovering unexpectedly, because the optimal configuration only becomes visible when the entire workflow’s resource profile can be seen at once.
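The gap between per-call and workflow-aware scheduling can be sketched with a toy two-step workflow in which a frame selector feeds an answering model. All names and numbers are hypothetical, and the model is deliberately simple: steps run sequentially so latencies add, step accuracies multiply as if errors were independent, the per-call scheduler splits the end-to-end targets evenly across steps, and exhaustive search stands in for whatever optimizer Murakkab actually uses.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    accuracy: float   # chance this step's output is good enough
    latency_s: float
    energy_j: float   # energy per request, in joules


# Hypothetical two-step video Q&A workflow: select frames, then answer.
STAGES = [
    [Option("selector-small", 0.90, 0.5, 50), Option("selector-large", 0.98, 1.5, 400)],
    [Option("answer-small", 0.85, 1.0, 200), Option("answer-large", 0.95, 1.4, 900)],
]
MIN_ACCURACY, MAX_LATENCY_S = 0.80, 3.0


def energy(plan):
    return sum(o.energy_j for o in plan)


def per_call_plan(stages, min_acc, max_lat):
    """Opaque scheduling: each call gets an equal, fixed share of the targets
    and is tuned on its own, with no view of the other steps."""
    n = len(stages)
    local_acc = min_acc ** (1 / n)   # equal split of the accuracy target
    local_lat = max_lat / n          # equal split of the latency budget
    plan = []
    for options in stages:
        ok = [o for o in options if o.accuracy >= local_acc and o.latency_s <= local_lat]
        if not ok:
            return None
        plan.append(min(ok, key=lambda o: o.energy_j))
    return plan


def joint_plan(stages, min_acc, max_lat):
    """Workflow-aware scheduling: check every whole-workflow combination
    against the end-to-end targets and keep the lowest-energy one."""
    feasible = [
        combo for combo in itertools.product(*stages)
        if math.prod(o.accuracy for o in combo) >= min_acc
        and sum(o.latency_s for o in combo) <= max_lat
    ]
    return list(min(feasible, key=energy)) if feasible else None


for label, planner in [("per-call", per_call_plan), ("joint", joint_plan)]:
    plan = planner(STAGES, MIN_ACCURACY, MAX_LATENCY_S)
    print(label, [o.name for o in plan], energy(plan), "J")
```

With these numbers the per-call scheduler settles on the small selector plus the large answering model (950 J), while the joint search pairs the large selector with the small answering model (600 J) and still meets both end-to-end targets. The per-call scheduler misses that plan because it requires every step to clear an equal share of the accuracy target, which rules out the small answering model even though a stronger selector upstream compensates for it.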
The Backstory
Murakkab arrives at a moment when the efficiency gap between agentic AI workloads and the infrastructure meant to serve them has become a visible cost problem rather than a theoretical one. Microsoft’s Maia 200 chip was designed from the ground up to cut inference cost at the silicon level — Murakkab addresses the same cost problem at the systems software level, two complementary angles on the same pressure. The researcher leading the MIT side, Gohar Chaudhry, framed the stakes directly: “Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing,” he said. “There is a lot of potential to make these workflows more resource-optimal so they consume far less energy, but we need to be thinking about this at the scale of major cloud platforms.”
That scale framing matters. Murakkab’s benchmarks measure efficiency gains on individual workflows. Across the thousands of concurrent agentic sessions a major cloud platform runs (coding-agent tasks such as ChatGPT Codex, enterprise AI agents processing document pipelines, agentic customer-service workflows), the absolute energy and cost savings, if the benchmark gains held, would be far larger than any single benchmark shows. The paper’s finding that Murakkab lowered energy consumption by more than an order of magnitude in one test case, with only a 2 percent accuracy drop, also bears directly on AI’s environmental footprint: the computational growth behind the UN’s warnings about AI data center energy consumption includes a growing share of unoptimized agentic workloads like the ones Murakkab targets.
Reactions
The paper describes the system’s performance improvements across both workflow categories tested — video Q&A and code generation — with consistent results across diverse configurations, suggesting the optimization gains are not specific to a narrow task type. The research team says the next phase of work will expand Murakkab to more complex workflows and larger computing clusters, with particular attention to AI applications not yet represented in the initial benchmarks. No commercial deployment timeline or Azure integration roadmap was announced alongside the research publication.
The Dispute: Research vs. Production
Murakkab’s benchmark results are compelling, but they come from a controlled research environment rather than a live production deployment. The workflows tested — video Q&A and code generation — are meaningful but relatively well-structured; real enterprise agentic workflows involve considerably more heterogeneous components, external API calls with unpredictable latencies, and failure and retry logic that adds further scheduling complexity. Whether the 2.8x GPU reduction and 4.3x cost reduction translate to production-grade agentic systems at cloud scale is the question the paper cannot yet answer, and the honest reading of the benchmarks is as a demonstration of potential rather than a confirmed production result.
There is also an adoption question. Cloud providers like Azure bill customers for the compute they consume, so a more resource-efficient customer workload is, in one sense, a less profitable one if the freed capacity is not immediately resold to new demand. The commercial incentive to deploy a system like Murakkab at scale is real but not unconditional: it aligns most directly with attracting price-sensitive enterprise customers who would otherwise run fewer agentic workloads, or with reducing the infrastructure cost of serving agentic products, whose compute costs have been a visible concern for CFOs across the industry.
What Happens Next
The researchers have indicated they plan to extend Murakkab to more complex workflow topologies and larger-scale cluster deployments. The immediate question for the field is whether a system of this kind gets incorporated into production cloud infrastructure — Azure being the most obvious candidate given Microsoft’s direct involvement in the research — or remains a research prototype. Given the scale of Microsoft’s commercial investment in agentic AI across Copilot and Azure AI Services, the research team has unusually direct access to the production environment where Murakkab’s gains would matter most. Watch for whether this work surfaces as a feature in Azure’s agentic serving infrastructure rather than as a standalone research release.
Why It Matters
Murakkab addresses a problem that has been growing quietly in the background of the agentic AI boom: the serving layer powering multi-step AI agents was designed for simpler workloads and has not been fundamentally rethought for the workflow architecture that now dominates frontier AI applications. The efficiency gains demonstrated in the paper are large enough that, if they translate to production scale, they would meaningfully reduce both the cost and the environmental footprint of the agentic AI industry, not through better chips or more renewable energy, but through smarter software that wastes less of what the existing infrastructure already provides. Google’s TurboQuant memory compression work addresses the same broad challenge from a different angle, reducing the memory footprint of model serving rather than the scheduling overhead of workflow orchestration. Together, these research directions suggest that a substantial part of the next cost reduction in AI infrastructure may come from systems-level software optimization rather than from hardware advances alone.
Sources
MIT News; arXiv (Murakkab paper); Mirage News; AI Commission.