Agentic AI is about more than CPUs
Heterogeneous datacenters are the future.
When Nvidia announced the Vera CPU as part of their Vera Rubin system, they described it as “purpose built for Agentic AI”. ARM just released their AGI CPU, which doesn’t actually stand for “artificial general intelligence” as you might expect, but refers to something about agents. Intel and SambaNova are partnering on agentic systems using Intel Xeon 6 CPUs. Overall, there’s been renewed interest in the CPU’s role in the datacenter as the industry becomes more focused on reinforcement learning and AI agents.
Before modern RL and agentic loops, AI datacenters were mostly just GPUs working to generate responses to human-written prompts. However, when performing RL training, a model needs to interact with a simulated environment, which is usually run on a CPU. And if an agent is performing a task autonomously, it needs to interact with various tools and external programs, which also require CPU resources to run. This new dynamic is causing ballooning CPU demand, with multiple AWS customers begging to purchase all of AWS' Graviton CPU instance capacity in 2026.
But focusing purely on the role of CPUs in modern RL training and agentic inference datacenters is an oversimplification. If more and more CPU time is being spent on RL environments and agentic tool calling, this provides a huge opportunity for chips that offload other key CPU functions to dedicated silicon. That way, the CPU can stay focused on running RL environments and agentic loops, while other accelerators like DPUs and SmartNICs manage everything from cryptography to storage to networking.
The Rise of Agentic CPUs
RL training and agentic inference involve a lot more than just running an AI model on a GPU and returning its output. Instead, an AI agent runs in a loop, interacting with its environment by calling other programs, running test code, and looking things up. RL training loops look similar, with an agent interacting with a simulated environment that measures how well it's performing at a task, from playing a video game, to writing code that passes a series of tests, to successfully using a web browser to order a pizza. These environments are complex, branching, serial programs: the exact kind of programs that run better on a CPU than on a GPU.
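The loop described above can be sketched in a few lines of Python. Everything here -- the `Environment` and `Agent` classes and the toy reward logic -- is a hypothetical stand-in for illustration, not any real framework's API:

```python
# Minimal sketch of an agentic/RL-style loop. The environment logic is
# branching, serial, data-dependent code: CPU work. The model forward
# pass (stubbed out here) is the GPU work.

class Environment:
    """Simulated environment with serial, stateful logic (runs on CPU)."""
    def __init__(self):
        self.step_count = 0

    def observe(self):
        return {"step": self.step_count}

    def step(self, action):
        # Real environments run test suites, browsers, game engines, etc.
        self.step_count += 1
        reward = 1.0 if action == "act" else 0.0
        done = self.step_count >= 3
        return reward, done

class Agent:
    def choose_action(self, observation):
        # In a real system, this is a model forward pass on the GPU.
        return "act"

env = Environment()
agent = Agent()
total_reward, done = 0.0, False
while not done:
    obs = env.observe()                 # CPU: serialize state for the model
    action = agent.choose_action(obs)   # GPU: model inference
    reward, done = env.step(action)     # CPU: run environment logic
    total_reward += reward

print(total_reward)  # prints 3.0
```

The key point is the alternation: every GPU inference step is sandwiched between CPU-side environment steps, so slow CPU-side code directly throttles the whole loop.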
This poses a problem: many AI datacenters currently have a single CPU acting as the host processor for many GPUs. If there are far more GPUs than CPUs in a system, agentic inference and RL training will start to be bottlenecked by CPU performance, rather than GPU performance. Some analysts are predicting that datacenter CPU:GPU ratios may approach 1:1, rather than the 1:8 ratio we saw in previous eras of AI datacenter build-out.
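A toy model shows why the ratio matters. The per-step timings below are made-up assumptions, not measurements; the point is only that when shared CPUs can't feed environment steps fast enough, the GPUs sit idle:

```python
# Toy model: each agent step needs some CPU time (environment) and some
# GPU time (inference). GPU utilization is capped by how fast the CPUs
# can feed steps to each GPU.

def gpu_utilization(cpus_per_gpu, cpu_ms_per_step, gpu_ms_per_step):
    """Fraction of time each GPU does useful work in steady state."""
    cpu_feed_rate = cpus_per_gpu / cpu_ms_per_step  # steps/ms the CPUs can supply per GPU
    gpu_step_rate = 1.0 / gpu_ms_per_step           # steps/ms one GPU can consume
    return min(1.0, cpu_feed_rate / gpu_step_rate)

# Illustrative (assumed) numbers: 10 ms of CPU environment work and
# 40 ms of GPU inference per step.
print(gpu_utilization(1 / 8, 10, 40))  # 1:8 ratio -> 0.5 (GPUs half idle)
print(gpu_utilization(1.0, 10, 40))    # 1:1 ratio -> 1.0 (GPUs stay fed)
```

Under these assumed timings, moving from a 1:8 to a 1:1 CPU:GPU ratio is the difference between GPUs idling half the time and GPUs running flat out.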
But the CPU needs to do so much more than just run the agentic environment. In many datacenter build-outs, CPUs are already responsible for a significant amount of networking, storage, and security operations. If we also want CPUs to run agentic loops and RL environments, datacenter CPUs are going to be overworked and inefficient.
SmartNICs and DPUs to the rescue
If you want your CPU to stay fully utilized running agentic loops and RL environments, any time spent coordinating networking, storage, or cryptography is overhead. A datacenter CPU might spend 30% or more of its cycles managing these subroutines, which essentially makes your CPU 30% less efficient. Math-heavy operations, like running cryptographic algorithms for packet authentication and decryption, are particularly expensive for CPUs -- and as microservices and zero-trust architectures proliferate in the datacenter, the problem only gets worse. Plus, new post-quantum cryptography algorithms need larger keys and more complex computations for encryption and decryption, reducing CPU efficiency even further.
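To put the illustrative 30% figure in perspective: because useful capacity scales as 1/(1 - overhead), clawing back a 30% efficiency loss by adding more CPUs actually takes roughly 43% more cores, not 30%. A quick back-of-envelope sketch:

```python
# Back-of-envelope math on support-task overhead, using the illustrative
# 30% figure from the text (an assumption, not a measurement).

def useful_capacity(cores, overhead_fraction):
    """Core-equivalents left for agentic loops after support-task overhead."""
    return cores * (1.0 - overhead_fraction)

cpu_cores = 64
print(useful_capacity(cpu_cores, 0.30))  # 44.8 cores of useful work
print(useful_capacity(cpu_cores, 0.0))   # 64.0 cores if fully offloaded

# Recovering the lost capacity with CPUs alone takes 1/(1 - 0.30) ~ 1.43x
# as many cores, i.e. ~43% more:
extra_cores = cpu_cores / (1.0 - 0.30) - cpu_cores
print(round(extra_cores, 1))  # 27.4 additional cores on top of 64
```

That nonlinearity is part of the offload argument: the higher the overhead fraction, the more disproportionately expensive it is to solve with general-purpose cores.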
Instead of over-provisioning CPUs to make up for this efficiency loss, datacenters can rely on specialized hardware to accelerate these subroutines. A SmartNIC, or smart network interface card, is a PCIe card with dedicated hardware for accelerating networking, cryptography, and sometimes storage. Dedicated hardware for features like packet processing, block ciphers, and asymmetric encryption can be much more efficient than software implementations. That way, relatively lower-cost, lower-power FPGAs or dedicated SmartNIC chips can replace additional high-cost, high-power CPUs as a way to make up for the efficiency lost to networking and storage subroutines. A DPU, or data processing unit, goes a step further, integrating fully functional low-power CPU cores so it can handle networking, storage, and security tasks autonomously and offload even more work from the host CPU.
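From software's point of view, offload usually looks like a dispatch pattern: route the work to dedicated hardware when it's present, fall back to a CPU implementation otherwise. The `HAS_SMARTNIC` flag and `smartnic_authenticate` function below are hypothetical placeholders for a real NIC driver interface; the sketch just shows where CPU cycles go when no offload hardware exists:

```python
# Sketch of the offload dispatch pattern. smartnic_authenticate() and
# HAS_SMARTNIC are hypothetical stand-ins, not a real driver API.
import hashlib

HAS_SMARTNIC = False  # in practice, detected via the NIC driver

def smartnic_authenticate(packet: bytes) -> bytes:
    # A real implementation would hand the packet to the NIC's crypto
    # engine (e.g. via DMA) and never touch the host CPU.
    raise NotImplementedError("hypothetical hardware path")

def software_authenticate(packet: bytes) -> bytes:
    # The CPU burns cycles here on every packet; this is exactly the
    # work a SmartNIC's crypto engine absorbs.
    return hashlib.sha256(packet).digest()

def authenticate(packet: bytes) -> bytes:
    if HAS_SMARTNIC:
        return smartnic_authenticate(packet)
    return software_authenticate(packet)

tag = authenticate(b"example packet")
print(len(tag))  # prints 32 (SHA-256 digest length in bytes)
```

The same shape applies to storage and packet-processing offloads: the application-visible interface stays the same, and the hardware quietly absorbs the per-packet or per-block cost.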
The Datacenter of the Future is Heterogeneous
For a very long time, the CPU was responsible for nearly all computing in the datacenter. In the first part of the AI era, the GPU became responsible for AI compute, while the CPU was relegated to support functions like networking, storage, and security. But in the era of agents and RL, CPUs are now responsible for running agentic loops and RL environments -- so they can’t afford to spend as much time on those support functions.
In the future, I expect to see significant growth in the SmartNIC and DPU markets to enable datacenter networking in a more cost-efficient and power-efficient way. This value is also going to trickle down to cryptography and networking IP providers, as these solutions need power- and area-efficient cores for packet processing and security.
More generally, as datacenters get larger and more power-hungry, I expect efficiency through dedicated heterogeneous hardware to become a key driver to deliver more performance-per-watt. SmartNICs and DPUs allow datacenters to improve their networking efficiency as compared to CPUs -- but there could be other key subroutines inside the datacenter that would benefit from dedicated hardware. This could bode well for FPGA manufacturers like Xilinx and its parent company, AMD: not only do FPGAs power many existing SmartNICs, but they would likely also power new accelerator cards designed to offload additional functionality to free up CPU cycles for running agents and RL loops.

