Vista, Cambium unveil $3.5 billion AI inference cloud built on three different chips
Vista Equity Partners and Cambium Capital have launched Vector Core Compute, an enterprise inference cloud that runs on three different processor types at once rather than relying on a single chip type. The infrastructure went live at a Los Angeles facility, with additional sites in development in Chicago, Seattle, and Phoenix, and a longer-term rollout planned across more than 50 US metropolitan areas.
The architecture behind the launch is the part worth understanding first. Most AI infrastructure today concentrates compute in large remote data centers using one dominant chip type. Vector Core Compute instead assigns each stage of an AI inference workflow to the processor best suited for it. Intel Xeon CPUs handle orchestration and routing. NVIDIA Blackwell GPUs process the initial high-compute burst when requests first arrive. SambaNova RDUs manage token generation during the decode phase. The system coordinates across all three in real time.
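The stage-to-processor assignment described above can be pictured with a minimal Python sketch. It is an illustration only, not Vector Core Compute's actual software: the names `Stage`, `Step`, `STAGE_TO_PROCESSOR`, and `plan_request` are hypothetical, and "prefill" is used here as a label for the initial high-compute burst, which is an assumption about terminology rather than wording from the announcement.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    ORCHESTRATION = "orchestration"  # routing and coordination
    PREFILL = "prefill"              # assumed label for the initial high-compute burst
    DECODE = "decode"                # token-by-token generation


# Mapping taken from the article; the dictionary itself is illustrative.
STAGE_TO_PROCESSOR = {
    Stage.ORCHESTRATION: "Intel Xeon CPU",
    Stage.PREFILL: "NVIDIA Blackwell GPU",
    Stage.DECODE: "SambaNova RDU",
}


@dataclass(frozen=True)
class Step:
    stage: Stage
    processor: str
    tokens: int  # tokens handled in this stage (0 for orchestration)


def plan_request(prompt_tokens: int, output_tokens: int) -> list[Step]:
    """Return the ordered stages for one inference request (hypothetical helper)."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    stages = [
        (Stage.ORCHESTRATION, 0),
        (Stage.PREFILL, prompt_tokens),
        (Stage.DECODE, output_tokens),
    ]
    return [Step(stage, STAGE_TO_PROCESSOR[stage], n) for stage, n in stages]


if __name__ == "__main__":
    # Example request sizes are made up for illustration.
    for step in plan_request(prompt_tokens=1200, output_tokens=300):
        print(f"{step.stage.value:>13} -> {step.processor} ({step.tokens} tokens)")
```

A real scheduler would also have to move intermediate state between processors and balance load across them; the sketch only shows which chip type owns which stage.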
The practical motivation comes down to cost. Agentic AI workloads have different computational characteristics at different stages, so running everything on a single GPU type means paying GPU prices for work that does not require GPU capability. Disaggregating the workload across specialized hardware is meant to reduce that inefficiency.
Together AI, which processes over 400 trillion tokens monthly for agentic use cases, is the first commercial customer on the platform. Additionally, Vista’s own portfolio of more than 90 software companies, serving roughly 2.5 million enterprise customers worldwide, receives early access as part of the arrangement.
Robert F. Smith, founder and CEO of Vista Equity Partners, placed the problem squarely on infrastructure rather than models. The models are ready, he argued. What companies running agentic systems at scale actually struggle with is finding compute that does not make the economics fall apart. That framing shifts the conversation away from AI capability and toward something more operational: who can afford to run it consistently, and at what cost.
The geographic distribution ties directly into that concern. Vector Core Compute plans to spread capacity across more than 50 US metro markets instead of routing everything through a small number of large remote facilities. For enterprises running workflows that cannot tolerate delays, such as continuous monitoring systems, live customer interactions, and multi-step decision pipelines, the distance between a request and the compute handling it adds latency that accumulates quickly across millions of daily transactions.
Whether the disaggregated model ultimately becomes a broader industry template depends on how well it scales beyond these initial deployments.

