Frontier Inference Clusters

Inside an Etched rack — liquid-cooled compute

We're building a new category of AI hardware: frontier inference clusters.

We co-design chips, racks, software, and manufacturing methods so frontier models can run with best-in-class throughput, latency, cost, and power efficiency for both prefill and decode workloads.

Earlier this year our A0 silicon came back from TSMC N4P, and today we are busy validating our first rack-scale product with customers to fulfill $1B in demand.

We're a team of 400+ engineers from NVIDIA, Google TPUs, Broadcom, SK Hynix, TSMC, and more. We've raised $800M across four unannounced financings, including a strategic investment from VentureTech Alliance. We're excited to deepen our partnership with the world's leading semiconductor manufacturer.

Designing a New Pareto Frontier

Our inference systems are built to push the entire pareto curve on frontier models, including many-trillion-parameter MoEs, long context, and agentic workloads. This required intense co-design, from new chips, packages, PCBs, cold plates, interconnects, and more. Today, we're sharing two breakthroughs to make this happen:

Low Voltage Inference (LVI) for high-throughput workloads

Today, AI chips can't scale FLOPs without thermal throttling. As FLOPs utilization increases, AI chips draw more power and downregulate clock speed. This often results in sustained inference throughput under half of Peak FLOPs.

We've designed a new architecture to run our chip's math blocks at under half the voltage of most AI chips. This enables multiple times the FLOPs density of AI chips today. We can run trillion-parameter sparse MoEs at 80%+ Peak FLOPs without thermal throttling.

Running LVI requires co-designing the entire cluster from the transistor to the token: new splittable math arrays, circuit techniques, novel tiling and scheduling algorithms, power delivery networks, VRM architectures, advanced packaging, cold plate designs, and more.

Cluster Scale Memory (CSM) for low-latency workloads

Today's AI chips using HBM can't achieve SRAM-level decode speeds due to memory subsystem and interconnect bottlenecks. SRAM-only chips have lower FLOPs density and memory capacity, sacrificing throughput.

We created a much lower-latency shared memory pool across our scale-up domain. We use a proprietary ultra-low-latency, high-bandwidth interconnect to enable dramatically faster memory access across chips.

Our HBM/SRAM hybrid design solves both memory capacity and mem2mem latency, enabling high throughput and interactivity simultaneously. CSM improves latency and avoids today's cost, reliability, yield, thermal, and compute tradeoffs of SRAM-only chips, 3D DRAM chips, or optics.

We've made co-design decisions hand-in-hand with leading AI companies, cloud providers, and hyperscalers. We've tested racks in representative data center deployments, run terabytes of production traffic patterns through our simulator, and had dozens of engineers live overseas for months to co-design deeply with our supply-chain partners. If this sounds exciting, you should join us.

Getting to Gigawatt Scale

Early customer tests show us achieving SOTA throughput, latency, and power efficiency on inference workloads. We'll be sharing more updates on our performance and roadmap this summer.

Our first racks ship this summer, and we've kicked off production to fulfill over $1B in customer contracts. To enable 24/7 engineering cycles, we've opened a Taiwan factory and built a data center, test house, and NPI prototyping lab in our San Jose office.

We are vertically integrated to get to Gigawatt scale as quickly as possible. Math block designers sit next to inference engineers, thermal experts next to GSMs.

Gavin Uberti

Co-Founder & CEO

Harvard Thiel Fellow. World math champion, Math 55 alumnus, expert in AI compilers, developed the Cortex-M backend for TVM.

Robert Wachen

Co-Founder & President

Harvard Thiel Fellow. Co-founded Prod ($100B+ cohort valuation) and Mentor Labs (acq. by Crimson Education).

Mark Ross

CTO

Ex-CTO of Cypress (acquired for $9.4B). Shipped 5 systems generating >$1B in revenue, all on A0 silicon.

Brian Loiler

VP of Platform

Ex-NVIDIA for 22 years. Led platform engineering teams across NVIDIA. Built the HGX and DGX systems from scratch.

Wayne Cao

VP of Production

Led 0-1 production & supply chain ramps for 24 products including the original iPhone, MacBook Air, Pixel, and Chromebook.

Saptadeep Pal

VP of ASIC & Architecture

Co-founded Auradine. NVIDIA H100/A100/V100 architecture team. Qualcomm award for research on waferscale SRAM & DRAM stacking.

David Munday

VP of Software

Built the TPU software team (TPU v1-v5) and led research for Project Astra at Deepmind.

Tim Perevozchikov

VP of Finance

Ex-VP Quant Trading & Chief of Staff to CEO at Two Sigma Securities. Built multiple new trading desks from scratch.

Ajat Hukkoo

Distinguished Engineer at Broadcom & VP of Intel's Custom Silicon Group. Shipped 300 million chips across nine A0 products.

Chris Zhu

Co-Founder

Harvard Thiel Fellow. Math and high performance computing researcher, Math 55 alumnus. Published novel combinatorics work.

Build the future of inference

We’re grateful for the support of our investors, angels, and advisors, including:

Tri Dao

FlashAttention

Geoffrey Hinton

Godfather of AI

Peter Thiel

Thiel Capital

Jerry Tworek

Core Auto / OpenAI

Andrej Karpathy

Anthropic

Aidan Gomez

Cohere

Arthur Mensch

Mistral

Noam Brown

OpenAI

Scott Wu

Cognition

Fei-Fei Li

World Labs

Tri Dao

FlashAttention

Geoffrey Hinton

Godfather of AI

Peter Thiel

Thiel Capital

Jerry Tworek

Core Auto / OpenAI

Andrej Karpathy

Anthropic

Aidan Gomez

Cohere

Arthur Mensch

Mistral

Noam Brown

OpenAI

Scott Wu

Cognition

Fei-Fei Li

World Labs

Tal Broda

OpenAI

Stanley Druckenmiller

Duquesne

Ben Spector

Flapping Airplanes

Irwan Bello

Reflection AI

Pieter Abbeel

Covariant

Shivon Zilis

Neuralink

Amjad Masad

Replit

Jason Warner

Poolside

Zach Dell

Base Power

Kyle Vogt

The Bot Company

Scott Belsky

A24

Lachy Groom

Physical Intelligence

Nikesh Arora

Palo Alto Networks

Bryan Johnson

Blueprint

Karim Atiyeh

Ramp

Dylan Field

Figma

Stefano Ermon

Inception

Yash Patil

Applied Compute

Jesse Zhang

Decagon

Nikita Bier

Etched

Frontier Inference Clusters

We're building a new category of AI hardware: frontier inference clusters.

Designing a New Pareto Frontier

Low Voltage Inference (LVI) for high-throughput workloads

Cluster Scale Memory (CSM) for low-latency workloads

Getting to Gigawatt Scale