Meet Sohu: the world's first transformer ASIC
_Transformers etched into silicon
By burning the transformer architecture into our chips, we can run AI models an order of magnitude faster and cheaper than GPUs.
[Chart: Llama 70B throughput in tokens/sec, comparing NVIDIA 8xH100, NVIDIA 8xB200, and Etched 8xSohu (>500,000 tokens/sec)]
_Build products that are impossible with GPUs
- Real-time voice agents: ingest thousands of words in milliseconds.
- Better coding with tree search: compare hundreds of responses in parallel.
- Multicast speculative decoding: generate new content in real time (see the sketch after this list).
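To make the decoding idea concrete, here is a minimal sketch of plain greedy speculative decoding in Python, not Etched's multicast variant. The `draft` and `target` callables are hypothetical stand-ins for a small fast model and a large accurate one; any function mapping a token prefix to next-token scores would do.

```python
from typing import Callable, List

def greedy_next(scores: List[float]) -> int:
    # Index of the highest score (argmax).
    return max(range(len(scores)), key=scores.__getitem__)

def speculative_decode(
    target: Callable[[List[int]], List[float]],  # large, accurate model (hypothetical)
    draft: Callable[[List[int]], List[float]],   # small, fast model (hypothetical)
    prefix: List[int],
    k: int = 4,        # tokens drafted per verification round
    max_new: int = 32,
) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. Draft k tokens cheaply with the small model.
        drafted, ctx = [], list(out)
        for _ in range(k):
            t = greedy_next(draft(ctx))
            drafted.append(t)
            ctx.append(t)
        # 2. Verify against the target model. In a real system these k
        #    verifications fuse into one batched forward pass, which is
        #    where the speedup comes from.
        for d in drafted:
            expected = greedy_next(target(out))
            out.append(expected)  # always advance with the target's token
            if expected != d:
                break  # mismatch: discard the rest of the draft
            if len(out) - len(prefix) >= max_new:
                break
    return out

# Toy usage: both "models" just prefer (last_token + 1) mod 10.
toy = lambda ctx: [1.0 if t == (ctx[-1] + 1) % 10 else 0.0 for t in range(10)]
print(speculative_decode(toy, toy, prefix=[0], k=4, max_new=8))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Because the draft and target agree on every token in this toy, each round accepts all k drafted tokens; with real models, a partial match still advances at least one token per round.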
_Run tomorrow's trillion-parameter models
- Only one core
- Fully open-source software stack
- Scales to 100T-parameter models
- Beam search and MCTS decoding (see the sketch after this list)
- 144 GB HBM3E per chip
- MoE and transformer variants
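For readers unfamiliar with the decoding modes above, here is a minimal beam-search sketch in Python. The `score_next` callable is a hypothetical stand-in for a model head that maps a token prefix to per-token log-probabilities; it illustrates the algorithm, not Sohu's on-chip implementation.

```python
import math
from heapq import nlargest
from typing import Callable, List, Tuple

def beam_search(
    score_next: Callable[[List[int]], List[float]],  # prefix -> log-probs (hypothetical)
    prefix: List[int],
    beam_width: int = 4,
    steps: int = 8,
) -> List[int]:
    # Each beam is (cumulative log-prob, token sequence).
    beams: List[Tuple[float, List[int]]] = [(0.0, list(prefix))]
    for _ in range(steps):
        candidates: List[Tuple[float, List[int]]] = []
        for logp, seq in beams:
            # Expand every beam by every possible next token.
            for tok, s in enumerate(score_next(seq)):
                candidates.append((logp + s, seq + [tok]))
        # Keep only the beam_width highest-scoring continuations.
        beams = nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda b: b[0])[1]

# Toy usage: a 4-token vocabulary where even tokens are more likely.
toy = lambda seq: [math.log(0.3) if t % 2 == 0 else math.log(0.1) for t in range(4)]
print(beam_search(toy, prefix=[0], beam_width=2, steps=3))
```

Beam search keeps several candidate sequences alive at each step instead of committing greedily; MCTS generalizes the same idea by exploring and backing up scores through a search tree.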