In a breakthrough that underscores the rising influence of AMD’s AI hardware ecosystem, Zyphra has successfully developed ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on AMD Instinct™ MI300X GPUs and AMD Pensando™ networking, powered by the ROCm™ open software stack. This achievement marks a major milestone in frontier AI model training, demonstrating how AMD’s accelerated computing platform can rival, and in some cases outperform, incumbent AI training architectures.
“AMD leadership in accelerated computing is empowering innovators like Zyphra to push the boundaries of what’s possible in AI,” said Emad Barsoum, Corporate Vice President of AI and Engineering, Artificial Intelligence Group, AMD.
Designed for scale, ZAYA1 demonstrates the efficiency of AMD hardware for production-grade AI workloads. With 192 GB of high-bandwidth memory per GPU, the MI300X eliminated the need for expert or tensor sharding, simplifying the training architecture and improving throughput. Zyphra also reported 10x faster model save times using AMD-optimized distributed I/O, boosting reliability and operational efficiency in large-scale training.
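To see why 192 GB per GPU can sidestep model sharding, a back-of-envelope estimate helps. The assumptions below (bf16 weights plus fp32 master weights and Adam moment buffers, roughly 14 bytes per parameter) are illustrative, not Zyphra's actual training configuration:

```python
# Rough per-GPU memory footprint for training an 8.3B-parameter model.
# Assumption (hypothetical, not from Zyphra): bf16 weights (2 B) + fp32
# master copy (4 B) + Adam first and second moments (4 B + 4 B) = 14 B/param.
def training_memory_gb(total_params_billions: float) -> float:
    params = total_params_billions * 1e9
    bytes_per_param = 2 + 4 + 4 + 4
    return params * bytes_per_param / 1e9

mem = training_memory_gb(8.3)  # ZAYA1's 8.3B total parameters
print(f"~{mem:.0f} GB of weight/optimizer state per replica")
```

Under these assumptions the full model and optimizer state (~116 GB) fits within a single MI300X's 192 GB, so each GPU can hold an entire replica without splitting experts or tensors across devices. Activation memory, not modeled here, would consume part of the remaining headroom.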
ZAYA1-base, a model with 8.3B total parameters (760M active), delivered competitive or superior benchmark performance against comparable open models including Llama-3-8B, OLMoE, Qwen3-4B, and Gemma3-12B, showcasing a strong performance-to-efficiency ratio.
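The gap between total and active parameters is the defining property of Mixture-of-Experts models: a router selects only a few experts per token, so compute scales with the active subset rather than the full parameter count. The toy sketch below illustrates the mechanism with hypothetical sizes (8 experts, top-2 routing), not ZAYA1's actual architecture:

```python
import numpy as np

# Toy top-k MoE layer: the router scores all experts, but only the top-k
# experts' weights participate in each token's forward pass.
rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 16                   # hypothetical sizes
experts = rng.standard_normal((n_experts, d, d)) # one weight matrix per expert
router_w = rng.standard_normal((d, n_experts))   # routing projection

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]         # indices of top-k experts
    weights = np.exp(logits[chosen])
    gates = weights / weights.sum()              # softmax over chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_forward(rng.standard_normal(d))
print(f"fraction of experts active per token: {top_k / n_experts:.0%}")
```

In ZAYA1's case the same principle means only 760M of the 8.3B parameters are exercised per token, which is why it can match much larger dense models at a fraction of the inference cost.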
“Efficiency has always been a core guiding principle at Zyphra… ZAYA1 reflects this philosophy, and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform,” said Krithik Puthalath, CEO of Zyphra.
Co-designed with AMD and IBM, the training cluster leveraged MI300X GPUs with IBM Cloud’s high-performance storage and networking infrastructure, validating AMD’s scalability for multimodal and enterprise AI.
With this achievement, AMD strengthens its foothold in AI infrastructure, presenting a viable alternative to NVIDIA-dominated AI compute ecosystems that offers flexibility, efficiency, and frontier-scale capabilities.
