Meta has announced details of its new GPU clusters for training advanced AI models such as Llama 3. These clusters are designed to handle larger and more demanding models than their predecessors.
Each new cluster is equipped with 24,576 NVIDIA H100 GPUs, a significant upgrade from the 16,000 NVIDIA A100 GPUs in the previous generation. The clusters use different network fabric solutions: one is built on RDMA over Converged Ethernet (RoCE) using Arista 7800 switches, while the other uses an NVIDIA Quantum2 InfiniBand fabric.
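The choice of fabric is largely transparent to the training code itself. As an illustrative sketch only (not taken from Meta's announcement), the snippet below assumes a PyTorch job launched with torchrun, where the NCCL backend handles inter-GPU communication over either RoCE or InfiniBand without any change to the application code.

```python
# Minimal sketch: initializing a PyTorch distributed job with the NCCL
# backend. NCCL selects the available RDMA transport (InfiniBand verbs
# or RoCE) at runtime, so the same code runs on either fabric.
# This is illustrative and not Meta's actual training code.
import os

import torch
import torch.distributed as dist


def init_distributed() -> None:
    # RANK, WORLD_SIZE, and LOCAL_RANK are injected by the launcher
    # (e.g. torchrun); MASTER_ADDR/MASTER_PORT are set the same way.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)


if __name__ == "__main__":
    init_distributed()
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
```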
Furthermore, these clusters are built on Grand Teton, Meta's in-house open GPU hardware platform designed for large-scale AI workloads.