FuriosaAI, an AI semiconductor company, has launched RNGD, an AI accelerator chip for data center inference. The chip is designed for high-performance LLM and multimodal model inference.
RNGD features an architecture built around the Tensor Contraction Processor (TCP), which uses tensor contraction rather than matrix multiplication as its core primitive, a robust compiler optimized for TCP, and 48 GB of HBM3 memory. The chip has a TDP of 150 W, compared to 1,000+ W for leading GPUs, and can deliver throughput of 2,000 to 3,000 tokens per second for models with around 10 billion parameters.
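The quoted figures imply a power-efficiency ratio that can be checked with simple arithmetic. The sketch below uses only the numbers stated above (2,000 to 3,000 tokens/s at 150 W); the per-joule normalization and the break-even comparison are our own illustrative calculation, not vendor benchmarks.

```python
# Illustrative efficiency arithmetic using the figures quoted above.
# The throughput and TDP values come from the article; everything else
# is a hypothetical comparison for context.

def tokens_per_joule(tokens_per_second: float, tdp_watts: float) -> float:
    """Throughput normalized by power draw (tokens/s per watt = tokens/J)."""
    return tokens_per_second / tdp_watts

rngd_low = tokens_per_joule(2_000, 150)
rngd_high = tokens_per_joule(3_000, 150)
print(f"RNGD: {rngd_low:.1f}-{rngd_high:.1f} tokens per joule")

# Throughput a hypothetical 1,000 W accelerator would need
# to match RNGD's low-end efficiency figure:
breakeven = rngd_low * 1_000
print(f"Break-even throughput at 1,000 W: {breakeven:.0f} tokens/s")
```

At the low end this works out to roughly 13 tokens per joule, so a 1,000 W part would need on the order of 13,000 tokens/s to match it on this metric alone.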
RNGD is currently sampling to early access customers, with broader availability expected in early 2025. FuriosaAI claims that RNGD offers a balance of efficiency, programmability, and performance, describing it as a sustainable and accessible AI computing solution that meets the industry's real-world needs for inference, with the ability to run models like Llama 3.1 8B efficiently on a single card.