All Updates

Product updates
Cerebras launches Cerebras Inference for improved AI model deployment
Generative AI Infrastructure
Aug 27, 2024
This week:
Partnerships
Microsoft and BlackRock partner to launch USD 30 billion AI data center investment fund
Machine Learning Infrastructure
Today
Funding
Limitless Labs raises USD 3 million in pre-seed funding to develop prediction market
Web3 Ecosystem
Today
Product updates
Google Cloud launches Blockchain RPC service for Web3 developers
Web3 Ecosystem
Today
Product updates
Kore.ai launches GALE platform for enterprise GenAI adoption
Machine Learning Infrastructure
Today
Product updates
ProAmpac launches enhanced online pouch configurator MAKR by DASL for custom flexible packaging prototypes
Smart Packaging Tech
Yesterday
Funding
M&A
Majority stake in Bollegraaf Group acquired by Summa Equity for EUR 800 million
Waste Recovery & Management Tech
Yesterday
Partnerships
NASA awards Intuitive Machines contract for near-space network services
Space Travel and Exploration Tech
Yesterday
Partnerships
FinFit partners with Sunny Day Fund to offer emergency savings accounts
Financial Wellness Tools
Yesterday
Partnerships
KSP partners with Peak Technologies and Locus Robotics for warehouse automation
Logistics Tech
Yesterday
Generative AI Infrastructure

Aug 27, 2024

Cerebras launches Cerebras Inference for improved AI model deployment

Product updates

  • Cerebras Systems has introduced Cerebras Inference, a new service for running AI models. The service delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, with pricing starting at 10 cents per million tokens.
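
The throughput and pricing figures quoted above can be combined into a quick back-of-the-envelope estimate. The sketch below is illustrative only; it simply applies the stated decode rates and per-million-token price to a hypothetical response length.

```python
# Back-of-the-envelope estimates using the figures quoted above:
# 1,800 tokens/s (Llama 3.1 8B), 450 tokens/s (Llama 3.1 70B),
# and pricing starting at USD 0.10 per million tokens.

def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a sustained decode rate."""
    return num_tokens / tokens_per_second

def generation_cost_usd(num_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD for num_tokens at a flat per-million-token price."""
    return num_tokens * usd_per_million_tokens / 1_000_000

# A hypothetical 2,000-token response on the 8B model:
print(generation_time_s(2_000, 1_800))   # roughly 1.1 seconds
print(generation_cost_usd(2_000, 0.10))  # roughly USD 0.0002
```

At these rates, a long multi-step agent workflow that chains several such responses still completes in seconds, which is the scenario Cerebras highlights.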

  • Cerebras Inference maintains 16-bit accuracy throughout the inference process, offering high performance without compromising accuracy. The service is available in three tiers: Free, Developer, and Enterprise. The Developer tier provides an API endpoint for flexible, serverless deployment, while the Enterprise tier offers fine-tuned models, custom SLAs, and dedicated support.
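
The Developer tier is described above as exposing an API endpoint, but no request schema is given. The sketch below assumes an OpenAI-style chat-completions payload; the endpoint URL, model identifier, and field names are assumptions for illustration, not confirmed details of the service.

```python
import json

# Hypothetical request builder for a chat-completions-style inference API.
# The URL, model name, and payload fields are assumptions for illustration;
# consult the provider's API reference for actual values.
API_URL = "https://api.example-inference.com/v1/chat/completions"  # assumed

def build_request(prompt: str, model: str = "llama3.1-8b",
                  max_tokens: int = 256) -> dict:
    """Assemble a JSON-serializable request body (not sent anywhere)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("Summarize today's AI infrastructure news.")
print(json.dumps(body, indent=2))
```

A serverless endpoint of this shape lets developers swap providers by changing only the URL and model name, which is one reason OpenAI-compatible schemas have become a common convention for inference services.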

  • Cerebras claims its inference service is 20x faster than NVIDIA GPU-based solutions in hyperscale clouds and delivers 100x better price-performance for AI workloads. According to the company, this speed allows developers to build next-generation AI applications that require complex, multi-step tasks performed in real time, such as AI agents.

  • Analyst QuickTake: Cerebras Inference's claimed performance and lower costs pose a threat to NVIDIA's inference solutions and could reshape how AI inference is deployed. With roughly 20x the speed at a fraction of the cost, the service may appeal to developers and businesses seeking more efficient and affordable AI deployment options.
