All Updates

Product updates
Google Cloud Run adds NVIDIA GPU support for serverless AI inference
Generative AI Infrastructure
Aug 21, 2024
This week:
Funding
EKORE raises EUR 1.3 million (~ USD 1 million) in seed funding to strengthen platform
Digital Twin
Dec 20, 2024
Funding
Culina Health raises USD 7.9 million in Series A funding to expand offerings and grow team
Functional Nutrition
Dec 19, 2024
FDA approval
ViGeneron receives IND clearance for VG801 gene therapy
Cell & Gene Therapy
Dec 19, 2024
Product updates
Reflex Aerospace ships first commercial satellite SIGI
Next-gen Satellites
Dec 19, 2024
Partnerships
Vast partners with SpaceX for two private astronaut missions to ISS
Space Travel and Exploration Tech
Dec 19, 2024
Management news
Carbios appoints Philippe Pouletty as interim CEO amid plant delay
Waste Recovery & Management Tech
Dec 19, 2024
Funding
BlueQubit raises USD 10 million in seed funding to develop quantum platform
Quantum Computing
Dec 19, 2024
FDA approval
Arbor Biotechnologies receives FDA clearance for ABO-101 IND application
Human Gene Editing
Dec 19, 2024
Partnerships
Funding
Personalis partners with Merck and Moderna for cancer therapy development and investment
Precision Medicine
Dec 19, 2024
Partnerships
COTA partners with Guardant Health to develop clinicogenomic data solutions for cancer research
Precision Medicine
Dec 19, 2024
Generative AI Infrastructure

Aug 21, 2024

Google Cloud Run adds NVIDIA GPU support for serverless AI inference

Product updates

  • Google Cloud announced a preview of NVIDIA L4 GPU support for its Cloud Run serverless platform. The feature allows organizations to run serverless AI inference, with users paying only for the GPU resources they use.

  • The GPU-enabled Cloud Run instances can support various AI frameworks and models, including NVIDIA NIM, vLLM, PyTorch, and Ollama. Each instance can have one NVIDIA L4 GPU, providing up to 24 GB of vRAM. For optimal performance, the service supports running models with up to 13 billion parameters.

  • Google Cloud claims this integration enables real-time inference with lightweight open models, the serving of custom fine-tuned GenAI models, and the acceleration of compute-intensive services. The company reports cold start times ranging from 11 to 35 seconds for various models, which it presents as evidence of the platform's responsiveness.
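
The 24 GB vRAM and 13-billion-parameter figures above can be related with a rough back-of-the-envelope calculation. The sketch below estimates the memory needed for model weights alone at a few numeric precisions. The precisions (16, 8, and 4 bits per parameter), the helper names, and the convention 1 GB = 10^9 bytes are illustrative assumptions, not details from the announcement. The estimate also ignores the KV cache, activations, and runtime overhead, so real memory use would be higher.

```python
# Rough weight-memory estimate for a model on a single NVIDIA L4 GPU.
# Assumptions (not from the announcement): 1 GB = 1e9 bytes, weights only,
# and the listed precisions are hypothetical choices for illustration.

L4_VRAM_GB = 24  # per-instance vRAM figure quoted in the update


def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits_on_l4(params_billions: float, bits_per_param: int) -> bool:
    """True if the weights alone fit within the L4's vRAM."""
    return weight_memory_gb(params_billions, bits_per_param) <= L4_VRAM_GB


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(13, bits)
        verdict = "fits" if fits_on_l4(13, bits) else "does not fit"
        print(f"13B params at {bits}-bit: {size:.1f} GB ({verdict} in {L4_VRAM_GB} GB)")
```

Under these assumptions, the weights of a 13-billion-parameter model take about 26 GB at 16-bit precision, which exceeds a single L4's 24 GB, while 8-bit (about 13 GB) and 4-bit (about 6.5 GB) weights leave headroom. This suggests that models near the 13-billion-parameter mark would likely need reduced-precision weights to run on one instance, though the announcement itself does not specify a precision.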
