Outerport optimizes AI model deployment by addressing the "cold start" problem that occurs when model weights are loaded from disk onto a machine. The issue is especially pronounced on cloud GPUs, where bandwidth between storage and CPU memory is often limited. The platform streamlines loading by keeping model weights resident in CPU memory and enabling fast swaps into GPU memory using techniques such as page locking (pinned memory) and parallelized transfers. This approach minimizes transfer delays, cuts costs, and integrates with existing inference pipelines.
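The underlying mechanism can be illustrated with plain PyTorch. The sketch below is not Outerport's implementation; it only shows how page-locked (pinned) host memory enables asynchronous host-to-GPU copies that can overlap with other work, which is the technique the paragraph above refers to. The function name and state-dict layout are illustrative.

```python
import torch

def load_to_gpu(state_dict: dict) -> dict:
    """Copy a CPU-resident state dict to the GPU via pinned memory."""
    gpu_state = {}
    for name, tensor in state_dict.items():
        # pin_memory() returns a page-locked copy of the CPU tensor,
        # which the GPU's DMA engine can read directly, allowing the
        # host-to-device copy to run asynchronously.
        pinned = tensor.pin_memory()
        gpu_state[name] = pinned.to("cuda", non_blocking=True)
    torch.cuda.synchronize()  # wait for all in-flight copies to finish
    return gpu_state
```

Because the copies are non-blocking, transfers of many tensors can be in flight at once rather than serialized, which is what makes a CPU-to-GPU swap much faster than a fresh read from disk.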
The company uses a daemon process that automatically manages model loading and memory across storage, CPU, and GPU, communicating with client processes over gRPC. A simple API call (outerport.load) replaces traditional model-loading commands and supports multiple GPUs and containers. By externalizing model loading and keeping state persistent across processes, Outerport improves performance, enables efficient state management, and provides an interface to GPU and enterprise infrastructure with minimal technical overhead.
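A hedged usage sketch of the drop-in replacement described above: the source confirms only that an API named outerport.load replaces a traditional loading call; the file path, the device keyword, and the model class here are assumptions for illustration.

```python
import torch
import torch.nn as nn
import outerport  # client library; talks to the Outerport daemon over gRPC

class TinyModel(nn.Module):
    """Hypothetical model, shown only to make the example self-contained."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

# Traditional approach, which re-reads the checkpoint from disk on every
# cold start:
#   state_dict = torch.load("tiny_model.pt")
#
# With Outerport, the daemon keeps the weights persistent in CPU memory,
# so repeat loads become fast CPU-to-GPU swaps instead of disk reads.
# The `device` argument is an assumed parameter, not a documented one.
state_dict = outerport.load("tiny_model.pt", device="cuda:0")

model = TinyModel().cuda()
model.load_state_dict(state_dict)
```

Because the daemon, rather than the inference process, owns the CPU-resident copy of the weights, the cached model survives process restarts and can be shared across containers on the same host.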
Key customers and partnerships
As of October 2024, the company noted that its clients included companies developing diffusion models for image generation.