OctoML, an ML model optimization and deployment platform, has launched the latest iteration of its services, OctoAI. This self-optimizing infrastructure service is designed to assist companies in building and deploying AI applications, with a particular emphasis on generative AI applications.
OctoAI is a managed compute service that lets businesses take existing open-source models, fine-tune them on their own data, and host the resulting custom models. Users state what they want to prioritize, such as latency or cost, and OctoAI automatically selects hardware to match.
The service also optimizes these models automatically, which yields further cost savings and performance improvements. It determines the most suitable platform for running each model, whether that is NVIDIA GPUs or AWS Inferentia machines.
The new platform also provides access to a library of popular open-source generative models, including large language models (LLMs) such as Dolly v2, LLaMA 65B, Flan-UL2, and Vicuna, the image model Stable Diffusion 2.1, and the speech recognition model Whisper, which developers can use to build their AI applications.