Mistral AI, a developer of open-source foundation models, has released Pixtral 12B, its first multimodal AI model with combined language and vision processing capabilities.
The model lets users analyze images by combining them with text prompts. Its vision encoder supports images at up to 1,024×1,024 resolution and uses 24 hidden layers for image processing. The language backbone is a roughly 24 GB model with 40 layers, a hidden dimension size of 14,336, and 32 attention heads.
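Combining an image with a text prompt typically means packaging both into a single chat message. The sketch below builds such a request payload in the OpenAI-style format that multimodal chat APIs, including Mistral's, broadly follow; the model name and the exact `image_url` field shape are assumptions for illustration, not confirmed details from this article.

```python
import base64
import json

def build_pixtral_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a chat-completion payload pairing an image with a text prompt.

    Sketch only: assumes an OpenAI-style message schema in which images are
    passed as base64 data URLs inside the message content list.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "pixtral-12b",  # model identifier assumed for illustration
        "messages": [
            {
                "role": "user",
                "content": [
                    # Text part and image part travel in one user turn.
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Example: serialize a request for a (placeholder) PNG and a question about it.
payload = build_pixtral_request(b"\x89PNG\r\n", "Describe this chart.")
print(json.dumps(payload, indent=2))
```

The base64 data URL keeps the request self-contained, so no separate image hosting is needed before sending the prompt.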
Analyst QuickTake: The company has released several models over the past few months. In July 2024, it introduced 1) Mistral Large 2, a model supporting a diverse range of natural and programming languages; 2) Mistral NeMo, a new 12-billion-parameter multilingual small language model (SLM); and 3) Codestral Mamba and Mathstral, for code generation and math-related reasoning, respectively.