Meta has introduced Chameleon, an early-fusion multimodal AI model. The model is currently in preview and has not been officially released.
Meta reports that the Chameleon models are proficient at image captioning and visual question answering while remaining competitive on text-only tasks.
The model uses an "early-fusion token-based mixed-modal" architecture, trained on an interleaved mixture of images, text, and code, among other data. It encodes and decodes all modalities in a single unified token space, allowing it to generate and reason over sequences that mix text and images without requiring modality-specific components.
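To make the early-fusion idea concrete, the sketch below shows one way such an architecture can be wired up: image content is assumed to be pre-quantized into discrete tokens (for example by a VQ-style tokenizer, which is not shown), those tokens are offset into a shared vocabulary alongside text tokens, and a single transformer processes the interleaved sequence. All vocabulary sizes, model dimensions, and names here are illustrative assumptions, not Chameleon's actual configuration.

```python
# Minimal early-fusion sketch: one token space, one transformer backbone.
# Hypothetical sizes; a real system would use a learned image tokenizer.
import torch
import torch.nn as nn

TEXT_VOCAB = 32000        # assumed text vocabulary size
IMAGE_VOCAB = 8192        # assumed VQ codebook size for image tokens
VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # shared vocabulary for both modalities

class EarlyFusionLM(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # A single embedding table covers text and image tokens alike.
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # One head predicts the next token, whichever modality it belongs to.
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.backbone(self.embed(tokens), mask=mask)
        return self.head(h)

# Interleave modalities into one sequence: image tokens are shifted into
# the shared vocabulary rather than routed through a separate encoder.
text_tokens = torch.randint(0, TEXT_VOCAB, (1, 16))
image_tokens = torch.randint(0, IMAGE_VOCAB, (1, 32)) + TEXT_VOCAB
sequence = torch.cat([text_tokens, image_tokens], dim=1)

model = EarlyFusionLM()
logits = model(sequence)  # shape: (1, 48, VOCAB)
print(logits.shape)
```

Because both modalities share one vocabulary and one backbone, generating an image mid-sequence is no different from generating text: the model simply emits tokens that happen to fall in the image range of the shared space.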