Alibaba Cloud has launched “Qwen2-VL,” an advanced vision-language AI model designed to enhance visual understanding, video comprehension, and multilingual text-image processing.
The model is available in three sizes: 72 billion, 7 billion, and 2 billion parameters. It can analyze and describe handwriting in multiple languages, identify objects in images, analyze live video in near real-time, and answer questions about the content of videos longer than 20 minutes. It supports English, Chinese, and most European languages, as well as Japanese, Korean, Arabic, and Vietnamese.
The company claims the model can be integrated into mobile phones and robots, allowing them to operate automatically based on their visual environment and text instructions. It also supports function calling, enabling integration with third-party software and visual extraction of information from external sources.