NVIDIA has released Llama-3.1-Minitron 4 billion, a compressed version of the Llama 3.1 model designed to run on resource-constrained devices.
The model was developed by pruning and distilling the Llama 3.1 8 billion model down to 4 billion parameters. Two variants are available, one produced with depth-only pruning and the other with width-only pruning.
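For readers unfamiliar with the approach, the sketch below illustrates the general idea of width pruning followed by knowledge distillation on a toy PyTorch model. It is a minimal, hypothetical example, and the layer sizes, magnitude-based importance score, and distillation loss are assumptions for illustration only; it does not represent NVIDIA's actual Minitron pipeline.

```python
# Illustrative sketch only: toy width pruning plus knowledge distillation in PyTorch.
# This is NOT NVIDIA's Minitron pipeline; sizes and the pruning criterion are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy "teacher" and "student": the student keeps only half of the hidden width.
teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Width pruning: initialize the student from the teacher by keeping the hidden
# units with the largest weight magnitudes (a simple importance proxy).
with torch.no_grad():
    importance = teacher[0].weight.abs().sum(dim=1)   # one score per hidden unit
    keep = importance.topk(64).indices                # retain the top-64 units
    student[0].weight.copy_(teacher[0].weight[keep])
    student[0].bias.copy_(teacher[0].bias[keep])
    student[2].weight.copy_(teacher[2].weight[:, keep])
    student[2].bias.copy_(teacher[2].bias)

# Distillation: train the student to match the teacher's softened output distribution.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0
for step in range(100):
    x = torch.randn(32, 64)                           # stand-in for real training data
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```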
The model is reported to be capable of instruction following, roleplay, retrieval-augmented generation, and function calling, offering a balance between training cost and inference performance.
The company claims that Llama-3.1-Minitron 4 billion performs comparably to other small language models (SLMs) like Phi-2 2.7 billion, Gemma 2 2.6 billion, and Qwen2 1.5 billion, despite being trained on a much smaller dataset.
Analyst QuickTake: There has been a growing trend of introducing smaller language models in the AI market. Last month, NVIDIA and Mistral AI released Mistral NeMo 12B, a 12-billion-parameter multilingual language model. NVIDIA joins the likes of Microsoft, which introduced the Phi-3 Vision model (in May) and Phi-3 Mini (in April); OpenAI, which launched GPT-4o mini (in July); and Apple, which introduced OpenELM (in April).