Allen Institute for AI (AI2) has partnered with Contextual AI to introduce OLMoE, an open mixture-of-experts language model claimed to offer strong performance for its size.
The model has 1 billion active and 7 billion total parameters and was trained on 5 trillion tokens using a data mix based on AI2's Dolma and DataComp-Baseline. The release also documents key MoE design choices, such as routing algorithms, auxiliary load-balancing losses, and sparse upcycling.
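To make the architecture concrete, below is a minimal sketch of top-k expert routing with an auxiliary load-balancing loss, the kind of design choice the OLMoE work studies. The layer sizes, expert count, and loss form here are illustrative assumptions, not the exact OLMoE configuration.

```python
# Illustrative top-k MoE layer with a Switch-style load-balancing loss.
# Hyperparameters are placeholders, not OLMoE's actual settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, k=8):
        super().__init__()
        self.n_experts, self.k = n_experts, k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model). The router scores every token against every expert.
        logits = self.router(x)                            # (tokens, n_experts)
        probs = logits.softmax(dim=-1)
        topk_probs, topk_idx = probs.topk(self.k, dim=-1)  # each token keeps k experts

        # Each token is processed only by its k selected experts, which is why
        # only a fraction of the total parameters is active per token.
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_rows, slots = (topk_idx == e).nonzero(as_tuple=True)
            if token_rows.numel() == 0:
                continue
            weight = topk_probs[token_rows, slots].unsqueeze(-1)
            out[token_rows] += weight * expert(x[token_rows])

        # Auxiliary load-balancing loss: pushes routing assignments to spread
        # evenly across experts instead of collapsing onto a few.
        frac_assignments = F.one_hot(topk_idx, self.n_experts).float().mean(dim=(0, 1))
        mean_probs = probs.mean(dim=0)
        aux_loss = self.n_experts * (frac_assignments * mean_probs).sum()
        return out, aux_loss
```

In a full transformer, a layer like this replaces the dense feed-forward block, and the auxiliary loss is added to the language-modeling loss with a small coefficient.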
The model, along with its data, code, evaluations, logs, and intermediate training checkpoints, is freely available.
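Because the weights are openly released, the model can be loaded with standard tooling. A minimal sketch using Hugging Face transformers follows; the Hub id "allenai/OLMoE-1B-7B-0924" is assumed from the release and should be verified against the project page.

```python
# Sketch: load the released OLMoE checkpoint and generate a short completion.
# The model id below is an assumption; confirm it on the release page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Mixture-of-experts models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```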
The institute has also released multiple variants and checkpoints of the model to support further research. AI2 is also previewing its new Tulu 3 post-training pipeline, which adds more instruction data and shows gains on math, code, and instruction-following evaluations.