Cohere for AI and Cohere have developed a new technique for synthetic data generation known as "active inheritance." The technique steers synthetic data toward specific objectives that benefit the machine-learning models trained on it.
The defining feature of "active inheritance" is that it actively steers synthetic data generation toward desirable attributes, such as high lexical diversity and low toxicity. The method involves selecting proxy labels that reflect the desired characteristics, generating several samples for each prompt, and keeping the sample that scores best against those labels. Models can then be fine-tuned on synthetic data that has been selected specifically to optimize these attributes.
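The select-the-best-sample step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Cohere's implementation: the type-token ratio used as a proxy for lexical diversity, the function names `type_token_ratio` and `select_best_samples`, and the example candidates are all assumptions made for this example. In practice the candidates would come from a language model and the proxy could be any measurable attribute, such as a toxicity score.

```python
from typing import Callable, Dict, List


def type_token_ratio(text: str) -> float:
    """Hypothetical proxy for lexical diversity: unique tokens / total tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def select_best_samples(
    candidates: Dict[str, List[str]],
    score: Callable[[str], float],
) -> Dict[str, str]:
    """For each prompt, keep the candidate with the highest proxy score."""
    return {
        prompt: max(samples, key=score)
        for prompt, samples in candidates.items()
        if samples  # skip prompts with no candidates
    }


# Illustrative candidates; a real pipeline would sample these from a model.
candidates = {
    "Describe the sea.": [
        "The sea is big and the sea is blue and the sea is big.",
        "Waves roll beneath a pale horizon, restless and cold.",
    ],
}

# The selected samples form the synthetic fine-tuning set.
fine_tuning_set = select_best_samples(candidates, type_token_ratio)
```

To target a different attribute, only the scoring function needs to change; for an attribute that should be minimized, such as toxicity, the score can be negated before it is passed in.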
Cohere claims that "active inheritance" can substantially improve the effectiveness and safety of the resulting machine-learning models. Reported improvements include increases of up to 116% in length and 43% in linguistic diversity, along with a reduction in toxicity of as much as 40%. These results suggest that specific attributes can be embedded in a model with minimal effort, provided the synthetic data is well designed.