Cleanlab provides automated data curation software that helps identify and fix quality issues in datasets used for artificial intelligence (AI), large language models (LLMs), and analytics solutions. The company's technology uses confident learning, a method developed at MIT, to automatically detect and correct data problems such as mislabeled data, outliers, duplicates, and data drift in both structured and unstructured datasets, including text, images, and tabular data. The company offers two main products: Cleanlab Open Source, a free Python library for data scientists, and Cleanlab Studio, a cloud-based enterprise platform that provides a no-code interface for data curation workflows. In October 2023, Cleanlab launched its Trustworthy Language Model (TLM) feature, which is intended to produce high-quality LLM outputs and attaches a trustworthiness score to each output. According to the company, the platform automatically labels up to 90% of a raw dataset on the first pass and identifies which portions contain no issues, helping companies reduce the time spent on manual data quality improvements.
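To make the confident learning idea mentioned above more concrete, the following is a minimal sketch of its core intuition in plain Python. It is a simplified illustration, not Cleanlab's implementation or API: the function name `find_label_issues_sketch` and the toy data are hypothetical, and the per-class threshold (the average predicted probability of a class among examples that carry that label) is a simplifying assumption. An example is flagged when the model is confidently in favor of a class other than its given label.

```python
def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of integer class labels, one per example
    pred_probs: list of per-class probability lists from any trained classifier
    (Hypothetical helper for illustration; not part of the cleanlab library.)
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that are labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # If a class has no labeled examples, use 1.0 so it is rarely "confident".
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class; skip
        best = max(confident, key=lambda k: p[k])
        if best != y:
            issues.append(i)  # confident prediction disagrees with the given label
    return issues


# Toy data: example 2 is labeled 0 but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.05, 0.95],
    [0.10, 0.90],
    [0.30, 0.70],
]
print(find_label_issues_sketch(labels, pred_probs))
```

In this toy run, only the third example (index 2) is flagged. Using per-class thresholds instead of a single global cutoff is what lets this style of method cope with classes the model is systematically more or less confident about.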
Key customers and partnerships
Over 10% of Fortune 500 companies use Cleanlab's solutions, including AWS, JPMorgan Chase, Google, Oracle, and Walmart. The company's technology is also used by technology companies such as ByteDance, Hugging Face, and Databricks. In June 2023, Cleanlab partnered with Databricks to bring automatic data correction capabilities to the Databricks platform through a Cleanlab Studio integration. Berkeley Research Group saved a legal client approximately USD 30 million in costs by using Cleanlab Studio to automatically improve legal document data. BBVA reported a 28% improvement in accuracy while reducing the number of labeled transactions required for model training by over 98%.