Waymo, the self-driving tech development unit of Alphabet, has introduced EMMA (End-to-End Multimodal Model for Autonomous Driving), a new end-to-end model built on Gemini, Google's multimodal large language model. EMMA is currently in the research phase and focuses on helping robotaxis make driving decisions.
EMMA processes raw camera inputs and textual data, using chain-of-thought reasoning to generate future trajectories for autonomous vehicles. The model is designed to handle multiple tasks, including trajectory prediction, object detection, and road graph understanding, producing its various driving outputs directly as text.
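Conceptually, this pipeline resembles prompting a multimodal LLM with images plus an instruction and parsing structured waypoints out of its text output. The sketch below is a minimal, hypothetical illustration of that idea; the `model.generate` call, the prompt wording, and the `WAYPOINTS:` delimiter are assumptions for illustration, not Waymo's actual interface.

```python
from dataclasses import dataclass

@dataclass
class DrivingPlan:
    rationale: str                        # human-readable chain-of-thought trace
    waypoints: list[tuple[float, float]]  # future (x, y) positions, e.g. in meters

def plan_trajectory(model, camera_frames, instruction: str) -> DrivingPlan:
    """Query a multimodal LLM for a driving plan (EMMA-style, simplified).

    `model` is any object exposing a hypothetical generate(images, text)
    method that accepts interleaved image and text inputs and returns text.
    """
    prompt = (
        "You are driving. Describe the critical objects you see, explain "
        "your reasoning step by step, then output the line 'WAYPOINTS:' "
        "followed by future (x, y) pairs separated by semicolons.\n"
        f"High-level command: {instruction}"
    )
    response = model.generate(images=camera_frames, text=prompt)

    # Split the free-form rationale from the structured waypoint list.
    rationale, _, waypoint_text = response.partition("WAYPOINTS:")
    waypoints = [
        tuple(float(v) for v in pair.strip("() ").split(","))
        for pair in waypoint_text.split(";")
        if pair.strip()
    ]
    return DrivingPlan(rationale=rationale.strip(), waypoints=waypoints)
```

Because the rationale and the trajectory come out of the same text decode, the reasoning trace is available alongside every plan, which is the interpretability property Waymo highlights.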
According to Waymo, EMMA has improved performance in end-to-end planning by 6.7% and can provide an interpretable rationale for its driving decisions. However, the model has limitations: it cannot process long video sequences or incorporate 3D sensor inputs from LiDAR or radar, and it requires further research before deployment.