Tech experts aim to integrate robots into offices and indoor environments, but this requires advanced artificial intelligence to ensure these robots are aware of their surroundings. Enter Google Gemini, which is poised to be a game-changer.
Google DeepMind’s recent arXiv paper describes the implementation of Gemini 1.5 Pro, enabling a robot to follow commands and navigate an office setting.
With Gemini, the robot can now lead researchers to the nearest power outlet and remember item locations, though these capabilities are still basic and need further refinement for practical use.
How Did Google Gemini Enhance Robot Navigation?
TechCrunch reports that the researchers used a technique called “Multimodal Instruction Navigation with demonstration Tours (MINT),” which involved walking the robot around the office and verbally pointing out various landmarks.
Through this process, the AI maps the indoor environment using its cameras. Researchers then taught Google Gemini to convert user requests into navigational instructions using a hierarchical Vision-Language-Action (VLA) navigation policy, combining environment understanding with common-sense reasoning.
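The hierarchical pipeline described above — a high-level model that grounds a spoken request in landmarks recorded during the tour, and a low-level policy that navigates to the chosen one — can be sketched roughly as follows. Note that this is a hypothetical toy illustration, not DeepMind's implementation: the landmark names, the keyword-overlap "reasoning," and the straight-line waypoint planner are all simplified stand-ins for what the paper does with a long-context multimodal model and a learned navigation policy.

```python
# Toy sketch of a hierarchical "understand the request, then navigate" pipeline.
# All names and logic here are illustrative assumptions, not DeepMind's code.

from dataclasses import dataclass

@dataclass
class Landmark:
    name: str       # spoken label captured during the demonstration tour
    position: tuple # (x, y) pose recorded while mapping the office

# Map built during the MINT-style tour: spoken labels -> recorded poses.
TOUR_MAP = [
    Landmark("power outlet", (2.0, 5.5)),
    Landmark("whiteboard", (10.0, 1.0)),
    Landmark("soda fridge", (4.5, 8.0)),
]

def high_level_goal(user_request: str, landmarks: list) -> Landmark:
    """Stand-in for the VLM step: pick the tour landmark the request refers to.
    The real system reasons over images and language; here we just count
    how many of the landmark's words appear in the request."""
    request = user_request.lower()
    return max(landmarks,
               key=lambda lm: sum(word in request for word in lm.name.split()))

def low_level_policy(start: tuple, goal: Landmark) -> list:
    """Stand-in for the navigation policy: emit straight-line waypoints."""
    steps = 3
    return [(start[0] + (goal.position[0] - start[0]) * i / steps,
             start[1] + (goal.position[1] - start[1]) * i / steps)
            for i in range(1, steps + 1)]

goal = high_level_goal("Take me to the nearest power outlet", TOUR_MAP)
path = low_level_policy((0.0, 0.0), goal)
print(goal.name)  # → power outlet
print(path[-1])   # → (2.0, 5.5), the recorded pose of that landmark
```

The split mirrors the article's description: the expensive multimodal reasoning happens once per request to choose a goal, while a simpler policy handles the actual driving.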
Digital Trends notes that the results were impressive, with the Google Gemini bot achieving “86% and 90% end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real-world environment.”
The robot successfully guided researchers to power outlets, recalled soda locations, and directed them to the DeepMind office whiteboard.
However, Google DeepMind acknowledges that the robot is still somewhat clumsy, needing assistance during office tours and taking 10 to 30 seconds to respond. It may be several years before Gemini-powered bots can perform everyday tasks independently.
