Navigating robots through unstructured terrains is challenging, primarily due to dynamic environmental changes. While humans adeptly navigate such terrains by drawing on context from their observations, building a similarly context-aware navigation system for robots is difficult. The crux of the problem lies in acquiring and interpreting contextual information, a task complicated by the inherent ambiguity of human language. In this work, we introduce LANCAR, which addresses this issue by combining a context translator with reinforcement learning (RL) agents for context-aware locomotion. LANCAR allows robots to comprehend contextual information sourced from human observers through Large Language Models (LLMs) and to convert this information into actionable contextual embeddings. These embeddings, combined with the robot's sensor data, form the complete input to the RL agent's policy network. We provide an extensive evaluation of LANCAR under different levels of contextual ambiguity and compare it with alternative methods. The experimental results showcase its superior generalizability and adaptability across different terrains. Notably, LANCAR achieves at least a 7.4% increase in episodic reward over the best alternatives, highlighting its potential to enhance robotic navigation in unstructured environments.
Context-Aware Reinforcement Learning Robot Locomotion. Our framework introduces a context translator alongside the standard RL framework. In an environment with diverse terrains, the agent receives the explicitly observable state as its observation, while a human observer (or VLM) perceives the contextual information as the implicitly observable state and relays it to the LLM translator. The LLM translator extracts environmental properties from the contextual information and generates a contextual embedding, which is concatenated with the observation to form the input for the RL agent. The RL agent produces an action from its control policy given this context-aware input and executes the action in the environment.
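To make the data flow concrete, below is a minimal sketch (not the authors' implementation) of how a contextual embedding produced by the LLM translator could be concatenated with the robot's sensor observation to form the context-aware policy input. The dimensions and the hash-based stand-in for the LLM translator are illustrative assumptions.

```python
import numpy as np

OBS_DIM = 24   # assumed robot sensor/state dimension
CTX_DIM = 16   # assumed size of the contextual embedding

def llm_context_embedding(context_text: str) -> np.ndarray:
    """Stand-in for the LLM translator: maps a natural-language terrain
    description (from a human observer or VLM) to a fixed-size vector.
    Here tokens are hashed into a bag-of-words vector purely for illustration."""
    vec = np.zeros(CTX_DIM)
    for token in context_text.lower().split():
        vec[hash(token) % CTX_DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def context_aware_input(obs: np.ndarray, context_text: str) -> np.ndarray:
    """Concatenate the explicitly observable state with the contextual embedding."""
    return np.concatenate([obs, llm_context_embedding(context_text)])

obs = np.zeros(OBS_DIM)  # current sensor reading (placeholder)
x = context_aware_input(obs, "wet, slippery wooden planks ahead")
assert x.shape == (OBS_DIM + CTX_DIM,)
# action = policy(x)  # any RL backbone (ARS, SAC, TD3, PPO) consumes this input
```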
[Figure: Episodic rewards of the No-Context, LANCAR (Indexing), and LANCAR (Embeddings) variants with SAC, TD3, and PPO backbones.]
We perform evaluation experiments across all baselines and ablation studies over 10 cases (5 low-level context cases and 5 high-level context cases). ARS-based approaches achieve much higher episodic rewards than all other baselines, and ARS using LANCAR embeddings for contextual information outperforms all other approaches in most cases.
Total rewards in 10³ (5000 steps).

| Method | Backbone | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No-Context | ARS | 36.628 | 19.698 | 38.000 | 28.573 | 30.744 | 35.545 | 13.051 | 29.819 | 34.053 | 33.934 |
| No-Context | SAC | 24.189 | -10.128 | 15.571 | -10.839 | -11.457 | 9.461 | -7.123 | -10.076 | 18.252 | -3.994 |
| No-Context | TD3 | 25.001 | -6.756 | 17.768 | -12.230 | -11.726 | 9.833 | -9.445 | -12.450 | 19.352 | -3.583 |
| No-Context | PPO | 7.542 | -8.266 | -1.249 | -10.159 | -10.073 | 4.534 | -7.262 | -10.637 | 15.798 | -2.181 |
| LANCAR (Indexing) | ARS | 36.659 | 23.435 | 38.366 | 20.649 | 22.952 | 37.791 | 16.265 | 22.776 | 36.676 | 35.257 |
| LANCAR (Indexing) | SAC | 16.423 | -9.695 | 14.534 | -12.199 | -12.443 | 7.521 | -7.592 | -12.012 | 16.252 | -5.815 |
| LANCAR (Indexing) | TD3 | 20.867 | -7.665 | 15.734 | -11.672 | -11.612 | 7.955 | -7.328 | -13.414 | 17.089 | -4.131 |
| LANCAR (Indexing) | PPO | 24.119 | -8.343 | 11.851 | -8.520 | -9.498 | 10.937 | -10.980 | -10.333 | 19.934 | -2.009 |
| LANCAR (Embeddings) | ARS | 41.220 | 20.706 | 41.725 | 29.545 | 31.595 | 40.563 | 12.162 | 30.961 | 39.722 | 36.623 |
| LANCAR (Embeddings) | SAC | 12.154 | -8.648 | 17.251 | -9.413 | -11.159 | 8.381 | -7.197 | -12.599 | 16.176 | -5.970 |
| LANCAR (Embeddings) | TD3 | 20.714 | -8.655 | 17.788 | -9.138 | -11.022 | 8.587 | -6.465 | -12.478 | 15.772 | -6.800 |
| LANCAR (Embeddings) | PPO | 12.979 | -9.449 | 5.512 | -9.187 | -10.314 | 8.345 | -9.533 | -9.391 | 15.607 | -8.148 |