Abstract
Applying learning-based approaches to long-horizon sequential decision-making tasks requires a human teacher to carefully craft reward functions or curate demonstrations to elicit desired behaviors. To ease this burden, we first introduce an alternative form of task specification, the Illustrated Landmark Graph (ILG), which represents the task as a directed acyclic graph where each vertex corresponds to a region of the state space (a landmark) and each edge represents an easier-to-achieve sub-task. A landmark in the ILG is conveyed to the agent through a few illustrative examples grounded in the agent’s observation space. Second, we propose ILG-Learn, a human-in-the-loop algorithm that interleaves planning over the ILG with sub-task policy learning. ILG-Learn plans adaptively through the ILG, relying on the human teacher’s feedback to estimate the success rates of learned policies. We conduct experiments on long-horizon block stacking and point maze navigation tasks, and find that our approach achieves considerably higher success rates (~50% improvement) than hierarchical reinforcement learning and imitation learning baselines. Additionally, we highlight how the flexibility of the ILG specification allows the agent to learn a sequence of sub-tasks better suited to its limited capabilities.
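As a rough illustration only (not the paper's implementation), the ILG could be encoded as a DAG over landmark vertices, with a planning step that prefers the path whose per-edge success-rate estimates compose best. All names here (`Landmark`, `ILG`, `best_path`, `success_rate`) are hypothetical:

```python
from dataclasses import dataclass, field

# Illustrative sketch of an ILG; names and structure are assumptions,
# not the paper's actual code.

@dataclass(frozen=True)
class Landmark:
    """A region of the state space, conveyed via a few example observations."""
    name: str
    examples: tuple  # a handful of observations grounding this landmark

@dataclass
class ILG:
    """Illustrated Landmark Graph: a DAG whose vertices are landmarks
    and whose edges are easier-to-achieve sub-tasks."""
    landmarks: dict[str, Landmark] = field(default_factory=dict)
    edges: dict[str, set[str]] = field(default_factory=dict)  # src -> {dst, ...}

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, set()).add(dst)

    def paths(self, start: str, goal: str, prefix=()):
        """Enumerate start-to-goal paths (terminates because the graph is acyclic)."""
        prefix = prefix + (start,)
        if start == goal:
            yield prefix
            return
        for nxt in self.edges.get(start, ()):
            yield from self.paths(nxt, goal, prefix)

    def best_path(self, start: str, goal: str, success_rate):
        """Pick the path maximizing the product of estimated per-edge success
        rates; one plausible form of the adaptive planning step."""
        def score(path):
            p = 1.0
            for a, b in zip(path, path[1:]):
                p *= success_rate(a, b)
            return p
        return max(self.paths(start, goal), key=score, default=None)
```

In ILG-Learn, the `success_rate` estimates would come from the human teacher's feedback on learned sub-task policies, so the chosen path can be revised as those estimates change.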
Type
Publication
TMLR