LAVA: Long-horizon Visual Action based Food Acquisition

Amisha Bhaskar, Rui Liu, Vishnu D. Sharma, Guangyao Shi, Pratap Tokekar


Abstract


Robotic Assisted Feeding (RAF) addresses the fundamental need of individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire food from the table and transfer it to the individual. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semi-solid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition of liquid, semi-solid, and deformable foods. Long-horizon refers to the goal of “clearing the bowl” by sequentially acquiring the food from the bowl. LAVA employs a hierarchical policy for long-horizon food acquisition tasks. The high-level policy determines primitives by leveraging ScoopNet. At the mid-level, LAVA finds parameters for the primitives using vision. To carry out sequential plans in the real world, LAVA delegates action execution to a low-level policy, which uses the parameters received from the mid-level policy together with behavior cloning to ensure precise trajectory execution. We validate our approach on complex real-world acquisition trials involving granular, liquid, semi-solid, and deformable food types, along with fruit chunks and soup acquisition.

Across 46 bowls, LAVA acquires food much more efficiently than the baselines, with a success rate of 89 ± 4%, and generalizes across realistic plate variations such as different positions, varieties, and amounts of food in the bowl.



Method Overview
System architecture of LAVA. A high-level policy πH (blue) selects among discrete high-level primitives PHk, such as the wide primitive and the deep primitive. A mid-level policy πM (green) then refines this choice by selecting among mid-level primitives PMk. A low-level, vision-parameterized policy πL (brown) executes a trajectory learned through behavioral cloning for long-horizon dexterous food acquisition.
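
The control flow described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Bowl` class, the three policy functions, their rules, and the parameter names are all hypothetical stand-ins for the learned, vision-based components.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bowl:
    """Toy scene state: a food category and how many bites remain."""
    food_type: str
    remaining: int


def high_level_policy(bowl: Bowl) -> str:
    # Hypothetical rule standing in for the learned high-level policy.
    return "wide" if bowl.food_type == "deformable" else "deep"


def mid_level_policy(primitive: str, bowl: Bowl) -> Dict[str, float]:
    # Hypothetical parameters; the real system derives these from vision.
    if primitive == "wide":
        return {"target_index": float(bowl.remaining - 1)}
    return {"scoop_depth_cm": 1.0}


def low_level_policy(primitive: str, params: Dict[str, float], bowl: Bowl) -> int:
    # Stand-in for trajectory execution: pretend each attempt acquires one bite.
    return 1 if bowl.remaining > 0 else 0


def clear_bowl(bowl: Bowl, max_attempts: int = 20) -> List[str]:
    """Run the three levels in sequence until the bowl is empty or attempts run out."""
    history: List[str] = []
    while bowl.remaining > 0 and len(history) < max_attempts:
        primitive = high_level_policy(bowl)
        params = mid_level_policy(primitive, bowl)
        bowl.remaining -= low_level_policy(primitive, params, bowl)
        history.append(primitive)
    return history
```

Calling `clear_bowl(Bowl("deformable", 3))` returns the list of primitives chosen at each step. The attempt cap is an added safeguard so the loop terminates even if an acquisition repeatedly fails.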

High-level Policy


Seeing the Big Picture

  • Decision-Making: Identifies food type and texture, choosing between gentle scooping for tofu and a direct approach for semi-solid foods.
  • Strategy: Sets the stage for action, ensuring adaptability and precision from the outset.

Mid-level Policy


Approach Refinement

TargetNet: Wide Primitives

  • Target Identification: Pinpoints the exact piece to acquire, crucial for executing wide primitive strategies.
  • Strategic Alignment: Decides whether to align the food toward the center for easier access or to leverage the bowl's wall for support.
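
One way to picture the alignment decision is a purely geometric rule: use the wall when the target piece already sits near it, and otherwise move the piece toward the center. The sketch below assumes this heuristic for illustration only; TargetNet is a learned component, and the function name, the `wall_margin` threshold, and the strategy labels are hypothetical.

```python
import math
from typing import Tuple


def choose_wide_strategy(
    target_xy: Tuple[float, float],
    bowl_center_xy: Tuple[float, float],
    bowl_radius: float,
    wall_margin: float,
) -> str:
    """Pick a strategy label from the target's distance to the bowl wall (same units throughout)."""
    dist_from_center = math.dist(target_xy, bowl_center_xy)
    dist_to_wall = bowl_radius - dist_from_center
    # Hypothetical rule: a piece within wall_margin of the wall is scooped against it.
    return "use_wall" if dist_to_wall <= wall_margin else "align_to_center"
```

For a bowl of radius 0.07 m with a 0.02 m margin, a piece 0.06 m from the center would be scooped against the wall, while a piece 0.01 m from the center would be aligned to the center.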

DepthNet: Deep Primitives

  • Depth Assessment: Measures the depth of food, guiding the scoop for deep primitives.
  • Trajectory Adjustment: Fine-tunes the scooping trajectory based on the assessed depth, optimizing scoop size and minimizing spillage.
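
To make the link between assessed depth and scooping trajectory concrete, the sketch below clamps the spoon's insertion depth to the estimated food depth and emits four waypoints. It is an illustrative assumption, not the DepthNet pipeline: the function name, the default limits, and the waypoint layout are hypothetical.

```python
from typing import List, Tuple


def scoop_waypoints(
    food_depth_cm: float,
    max_insert_cm: float = 2.0,
    sweep_cm: float = 4.0,
) -> List[Tuple[float, float]]:
    """Return (x, z) waypoints in cm; z = 0 is the food surface, negative z is below it."""
    if food_depth_cm < 0:
        raise ValueError("food depth must be non-negative")
    # Never dip deeper than the food itself or than the spoon allows.
    insert = min(food_depth_cm, max_insert_cm)
    return [
        (0.0, 0.0),           # touch the surface
        (0.0, -insert),       # dip to the chosen depth
        (sweep_cm, -insert),  # sweep horizontally through the food
        (sweep_cm, 0.0),      # lift back to the surface
    ]
```

Capping the insertion at the spoon's limit is one simple way to express the trade-off between scoop size and spillage: a shallow bowl yields a shallow scoop, and a deep bowl never overfills the spoon.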

Low-level Policy


Turning Plans into Action

  • Execution: Implements the refined strategy, directing the robot arm to scoop with targeted precision.
  • Adaptation: Learns from demonstrations, adjusting movements in real time for efficient and careful food acquisition.
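
Learning from demonstrations by behavior cloning amounts to supervised learning on recorded (state, action) pairs. The toy sketch below fits a one-dimensional linear policy by ordinary least squares; it assumes scalar states and actions purely for illustration, whereas the actual low-level policy is learned from richer demonstration data. The function name `fit_linear_policy` is hypothetical.

```python
from typing import Callable, List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Fit action = w * state + b to demonstration pairs by least squares."""
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two demonstrations")
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    if var_s == 0:
        raise ValueError("demonstration states must not all be identical")
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov_sa / var_s
    b = mean_a - w * mean_s
    # The cloned policy imitates the demonstrator on new states.
    return lambda state: w * state + b
```

Given demonstrations such as `[(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]`, the fitted policy reproduces the demonstrated mapping and extrapolates it to unseen states.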

Quantitative Results


Figures: time efficiency, breakage reduction, and spillage reduction graphs, with graph legend.

LAVA demonstrates efficiency and adaptability in robotic assisted feeding:

  • Time Efficiency: Surpasses baselines, highlighting swift adaptation to food types and depths.
  • Minimized Breakage and Spillage: Precise handling reduces breakage and spillage, and with them food waste.
  • Success Rate: Achieves a success rate of 89 ± 4% across 46 bowls covering a wide range of foods.
  • Robust Generalization: Handles new, unseen food configurations, demonstrating its adaptability.

These results support LAVA's potential to advance RAF technology.

Success Rate Graph
Tofu 3 Graph

3 Tofu Configurations.

Tofu 4 Graph

4 Tofu Configurations.

Tofu 5 Graph

5 Tofu Configurations.

LAVA clears bowls with diverse food configurations using its hierarchical policy, outperforming the baseline models in efficiency and precision. It handles complex tofu arrangements and cereal acquisition, showing robustness across food types.

Cereal Acquisition

Zero-Shot Generalization


LAVA's design allows it to handle unseen food scenarios, demonstrating robust generalization. This capability is pivotal for real-world applications, where unpredictability in food types and configurations is common. LAVA's ability to adapt and perform effectively without specific prior training on new food types or arrangements highlights its potential for widespread adoption in RAF technology.

Zero-Shot Generalization with Tofu

Tofu in Soup: Adapting to pieces floating in liquid.

Zero-Shot Generalization with Fruit

Fruit Chunks: Handling variable shapes and avoiding spillage.

Project Contributors