ModeNet: Dynamic Mode Classification
- Decision-Making: Identifies, from input observations, when to switch between motion-planning and interaction modes.
- Adaptability: Lets the system adapt its strategy dynamically so that tasks are executed efficiently.
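The switching behaviour described above can be illustrated with a small dispatch loop. This is only a sketch under stated assumptions, not the ModeNet implementation: the names `Mode`, `select_action`, `toy_classifier`, `toy_planner`, and `toy_policy` are hypothetical, and the distance threshold stands in for what would be a learned classifier over observations.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
Action = List[float]


class Mode(Enum):
    MOTION_PLANNING = "motion_planning"
    INTERACTION = "interaction"


def select_action(
    obs: Observation,
    classify_mode: Callable[[Observation], Mode],
    controllers: Dict[Mode, Callable[[Observation], Action]],
) -> Tuple[Mode, Action]:
    """Ask the mode classifier which mode applies, then dispatch to that controller."""
    mode = classify_mode(obs)
    return mode, controllers[mode](obs)


# --- Hypothetical stand-ins, for illustration only ---

def toy_classifier(obs: Observation) -> Mode:
    # Assumption: obs[0] is the distance to the target object.
    # A learned classifier would replace this hand-written threshold.
    return Mode.INTERACTION if obs[0] < 0.05 else Mode.MOTION_PLANNING


def toy_planner(obs: Observation) -> Action:
    # Placeholder for a motion planner: move toward the target.
    return [-obs[0], 0.0]


def toy_policy(obs: Observation) -> Action:
    # Placeholder for a learned manipulation policy: close the gripper.
    return [0.0, 1.0]


CONTROLLERS = {
    Mode.MOTION_PLANNING: toy_planner,
    Mode.INTERACTION: toy_policy,
}
```

Calling `select_action([0.5], toy_classifier, CONTROLLERS)` selects the motion-planning branch, while an observation close to the target, such as `[0.01]`, selects the interaction branch. Keeping the classifier and the controllers as separate callables means either side can be swapped without touching the dispatch logic.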


InteractNet: Precise Manipulation
- Execution: Performs fine-grained manipulation tasks with precision, guided by learned RL policies.
- Adaptation: Learns from demonstrations and adjusts movements in real time for efficient task completion.

Simulation Results

Assembly

BoxClose

CoffeePush

Real-World Experiments
PLANRL: Lift Env Training
In this setup, the BC policy uses only the wrist camera, whereas waypoint prediction uses both the wrist and environment cameras.
Total Training Time: 40 minutes

Data Collection using Teleoperation: Lift [No Randomization]



Steps: 2k, Time: 10 min

Steps: 4k, Time: 20 min

Steps: 6k, Time: 30 min

Steps: 8k, Time: 40 min
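
As a quick consistency check on the Lift checkpoints above, the listed step counts and times can be turned into a rate. This is a small sketch: the pairs are copied from the captions, and reading "Time" as elapsed training time is an assumption. The helper name `steps_per_minute` is hypothetical.

```python
from typing import List, Tuple

# (steps, minutes) pairs copied from the Lift checkpoint captions.
LIFT_CHECKPOINTS: List[Tuple[int, int]] = [
    (2_000, 10),
    (4_000, 20),
    (6_000, 30),
    (8_000, 40),
]


def steps_per_minute(checkpoints: List[Tuple[int, int]]) -> List[float]:
    """Return the average rate (steps / minute) up to each checkpoint."""
    return [steps / minutes for steps, minutes in checkpoints]


rates = steps_per_minute(LIFT_CHECKPOINTS)
print(rates)  # [200.0, 200.0, 200.0, 200.0]
```

Every checkpoint works out to 200 steps per minute, which matches the stated total of 40 minutes for 8k steps.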
PLANRL: Pick and Place Env Training
This setup uses both the wrist camera and the environment camera for the BC policy and for waypoint prediction. The task has two stages, "pick" and "place", and involves three waypoints, as discussed in the paper.
Total Training Time: 3 hours
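
The camera assignments for the two real-world setups can be summarised as a small configuration table. This is a sketch rather than the project's actual configuration format: the keys `"lift"`, `"pick_and_place"`, `"bc_policy"`, `"waypoint_predictor"`, `"wrist"`, and `"env"`, and the helper `select_views`, are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List

# Which camera views feed each component, per task, as described in the text.
CAMERA_CONFIG: Dict[str, Dict[str, List[str]]] = {
    "lift": {
        "bc_policy": ["wrist"],                  # wrist camera only
        "waypoint_predictor": ["wrist", "env"],  # both cameras
    },
    "pick_and_place": {
        "bc_policy": ["wrist", "env"],           # both cameras
        "waypoint_predictor": ["wrist", "env"],  # both cameras
    },
}


def select_views(task: str, component: str, frames: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the camera frames that the given component uses for the task."""
    names = CAMERA_CONFIG[task][component]
    missing = [name for name in names if name not in frames]
    if missing:
        raise KeyError(f"missing camera frames: {missing}")
    return {name: frames[name] for name in names}
```

Given a frame dictionary with both views, `select_views("lift", "bc_policy", frames)` returns only the wrist view, while the same call for `"pick_and_place"` returns both.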

Data Collection using Teleoperation: Pick and Place



Steps: 2k, Time: 10 min

Steps: 10k, Time: 50 min

Steps: 16k, Time: 90 min

Steps: 26k, Time: 130 min