+
+ 1University of Toronto, 2Ukrainian Catholic University
+
+
+
+---
+
+# Abstract
+
+Handovers, the exchange of objects between a giver and a receiver, are a natural everyday task for humans. We rarely notice that a handover involves active perception, prediction, reaction, and adjustment by both the giver and the receiver, which is what makes handovers difficult for robots. We propose using the Diffusion Policy[^1] to address these challenges in human-to-robot (H2R) handovers. Diffusion models for policy learning have several attractive properties: they can express multimodal robot action distributions, learn from a small amount of data, predict out-of-training-distribution trajectories with smooth and reactive actions, and train stably. Recent studies have made significant progress in H2R handover research but still have drawbacks, including the use of two separate models (one for trajectory planning and one for predicting the moment when the gripper can be closed) and difficulty recognizing the human's intention to hand over an object. In this thesis, we propose possible ways to solve these problems. We also present an approach that increases the success rate of the Diffusion Policy by learning from its own successful results. Additionally, we propose two methods of generating training data for imitation learning of H2R handovers, as there are no publicly available datasets of robot-receiver trajectories for handovers.
+
+# Approach
+We propose to use a single model to predict all robot decisions for the handover: when to start the motion, how to move, and when to grasp the object. The model, the Diffusion Policy[^1], learns to denoise Gaussian noise into a robot trajectory conditioned on observations. In most experiments, we use segmented point clouds as observations. By segmented, we mean a regular point cloud accompanied by two masks: one for the human hand and another for the object being handed over.
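
To make the observation format concrete, here is a minimal sketch of a segmented point cloud in which every point carries two binary mask values. The function name `segment_point_cloud`, the index-based inputs, and the per-point layout `(x, y, z, hand, object)` are assumptions made for this illustration and are not necessarily the layout used in our implementation.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
SegmentedPoint = Tuple[float, float, float, float, float]


def segment_point_cloud(
    points: List[Point],
    hand_indices: Iterable[int],
    object_indices: Iterable[int],
) -> List[SegmentedPoint]:
    """Append a hand mask and an object mask to every 3D point."""
    hand = set(hand_indices)
    obj = set(object_indices)
    segmented = []
    for i, (x, y, z) in enumerate(points):
        # 1.0 marks membership, 0.0 marks background for that mask.
        segmented.append((x, y, z, float(i in hand), float(i in obj)))
    return segmented


# Hypothetical three-point cloud: one hand point, one object point, one background point.
cloud = [(0.1, 0.0, 0.5), (0.2, 0.1, 0.5), (0.9, 0.9, 0.0)]
segmented_cloud = segment_point_cloud(cloud, hand_indices=[0], object_indices=[1])
```

Keeping the masks as extra per-point channels lets a point-based encoder consume geometry and segmentation in a single tensor.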
+
+{::nomarkdown}
+
+
+
Diffusion Policy for robot trajectories in H2R handovers. During training, the encoder diffuses trajectories using a Markov chain process (see the arrows from left to right), and the decoder g<sub>θ</sub> learns to denoise those noisy trajectories, also called latent variables x<sub>t</sub> (see the arrows from right to left).
+
+
+
+
+
+
Diffusion Policy architecture. Conditions: Each segmented point cloud (it contains masks that indicate whether a point belongs to the hand or to the object) from the observation sequence (N<sub>O</sub> point clouds in total) is processed by PointNet++ to obtain an embedding; these embeddings are then concatenated with each other and with the sinusoidal embedding of the denoising step t, and the result is processed by the Feature-wise Linear Modulation (FiLM) module. This module is present in each layer of the U-Net. U-Net: Each block of the U-Net is a Conditional Residual Block, which processes the input with a 1D convolution, applies FiLM modulation to it, processes the result with one more 1D convolution block, and adds it to the initial input processed by another 1D convolution block. The U-Net is called S times, where S is the number of DDIM denoising steps.
+
+{:/}
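
The S network calls mentioned in the caption can be sketched as a deterministic DDIM sampling loop. This is a simplified illustration, not our implementation: the trajectory is a flat list of numbers, `predict_noise` stands in for the observation-conditioned U-Net, and the `alpha_bars` schedule and the `oracle` predictor are hypothetical values chosen so that the example can be checked by hand.

```python
import math
import random
from typing import Callable, List, Sequence

Trajectory = List[float]  # flattened action sequence, for illustration only


def ddim_sample(
    predict_noise: Callable[[Trajectory, int], Trajectory],
    alpha_bars: Sequence[float],
    length: int,
    rng: random.Random,
) -> Trajectory:
    """Deterministic DDIM sampling (eta = 0) with S = len(alpha_bars) network calls.

    alpha_bars[k] is the cumulative signal level at denoising step k and must
    decrease with k, so the last entry is the noisiest level.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # start from Gaussian noise
    for k in reversed(range(len(alpha_bars))):
        a_t = alpha_bars[k]
        a_prev = alpha_bars[k - 1] if k > 0 else 1.0  # the final step removes all noise
        eps = predict_noise(x, k)  # one call of the (conditioned) denoising network
        # Estimate the clean trajectory from the current sample and predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        # Move to the previous, less noisy level without adding fresh noise.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei for x0i, ei in zip(x0, eps)]
    return x


# Toy check: an oracle that knows the target trajectory returns the exact noise,
# so sampling must recover the target after S = 4 steps.
target = [0.0, 0.1, 0.2, 0.3]
alpha_bars = [0.9, 0.6, 0.3, 0.05]


def oracle(x: Trajectory, k: int) -> Trajectory:
    a = alpha_bars[k]
    return [(xi - math.sqrt(a) * ti) / math.sqrt(1.0 - a) for xi, ti in zip(x, target)]


sample = ddim_sample(oracle, alpha_bars, len(target), random.Random(0))
```

In the actual policy the noise prediction depends on the point-cloud embeddings and the step embedding through FiLM; the oracle only replaces that network so the loop itself can be verified.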
+
+
+---
+
+# Environment
+
+All experiments are conducted in the HandoverSim environment[^2], which contains 900 scenes with human motions for H2R handovers of 18 different objects. The evaluation metrics are the success rate and the time used to complete the task. The failure conditions are timeout, dropping the object, and touching the hand. HandoverSim splits the scenes into train/val/test sets with 720, 36 and 144 scenes, respectively.
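
As an illustration of how these metrics can be aggregated, the sketch below computes the rate of each outcome and the mean completion time. The outcome labels, the function name `summarize`, the choice to average time over successful episodes only, and the four example episodes are assumptions made up for this example; they are not HandoverSim's API or our results.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels for the four possible ends of a simulation.
OUTCOMES = ("success", "timeout", "drop", "contact")

# Scene counts of the train/val/test split stated above.
SPLIT = {"train": 720, "val": 36, "test": 144}


def summarize(outcomes: List[str], times: List[float]) -> Dict[str, float]:
    """Return the rate of every outcome and the mean time of successful episodes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    summary = {name: counts[name] / total for name in OUTCOMES}
    success_times = [t for o, t in zip(outcomes, times) if o == "success"]
    summary["mean_success_time"] = (
        sum(success_times) / len(success_times) if success_times else float("nan")
    )
    return summary


# Made-up episodes, used only to exercise the function.
episodes = ["success", "timeout", "success", "contact"]
durations = [6.0, 13.0, 8.0, 4.0]
report = summarize(episodes, durations)
```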
+
+{::nomarkdown}
+
+
+
+
+
+
+
Possible ends of the simulation. Left to right: Success, Timeout, Dropping the object, Touching the hand.
+
+{:/}
+
+Note that a successful grasp is not easy to achieve because a large part of the object is covered by the human hand.
+
+---
+
+# Data
+The Diffusion Policy is an imitation learning algorithm that requires robot trajectories for training. HandoverSim contains only human trajectories. Therefore, we collect training data using two methods: with an RL expert and with an extended OMG Planner.
+
+## Data collected with the RL expert
+With the RL expert from the HandoverSim2Real study[^3], we collected 454 successful trajectories for training, where every trajectory starts 1.5 seconds after the beginning of the simulation.
+
+It is important to mention that not all trajectories from the RL expert are good, even if they are successful:
+
+{::nomarkdown}
+
+
+
+
+
Two successful scenes. The left scene has a good trajectory and grasp, whereas the right one does not. However, both of them count as successful in HandoverSim.
+
+{:/}
+
+Below we also show an example where a robot controlled by the RL policy starts moving towards the object lying on the table instead of waiting for the human to express the intention to make a handover.
+
+{::nomarkdown}
+
+
+
+
No information about human intention. The RL policy starts planning after 1.5 s in HandoverSim. If the human has not grasped the object yet, the robot starts moving towards the object on the table.
+
+{:/}
+
+
+## Data collected with the extended OMG Planner
+Using the extended version of the OMG Planner[^4], we collected 474 training trajectories, where every trajectory starts at the beginning of the simulation so that this dataset can be used for learning human handover intentions.
+
+---
+
+# Experiments and Results
+
+{::nomarkdown}
+
+
+
The Data column explains which dataset was used for training: RL and OMG are described above, whereas
+Diff_RL means successful trajectories of the diffusion policy on training scenes, and
+2Diff_RL was also collected with the diffusion policy, but from two different experiments.
+The "+" sign means that datasets are mixed. Details are described in the corresponding experiments.
+W (s) is the waiting time at the beginning of the simulation, measured in seconds.
+Act indicates whether the action is the absolute pose of the end-effector (ee) or the change of its pose (Δee).
+Rot is the rotation representation (Euler angles or 6D).
+Norm is the type of normalization (regular or the one proposed in the Diffusion Policy paper).
+O, P, A are the observation, prediction and action horizons, respectively.
+GM is the grasp motion. If GM is ticked, the robot closes the gripper when the boolean value in the action is True.
+If GM is not ticked, then when the boolean value in the action is True, the robot first moves 8 cm in the z-direction of the gripper and only
+then closes the gripper. PO indicates whether the position of the robot's end-effector is used as an observation. B is the batch size.
+
+{:/}
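
For the Rot column, the sketch below illustrates a 6D rotation representation. It assumes that "6D" refers to the common continuous representation that keeps the first two columns of the rotation matrix and recovers the third by Gram-Schmidt orthogonalization; the function names are hypothetical and the code is not taken from our implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # 3x3 rotation matrix stored as a list of rows


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def matrix_to_6d(rot: Matrix) -> List[float]:
    """Keep the first two columns of the rotation matrix."""
    return [rot[0][0], rot[1][0], rot[2][0], rot[0][1], rot[1][1], rot[2][1]]


def sixd_to_matrix(r6: Sequence[float]) -> Matrix:
    """Rebuild a valid rotation matrix from any non-degenerate 6D vector."""
    a1, a2 = r6[:3], r6[3:]
    b1 = _normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])  # remove the b1 component
    b3 = _cross(b1, b2)  # the third column follows from the first two
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


# Round trip for a rotation of 0.3 rad about the z-axis.
c, s = math.cos(0.3), math.sin(0.3)
rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
recovered = sixd_to_matrix(matrix_to_6d(rotation))
```

Because the orthogonalization step accepts unnormalized inputs, a network can output six unconstrained numbers and still yield a valid rotation, which is one reason this representation is often preferred over Euler angles for regression.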
+
+## Experiments with the RL training data
+### Horizons experiments
+{::nomarkdown}
+
+
+
Experiments 1-5. Validation results for 10 runs.
+
+{:/}
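
The role of the O, P and A horizons can be sketched as a receding-horizon control loop: the policy sees the last O observations, predicts P actions, and only the first A of them are executed before it replans. The one-dimensional toy environment, the toy policy, and all names below are assumptions for illustration; in particular, the sketch ignores any offset between the observation window and the first executed action.

```python
from collections import deque
from typing import Callable, Deque, List


def rollout(
    policy: Callable[[List[float]], List[float]],
    step_env: Callable[[float], float],
    first_obs: float,
    obs_horizon: int,
    action_horizon: int,
    max_steps: int,
) -> List[float]:
    """Receding-horizon control: predict P actions, execute only the first A."""
    # Pad the observation window with the first observation.
    history: Deque[float] = deque([first_obs] * obs_horizon, maxlen=obs_horizon)
    executed: List[float] = []
    while len(executed) < max_steps:
        plan = policy(list(history))  # P predicted actions
        for action in plan[:action_horizon]:
            history.append(step_env(action))  # the oldest observation drops out
            executed.append(action)
            if len(executed) == max_steps:
                break
    return executed


calls = []


def toy_policy(observations: List[float]) -> List[float]:
    """Hypothetical policy with P = 4: plan four unit steps from the latest position."""
    calls.append(len(observations))
    last = observations[-1]
    return [last + 1.0, last + 2.0, last + 3.0, last + 4.0]


# Toy environment: the action is a target position and is reached exactly.
executed_actions = rollout(
    toy_policy,
    step_env=lambda a: a,
    first_obs=0.0,
    obs_horizon=2,
    action_horizon=2,
    max_steps=5,
)
```

With P = 4 and A = 2, the policy is queried once every two control steps; a larger A reduces the number of policy calls but makes the robot react more slowly to changes in the human motion.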
+
+
+{::nomarkdown}
+
+
+
+
+
Human contact. There are two videos from HandoverSim validation scene number 20.
+The left video is the model from experiment 1, the right video is the RL policy.
+
+{:/}
+
+
+{::nomarkdown}
+
+
+
+
+
Timeout. Experiments with a higher prediction horizon have a higher timeout rate.
+There are two videos from HandoverSim validation scene number 14.
+The left video is the model from experiment 1 (P=4), the right video is the model from experiment 3 (P=8).
Same scene. Different failures. There are two videos from HandoverSim validation scene number 9.
+The left video is the model from experiment 6, the right video is the model from experiment 7.
+
+{:/}
+
+---
+
+### Action type experiments
+{::nomarkdown}
+
+
+
Experiments 9-10. Validation results for 10 runs.
+
+{:/}
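
The difference between the two action types (ee and Δee) can be illustrated with a conversion between absolute end-effector positions and step-wise changes. For brevity the sketch handles positions only and leaves out rotation and the gripper flag; the function names and the example waypoints are hypothetical.

```python
from typing import List, Sequence

Position = List[float]


def to_delta_actions(start: Sequence[float], poses: Sequence[Sequence[float]]) -> List[Position]:
    """Convert absolute end-effector positions into changes relative to the previous one."""
    deltas = []
    previous = list(start)
    for pose in poses:
        deltas.append([p - q for p, q in zip(pose, previous)])
        previous = list(pose)
    return deltas


def to_absolute_actions(start: Sequence[float], deltas: Sequence[Sequence[float]]) -> List[Position]:
    """Accumulate step-wise changes back into absolute positions."""
    poses = []
    current = list(start)
    for delta in deltas:
        current = [c + d for c, d in zip(current, delta)]
        poses.append(current)
    return poses


# Hypothetical waypoints in metres.
start_position = [0.0, 0.0, 0.0]
absolute = [[0.5, 0.0, 0.25], [0.5, 0.25, 0.5]]
deltas = to_delta_actions(start_position, absolute)
restored = to_absolute_actions(start_position, deltas)
```

Absolute actions keep the target independent of earlier execution errors, whereas delta actions accumulate them, which is one possible reason for the two variants to behave differently.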
+
+{::nomarkdown}
+
+
+
+
+
+
+{:/}
+
+
+
+# Diffusion policy self-improvement experiments
+
+
+
+# Bibliography
+
+[^1]: Cheng Chi et al. “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion”. In: Proceedings of Robotics: Science and Systems (RSS). 2023. DOI: 10.48550/arXiv.2303.04137.
+[^2]: Yu-Wei Chao et al. “HandoverSim: A simulation framework and benchmark for human-to-robot object handovers”. In: 2022 International Conference on Robotics and Automation (ICRA). IEEE. 2022, pp. 6941–6947. DOI: 10.48550/arXiv.2205.09747.
+[^3]: Sammy Christen et al. “Learning Human-to-Robot Handovers from Point Clouds”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 9654–9664. DOI: 10.48550/arXiv.2303.17592.
+[^4]: Lirui Wang, Yu Xiang, and Dieter Fox. “Manipulation trajectory optimization with online grasp synthesis and selection”. In: Robotics: Science and Systems (2020). DOI: 10.48550/arXiv.1911.10280.
\ No newline at end of file
diff --git a/assets/diff-h2r/contact.mp4 b/assets/diff-h2r/contact.mp4
new file mode 100644
index 0000000..a95d214
Binary files /dev/null and b/assets/diff-h2r/contact.mp4 differ
diff --git a/assets/diff-h2r/diff-h2r_architecture.png b/assets/diff-h2r/diff-h2r_architecture.png
new file mode 100644
index 0000000..aeb2909
Binary files /dev/null and b/assets/diff-h2r/diff-h2r_architecture.png differ
diff --git a/assets/diff-h2r/diffusion_in_handovers.png b/assets/diff-h2r/diffusion_in_handovers.png
new file mode 100644
index 0000000..4b6f8d5
Binary files /dev/null and b/assets/diff-h2r/diffusion_in_handovers.png differ
diff --git a/assets/diff-h2r/ee_traj/exp_10_ee_val_000.mp4 b/assets/diff-h2r/ee_traj/exp_10_ee_val_000.mp4
new file mode 100644
index 0000000..536cbe7
Binary files /dev/null and b/assets/diff-h2r/ee_traj/exp_10_ee_val_000.mp4 differ
diff --git a/assets/diff-h2r/ee_traj/exp_10_val_027.mp4 b/assets/diff-h2r/ee_traj/exp_10_val_027.mp4
new file mode 100644
index 0000000..0caf83e
Binary files /dev/null and b/assets/diff-h2r/ee_traj/exp_10_val_027.mp4 differ
diff --git a/assets/diff-h2r/exp7_val_009.mp4 b/assets/diff-h2r/exp7_val_009.mp4
new file mode 100644
index 0000000..3dce0ec
Binary files /dev/null and b/assets/diff-h2r/exp7_val_009.mp4 differ
diff --git a/assets/diff-h2r/exp_6_val_009.mp4 b/assets/diff-h2r/exp_6_val_009.mp4
new file mode 100644
index 0000000..b76647b
Binary files /dev/null and b/assets/diff-h2r/exp_6_val_009.mp4 differ
diff --git a/assets/diff-h2r/handover_sim/contact_071.mp4 b/assets/diff-h2r/handover_sim/contact_071.mp4
new file mode 100644
index 0000000..fed7145
Binary files /dev/null and b/assets/diff-h2r/handover_sim/contact_071.mp4 differ
diff --git a/assets/diff-h2r/handover_sim/difficult_grasp_008.mp4 b/assets/diff-h2r/handover_sim/difficult_grasp_008.mp4
new file mode 100644
index 0000000..e6b9781
Binary files /dev/null and b/assets/diff-h2r/handover_sim/difficult_grasp_008.mp4 differ
diff --git a/assets/diff-h2r/handover_sim/drop_000.mp4 b/assets/diff-h2r/handover_sim/drop_000.mp4
new file mode 100644
index 0000000..ad838c3
Binary files /dev/null and b/assets/diff-h2r/handover_sim/drop_000.mp4 differ
diff --git a/assets/diff-h2r/handover_sim/timeout_006.mp4 b/assets/diff-h2r/handover_sim/timeout_006.mp4
new file mode 100644
index 0000000..b23695e
Binary files /dev/null and b/assets/diff-h2r/handover_sim/timeout_006.mp4 differ
diff --git a/assets/diff-h2r/human_coll/exp1_val_020.mp4 b/assets/diff-h2r/human_coll/exp1_val_020.mp4
new file mode 100644
index 0000000..501f2d4
Binary files /dev/null and b/assets/diff-h2r/human_coll/exp1_val_020.mp4 differ
diff --git a/assets/diff-h2r/human_coll/rl_val_020.mp4 b/assets/diff-h2r/human_coll/rl_val_020.mp4
new file mode 100644
index 0000000..9615db9
Binary files /dev/null and b/assets/diff-h2r/human_coll/rl_val_020.mp4 differ
diff --git a/assets/diff-h2r/rl_data_videos/bad_traj_and_grasp_001.mp4 b/assets/diff-h2r/rl_data_videos/bad_traj_and_grasp_001.mp4
new file mode 100644
index 0000000..15c6af8
Binary files /dev/null and b/assets/diff-h2r/rl_data_videos/bad_traj_and_grasp_001.mp4 differ
diff --git a/assets/diff-h2r/rl_data_videos/good_015.mp4 b/assets/diff-h2r/rl_data_videos/good_015.mp4
new file mode 100644
index 0000000..2b8ad90
Binary files /dev/null and b/assets/diff-h2r/rl_data_videos/good_015.mp4 differ
diff --git a/assets/diff-h2r/rl_data_videos/no_intention_007.mp4 b/assets/diff-h2r/rl_data_videos/no_intention_007.mp4
new file mode 100644
index 0000000..1743a58
Binary files /dev/null and b/assets/diff-h2r/rl_data_videos/no_intention_007.mp4 differ
diff --git a/assets/diff-h2r/tables/experiment_overview.png b/assets/diff-h2r/tables/experiment_overview.png
new file mode 100644
index 0000000..a1943e4
Binary files /dev/null and b/assets/diff-h2r/tables/experiment_overview.png differ
diff --git a/assets/diff-h2r/tables/validation_1-5.png b/assets/diff-h2r/tables/validation_1-5.png
new file mode 100644
index 0000000..15379ff
Binary files /dev/null and b/assets/diff-h2r/tables/validation_1-5.png differ
diff --git a/assets/diff-h2r/tables/validation_6-8.png b/assets/diff-h2r/tables/validation_6-8.png
new file mode 100644
index 0000000..3fcc417
Binary files /dev/null and b/assets/diff-h2r/tables/validation_6-8.png differ
diff --git a/assets/diff-h2r/tables/validation_9-10.png b/assets/diff-h2r/tables/validation_9-10.png
new file mode 100644
index 0000000..56a4a4b
Binary files /dev/null and b/assets/diff-h2r/tables/validation_9-10.png differ
diff --git a/assets/diff-h2r/timeout/exp_1_val_014.mp4 b/assets/diff-h2r/timeout/exp_1_val_014.mp4
new file mode 100644
index 0000000..778b8a3
Binary files /dev/null and b/assets/diff-h2r/timeout/exp_1_val_014.mp4 differ
diff --git a/assets/diff-h2r/timeout/exp_3_val_014.mp4 b/assets/diff-h2r/timeout/exp_3_val_014.mp4
new file mode 100644
index 0000000..9212efa
Binary files /dev/null and b/assets/diff-h2r/timeout/exp_3_val_014.mp4 differ