diff --git a/sig-robotics/proposal/Offline Visual Navigation without Metric Map/Offline Visual Navigation without Metric Map.md b/sig-robotics/proposal/Offline Visual Navigation without Metric Map/Offline Visual Navigation without Metric Map.md index e6e852b..2f39346 100644 --- a/sig-robotics/proposal/Offline Visual Navigation without Metric Map/Offline Visual Navigation without Metric Map.md +++ b/sig-robotics/proposal/Offline Visual Navigation without Metric Map/Offline Visual Navigation without Metric Map.md @@ -3,7 +3,8 @@ ## Background Existing learning-based navigation studies mainly develop the algorithms in simulation environment that often suffer from **sim-to-real gap** issue when transferring to the real world. Recently, there is some work dedicated to lift the simulation assumptions by learning directly from offline real-world dataset that are not necessarily collected from experts, such as ViNG [1], ReViND [2], and ViNT [3], which use either imitation learning (IL) or offline reinforcement learning (ORL). This route been proven to achieve broader generalization for executing real-world tasks and thereby step towards generic AI agents. Moreover, the efforts in navigation tasks integrate offline learned policy with an image-node neural topological SLAM [4] for handling long-horizon reasoning that pure RL policy cannot tackle, which has a significance value for replacing the conventional metric maps that are hard to store and update. Therefore, this work aims to follow this technical route and explore how to achieve a **robust** AI agent in the navigation tasks **without sim-to-real gap** and **without metric maps**. -Through real-world testing, we observe that due to the limited state distribution in the offline dataset, when facing the **out-of-distribution (OOD)** observations induced by illumination and scenario changes and the accumulative errors, the localization loss issue often arises that the robot fails to localize itself on a given map. In this case, the robot may get stuck or make infeasible decision, which significantly decreases the navigation success rates and even results in collision. And this often requires tedious human monitoring and correction. To address this issue, our work poses the question and takes one step forward:’ can we leverage the prior knowledge from offline data to guide the robot to **self-correct** the trajectory autonomously without human intervention or specific data collection?’ To this end, our key idea is learning from offline dataset to predict whether future trajectories lie in the prior distribution, and plan the future trajectories that can drive the robot to the familiar places autonomously. Real-world experiments show our method achieves +26% average success rates and x2 longer the average distance until intervention than the baselines (ViNG and ReViND), proving that our method can significantly reduce the human intervention for practical applications. This project has been accepted by 2024 ICRA, see paper: https://arxiv.org/abs/2404.10675. +Through real-world testing, we observe that due to the limited state distribution in the offline dataset, when facing the **out-of-distribution (OOD)** observations induced by illumination and scenario changes and the accumulative errors, the localization loss issue often arises that the robot fails to localize itself on a given map. In this case, the robot may get stuck or make infeasible decision, which significantly decreases the navigation success rates and even results in collision. And this often requires tedious human monitoring and correction. To address this issue, our work poses the question and takes one step forward:’ can we leverage the prior knowledge from offline data to guide the robot to **self-correct** the trajectory autonomously without human intervention or specific data collection?’ To this end, our key idea is learning from offline dataset to predict whether future trajectories lie in the prior distribution, and plan the future trajectories that can drive the robot to the familiar places autonomously. Real-world experiments show our method achieves +26% average success rates and x2 longer the average distance until intervention than the baselines (ViNG and ReViND), proving that our method can significantly reduce the human intervention for practical applications. This project, called "*SCALE: Self-Correcting Visual Navigation for Mobile Robots via +Anti-Novelty Estimation*", has been accepted by 2024 ICRA, see paper: https://arxiv.org/abs/2404.10675. ## Goals @@ -41,11 +42,18 @@ The framework is shown in Fig. 1, which consists of 1) an image-goal conditioned - **Image-goal conditioned visual navigation**: we use implicit Q-learning (IQL) [6], an offline reinforcement learning method to learn the value function and policy from offline dataset. In practice, we find two techniques useful for successfully learning the value functions from purely visual input: 1) *negative sampling* is necessary for value learning with paired image inputs, where we assign image pairs that are separated less than or equal to a threshold of timesteps $d_{max}$ as the positive samples $B_+$ and those separated beyond the threshold are assigned as the negative samples $B_−$. 2) *relative goal embedding*, the difference between goal and current image embeddings, *i.e.*, $∆z_{g,t} = z_g – z_t$, is more effective than directly using goal embedding $z_g$ as the input for the goalconditioned networks to perceive the goal orientation. - **Self-correction for localization recovery**: we self-supervisedly learn an affordance model from offline dataset for generating potential future trajectories in the latent space. On the one hand, to predict multi-step trajectories, we use conditional generation model to learn the latent space with forward-inverse cycle consistency (FICC). On the other hand, we take advantage of the surrounding pixels in history frames using recurrent neural network (RNN) to predict some aggressive trajectories that are largely beyond the current field-of-view. Furthermore, we learn a novelty estimator via random network distillation (RND) [7] technique to evaluate the predicted future states. Intuitively, the reachability and aggressiveness facilitate the reasonability and diversity, while the anti-novelty strategy can induce the robot to the familiar places. - **Neural topological SLAM**: we construct a topological map in the environment where each node is a previously observed image and each edge refers to the traversability between two nodes that is estimated by the learned value function. The localization process is conducted by searching the highest value prediction between current image and all the images on the map. An example of the integration of navigation, neural topological SLAM, and localization recovery module is illustrated in Fig.2. - ![topological_navigation](./images/topological_navigation.png "topological_navigation" ) +
+architecture +
+ **Fig. 2. Topological navigation with localization recovery**. SCALE combines the topological visual navigation with a novel localization recovery module. - + + + +