- Use anaconda to create a Python 3.8 environment:
  ```bash
  conda create -n habitat python=3.8
  conda activate habitat
  ```
- Install Habitat-Sim 0.2.1:
  ```bash
  conda install -c aihabitat -c conda-forge habitat-sim=0.2.1 headless
  ```
- Install Habitat-Lab 0.2.1:
git clone --branch v0.2.1 [email protected]:facebookresearch/habitat-lab.git
cd habitat-lab
# installs both habitat and habitat_baselines
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all
- Clone this repository and install python requirements:
  ```bash
  git clone https://github.com/RavenKiller/MEE.git
  cd MEE
  pip install -r requirements.txt
  ```
- Download Matterport3D scenes:
  - Get the official `download_mp.py` from the Matterport3D project webpage.
  - Download the scene data for Habitat:
    ```bash
    # requires running with python 2.7
    python download_mp.py --task habitat -o data/scene_datasets/mp3d/
    ```
  - Extract such that it has the form `data/scene_datasets/mp3d/{scene}/{scene}.glb`. There should be 90 scenes.
- Download the pre-processed episodes from here. Extract them into `data/datasets/`.
- Download the depth encoder from here. Extract the model to `data/ddppo-models/gibson-4plus-resnet50.pth`. A quick check of the resulting layout is sketched after this list.
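An optional sanity check for the installation and the data layout above (a minimal sketch; run it from the repository root with the habitat environment activated):

```python
import glob
import os

# Verify the installation steps: both packages should import cleanly.
import habitat          # noqa: F401
import habitat_sim      # noqa: F401

# Verify the data layout described above.
scene_glbs = glob.glob("data/scene_datasets/mp3d/*/*.glb")
print(f"MP3D scenes found: {len(scene_glbs)} (expected 90)")
assert os.path.isdir("data/datasets/"), "missing pre-processed episodes"
assert os.path.isfile("data/ddppo-models/gibson-4plus-resnet50.pth"), "missing depth encoder"
print("Environment and data layout look OK")
```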
We proposed an evolutionary pre-training strategy in this work and developed the corresponding datasets. The data collection scripts are stored in `scripts/`, with filenames like `evo_data_stage1.ipynb`.

The v1 version (default access code: `evop`) contains a total of 4.8M samples across all modalities. All data is organized in HDF5 format. The total size after decompression is around 720 GB. Below is the file list; a minimal loading example follows it:
- `stage1.zip`
  - `rgb.mat`: contains RGB data with shape (395439, 224, 224, 3)
  - `depth.mat`: contains depth data with shape (417900, 256, 256, 1)
  - `inst.mat`: contains instruction data with shape (400250, 77), zero-padded and tokenized
  - `sub.mat`: contains sub-instruction data with shape (410357, 12, 77)
- `stage2.zip`
  - `rgb_depth_large.mat`: contains aligned RGB and depth data, a total of 230766 pairs
  - `inst_sub_large.mat`: contains aligned instruction and sub-instruction data, a total of 157877 pairs
  - `rgb_depth.mat`: a small debug version of `rgb_depth_large.mat`
  - `inst_sub.mat`: a small debug version of `inst_sub_large.mat`
- `stage3.zip`
  - `data.mat`: contains aligned (RGB, depth, instruction, sub-instruction) samples, a total of 601038 tuples
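Since the `.mat` files are HDF5 archives, they can be inspected directly with h5py after extraction. A minimal sketch, assuming `stage1.zip` has been unpacked to `stage1/`; the dataset key inside each file is not documented here, so list the keys first:

```python
import h5py

# Hypothetical extraction path; adjust to wherever stage1.zip was unpacked.
with h5py.File("stage1/rgb.mat", "r") as f:
    print(list(f.keys()))        # discover the actual dataset name(s)
    rgb = f[list(f.keys())[0]]   # assume the first dataset holds the RGB array
    print(rgb.shape, rgb.dtype)  # expected shape: (395439, 224, 224, 3)
    sample = rgb[0]              # HDF5 slicing reads a single sample lazily
```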
The data sources include:
- stage 1: COCO, VisualGenome, RGBD1K, SceneNet Depth, and BookCorpus.
- stage 2: NYUv2, DIODE, TUM RGB-D, Bonn RGB-D Dynamic, SceneNet RGB-D, Touchdown, map2seq, CHALET, Talk the Walk, and ALFRED.
- stage 3: VLN-CE and EnvDrop.
The v2 version contains a total of 83.9M samples across all modalities and is a superset of v1. All data is stored as separate files (RGB: JPEG, depth: PNG, instruction: TXT, sub-instruction: TXT). Collection and loading scripts are developed in the dev branch.

Additional data sources: ImageNet, LAION-HighResolution, CC-12M, C4, HM3D, SUN3D, ScanNet, Marky-gibson.

Access to several datasets (e.g., HM3D) is subject to specific terms and conditions. Please request access before using them.
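Since each v2 sample is stored as a separate file, loading reduces to ordinary image and text I/O. A rough sketch with hypothetical paths (the actual directory layout and file naming follow the loading scripts in the dev branch):

```python
import numpy as np
from PIL import Image

# Hypothetical sample paths; see the dev-branch loading scripts for the
# real directory layout and file naming.
rgb = np.asarray(Image.open("v2/rgb/000000.jpg"))      # RGB stored as JPEG
depth = np.asarray(Image.open("v2/depth/000000.png"))  # depth stored as PNG (depth scale is an assumption)
with open("v2/inst/000000.txt") as f:
    instruction = f.read().strip()                     # instruction stored as TXT
print(rgb.shape, depth.shape, len(instruction))
```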
`run.py` is the program entry point. You can run it like this:
```bash
python run.py \
  --exp-config {config} \
  --run-type {type}
```
`{config}` should be replaced with a config file path; `{type}` should be `train`, `eval`, or `inference`, meaning train, evaluate, or test the model, respectively.
Our config files are stored in `evoenc/config/`:

| File | Meaning |
|---|---|
| `evoenc.yaml` | Train the model with behavior cloning |
| `evoenc_da.yaml` | Train the model with DAgger |
| `evoenc_aug.yaml` | Train the model with EnvDrop |
| `evoenc_p{x}.yaml` | Evolutionary pre-training stage {x}+1 |
| `evoenc_p{x}_tune.yaml` | Task fine-tuning with DAgger |
Several paths (such as the pre-training data folder and checkpoint paths) are configured in the above YAML files or in `evoenc/config/default.py`. Remember to change them as needed.
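For example, to train the model with DAgger using the corresponding config above (swap in whichever config matches your experiment):

```bash
python run.py \
  --exp-config evoenc/config/evoenc_da.yaml \
  --run-type train
```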
We release pre-trained encoder weights after evolutionary pre-training. The frozen pre-extractor is excluded from these weights to reduce storage cost. Refer to `evoenc/models/evoenc_policy.py` for how to load the pre-trained weights.
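A minimal loading sketch, assuming the released file is a plain state dict and `encoder` is the encoder module built in `evoenc/models/evoenc_policy.py` (the checkpoint path and key layout here are assumptions):

```python
import torch


def load_pretrained_encoder(encoder: torch.nn.Module, ckpt_path: str) -> None:
    """Sketch of loading released encoder weights; the real logic lives in
    evoenc/models/evoenc_policy.py."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # If the checkpoint wraps the weights (e.g., under a "state_dict" key), unwrap it first.
    # strict=False tolerates the keys of the frozen pre-extractor, which are
    # intentionally absent from the released weights.
    missing, unexpected = encoder.load_state_dict(state_dict, strict=False)
    print(f"{len(missing)} missing keys (frozen pre-extractor), {len(unexpected)} unexpected keys")
```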
- navigation_vlnce.mp4
- Premature stop: premature_stop.mp4
- Wrong exploration: wrong_exploration.mp4
- Deadlock: deadlock.mp4
Alkaid is a self-developed interactive service robot. Here are some parameters:
- Camera: 720P resolution, 90° max FOV
- Screen: 1080P, touch screen
- Microphone: 4-microphone circular array, 61 dB SNR
- Speaker: 2 stereo units, 150 Hz-20 kHz output
- Chassis: 2-wheel differential drive, 0.5 m/s max speed, 1.2 rad/s max angular speed
Currently, we release 13 paths in VLN-CE format. The video below demonstrates 4 of them.