Neural-network-based depth estimation from RGB vision-based tactile sensor GelSlim 4.0
For all functionality associated with the GelSlim 4.0, visit the project website!
- Install Dependencies
pip install -r requirements.txt
- Clone gelslim_depth with git to create the root directory
- Install gelslim_depth (run in the gelslim_depth root directory)
pip install -e .
Alongside this package we provide a compatible dataset consisting of .pt
PyTorch tensor dictionaries for quick data loading. These are organized into directories, pre-split for training, validation, and testing purposes.
- train_data: Very large n_samples (1000–3000) dictionaries, informs the training of the depth estimator network
- validation_data: Smaller n_samples (100–300) dictionaries, informs the selection of the best network weights in the training output
- test_data: Smaller n_samples (100–300) dictionaries, unseen during training, but still available for comparison with ground truth
- real_data: Even smaller n_samples (40–100) dictionaries, which may or may not have paired depth images or corresponding mesh files
All dictionaries have the following keys relevant to this package (other keys may exist; ignore them), with all keys aligned along the n_samples
dimension where applicable.
- tactile_image: An n_samples x 6 x 320 x 427 tensor containing n_samples RGB GelSlim images. The 6 channels correspond to Left GelSlim RGB followed by Right GelSlim RGB.
- depth_images: An n_samples x 2 x 320 x 427 tensor containing n_samples depth images with each pixel value as the distance from the camera in mm, where the undeformed image is 0. This means all pixel values should be negative. These are the 'ground truth' used in training and were generated using scripts/data_scripts/generate_depth.py. The two channels correspond to Left Depth and Right Depth.
- in_hand_pose: An n_samples x 3 tensor containing n_samples in_hand_poses, the SE(2) or planar pose at which the object frame (the coordinate frame for the corresponding mesh file) sits in the grasp_frame, with the form (y (mm), z (mm), theta (rad)).
- base_tactile_image: A 1 x 6 x 320 x 427 or n_samples x 6 x 320 x 427 tensor representing the undeformed image. Some of the datasets were collected with just one undeformed image and some were collected with one for each new grasp.
- grasp_widths: An n_samples tensor listing the grasp widths (in mm) if this was recorded for the dataset. This is used with the mesh file to generate the ground truth depth images. For objects where this was not recorded, a single grasp width to be used for all data points is listed in grasp_widths.txt.
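For reference, here is a minimal sketch of loading one of these dictionaries and inspecting the keys above. The file path is illustrative (substitute any .pt file from the dataset), and some dictionaries, e.g. in real_data, may be missing depth_images or grasp_widths:
import torch

# Illustrative path; substitute any .pt file from the dataset directories described above
pt_path = '/path/to/GelSlim4.0DepthDataset/test_data/pattern_31_rod_test.pt'
data = torch.load(pt_path, map_location='cpu')

print(data['tactile_image'].shape)      # n_samples x 6 x 320 x 427 (Left RGB then Right RGB)
print(data['depth_images'].shape)       # n_samples x 2 x 320 x 427 (Left and Right depth, in mm, <= 0)
print(data['in_hand_pose'].shape)       # n_samples x 3: (y (mm), z (mm), theta (rad))
print(data['base_tactile_image'].shape) # 1 x 6 x 320 x 427 or n_samples x 6 x 320 x 427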
Also with this dataset are some txt files for configuration:
- test_objects.txt: Objects listed here will be entirely left out of the training and validation sets
- validation_objects.txt: Objects listed here will be entirely left out of the training and testing sets
The following txt configuration files are only for object dataset dictionaries which include the ground truth depth image generated from the mesh:
- real_data/train_real_objects.txt: Place names of objects in real_data here if you want them included in the training set; we found this to assist training even though there are fewer data points for these objects.
- real_data/validation_real_objects.txt: Place names of objects in real_data here if you want them included in the validation set
- real_data/test_real_objects.txt: Place names of objects in real_data here if you want them included in the test set
IMPORTANT: After downloading this data, update main_config.py
with its absolute path on your system:
DATA_PATH = '/path/to/GelSlim4.0DepthDataset'
You can add your own data from your own GelSlim 4.0, as long as it exists in this same format! To view the paired data in these .pt
files, run view_pt.py
as follows:
python scripts/data_scripts/view_pt.py <sub_dir> <data_name>
For example:
python scripts/data_scripts/view_pt.py test_data pattern_31_rod_test.pt
If you wish to regenerate the ground truth depth images for our dataset, or generate your own for new objects with in-hand-pose-labeled GelSlim 4.0 images, you may run (from the gelslim_depth
root directory):
python scripts/data_scripts/depth_generation.py
This script can be configured directly in the file, including the folder in which to generate the depth images and the list of objects to generate them for. Each object must have a corresponding STL file in gelslim_depth/mesh
.
If you have data in the form of .pt
files that are not split into train, validation, and test sets, place them in DATA_PATH
(outside of the train_data
, etc. folders) and run split_data.py
as follows:
python scripts/data_scripts/split_data.py <device>
For example:
python scripts/data_scripts/split_data.py cuda:0
This will perform the segmentation of all .pt
files in DATA_PATH
into train, test, and validation folders. This will by default delete the original .pt
files due to the large file sizes. This segmentation will take place on <device>
as recognized by PyTorch.
We provide the U-Net architecture for depth estimation with customizable layers and hyperparameters in gelslim_depth/gelslim_depth/models/unet.py
. Our dataset class is in gelslim_depth/gelslim_depth/datasets/general_dataset.py
. You'll find that this and other scripts rely on the code within gelslim_depth/gelslim_depth/processing_utils
.
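As a rough illustration of the customizable hyperparameters, the U-Net can be instantiated directly. The constructor arguments are the ones used later in this README, but the values below are purely illustrative assumptions; a trained checkpoint expects the values recorded in its paired config file:
from gelslim_depth.models.unet import UNet

# Illustrative hyperparameter values only (assumed, not the provided defaults);
# layer_dimensions is assumed to be a list of channel widths, as suggested by config.CNN_dimensions below
network = UNet(n_channels=3, n_classes=1, layer_dimensions=[32, 64, 128, 256], kernel_size=3, maxpool_size=2, upconv_stride=2)
print(sum(p.numel() for p in network.parameters()), 'parameters')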
To train the U-Net, run train_unet.py
as follows:
python train_utils/train_unet.py <weights_name> <gpu> --exclude_objects [str: objects to exclude] --activation_func [relu, tanh, mish] --max_datapoints_per_object [int]
With additional arguments:
--train_indefinitely: do not stop the training when a rising validation loss is detected (lowest validation loss is still the default saved checkpoint)
--use_difference_image: use the difference between the deformed and undeformed tactile images as the input to the U-Net (recommended)
For example:
python train_utils/train_unet.py unet_model_1 1 --exclude_objects peg1 peg2 --activation_func relu --max_datapoints_per_object 1000 --train_indefinitely --use_difference_image
Each epoch of training may take a while given the data size and network size, but it will likely converge to a usable network in less than 100 epochs. This script will create and populate the following directories within gelslim_depth/train_output
:
- live_display: The latest (not necessarily best/currently saved) checkpoint's performance visualized on the datasets. This plotting can be disabled by setting num_images_to_display_live = 0 in train_unet.py
- weights: The model checkpoints
- loss_curves: Live plotted loss curves
- loss_values: The logged output of the training, validation, and test losses
This will also populate gelslim_depth/gelslim_depth/config
with an importable .py
config file unique to the training parameters that were set; this is paired with the weights checkpoint file for testing the resulting model and for downstream tasks.
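Because the config module name depends on the <weights_name> you chose, it can be convenient to import it dynamically. A small sketch, assuming a run named unet_model_1 as in the training example above:
import importlib

weights_name = 'unet_model_1'  # the <weights_name> used when launching train_unet.py
config = importlib.import_module('gelslim_depth.config.config_' + weights_name)

# The config records the training parameters, e.g. the network hyperparameters used later in this README
print(config.CNN_dimensions, config.kernel_size)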
To test a trained model, use test_depth_estimation.py
:
python test_utils/test_depth_estimation.py <weights_name> <gpu> <sub_dir> [object1, object2, ...]
This will visualize, using the provided weights and device, the estimation for all objects in the list [object1, object2, ...]
that lie within <sub_dir>
. For example:
python test_utils/test_depth_estimation.py unet_bigdata 0 real_data button marble edge
If no objects are given after the <sub_dir>
, all objects within <sub_dir>
will be plotted.
If you require our depth estimation algorithm for your downstream robotic tasks, we provide the following capabilities:
Import the processing functions:
from gelslim_depth.processing_utils.normalization_utils import denormalize_depth_image, normalize_tactile_image
from gelslim_depth.processing_utils.image_utils import sample_multi_channel_image_to_desired_size
Alternatively, import the complete prediction function which includes these:
from gelslim_depth.processing_utils.complete_prediction import predict_depth_from_RGB
This function takes the arguments (images, model, output_size, config)
.
Also import this function, if applicable (it is recommended to use difference images for depth estimation):
from gelslim_depth.processing_utils.image_utils import get_difference_image
- In whatever way you acquire tactile images (ROS, webcam, etc.), first convert the RGB images to a torch tensor tactile_image with dimensions N x 3 x H x W, depending on the size of the images that the trained model has seen.
- Do the same for the undeformed tactile image (again, this can be 1 x 3 x H x W and it should work), if applicable, and save it as the variable base_tactile_image.
- Ensure these RGB images have the original uint8 values from 0 to 255, but the tensor datatype should be float (a minimal sketch of this conversion appears further below).
- Then, the estimation can begin as such, with a defined weights_name and <device>:
import torch
import gelslim_depth.config.config_<weights_name> as config
from gelslim_depth.models.unet import UNet

device = <device>  # e.g. 'cuda:0'
network = UNet(n_channels=3, n_classes=1, layer_dimensions=config.CNN_dimensions, kernel_size=config.kernel_size, maxpool_size=config.maxpool_size, upconv_stride=config.upconv_stride).to(device)
weights = '/path/to/checkpoints/<weights_name>.pth'
network.load_state_dict(torch.load(weights, map_location=device))
diff_image = get_difference_image(tactile_image, base_tactile_image)
depth_output = predict_depth_from_RGB(diff_image, network, (diff_image.shape[2],diff_image.shape[3]), config)
Alternatively, the weights can be loaded as recorded by the config, which ensures the two are compatible:
weights_name = '<weights_name>'
network.load_state_dict(torch.load(config.weights_path + weights_name + '.pth', map_location=device))
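If it helps, here is a minimal sketch of preparing tactile_image and base_tactile_image from raw images, assuming they arrive as H x W x 3 uint8 numpy arrays in RGB order (e.g. already extracted from a ROS message or webcam frame); the dummy arrays and helper name are illustrative only:
import numpy as np
import torch
from gelslim_depth.processing_utils.image_utils import get_difference_image
from gelslim_depth.processing_utils.complete_prediction import predict_depth_from_RGB

device = 'cuda:0'  # or whichever <device> the network was loaded onto

# Replace these dummy arrays with images acquired from your GelSlim 4.0 (H x W x 3, uint8, RGB order)
deformed_rgb = np.zeros((320, 427, 3), dtype=np.uint8)
undeformed_rgb = np.zeros((320, 427, 3), dtype=np.uint8)

def rgb_array_to_tensor(rgb, device):
    # Keep the original 0-255 values, store them as float, and rearrange H x W x 3 -> 1 x 3 x H x W
    return torch.from_numpy(rgb).float().permute(2, 0, 1).unsqueeze(0).to(device)

tactile_image = rgb_array_to_tensor(deformed_rgb, device)
base_tactile_image = rgb_array_to_tensor(undeformed_rgb, device)

# network and config are the model and config module loaded as shown above
diff_image = get_difference_image(tactile_image, base_tactile_image)
depth_output = predict_depth_from_RGB(diff_image, network, (diff_image.shape[2], diff_image.shape[3]), config)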
unet_bigdata.pth: These are the pretrained weights we provide, which work alongside config_unet_bigdata.py
for the functionality described above. Please move this file into gelslim_depth/train_output/weights
to ensure functionality.