In this pipeline we created synthetic brightfield images of yeast cells and trained a Mask R-CNN model on them. We then used the trained network on real time-series brightfield microscopy data to automatically segment and track budding yeast cells.
- MSc Herbert Teun Kruitbosch, data scientist, University of Groningen, Data science team
- MSc Yasmin Mzayek, IT trainee, University of Groningen, Data Science trainee
- MA Sara Omlor, IT trainee, University of Groningen, Data Science trainee
- MSc Paolo Guerra, PhD student, University of Groningen, Faculty of Science and Engineering, Molecular Systems Biology
- Dr Andreas Milias Argeitis, principal investigator, University of Groningen, Faculty of Science and Engineering, Molecular Systems Biology
Goals
- To create synthetic image data to train a deep convolutional neural network
- To implement an automatic segmentation pipeline using this network
- To track cells across time frames
We have tried to make our experiments accessible to outsiders, particularly by setting up the installation of detectron2 in Google Colab and by downloading all external resources when needed. Please note that the first cells in these notebooks install all dependencies; this should work without restarting. If you do get errors, try restarting via the Colab Runtime menu, since inappropriate versions might have been imported into the runtime before the appropriate ones were installed. The Train model on synthetic data notebook in particular might require this. For the other notebooks, not restarting is unlikely to cause issues.
- Example cell detection (several minutes)
- Evaluation of our Mask R-CNN model against YeaZ and YeastNet2
- Hyperparameter tuning for Mask R-CNN segmentation and tracking (~ 30-200 minutes)
The two notebooks below allow you to create synthetic data and train a model. For a proof of concept, set the sets and max_iter parameters, respectively, to the lower values suggested. If you want to run them for a realistic use case, be aware that these scripts take several hours to complete and that Google Colab is not intended for this. The results are large (~0.5-2 GB), and on Colab you can easily fail to safeguard them when Google Colab shuts down the machine due to inactivity.
For creating the synthetic data set and training the network see the notebooks create_synthetic_dataset_for_training and train_mask_rcnn_network.
For segmentation and tracking on real data, see the example pipeline notebook.
All the notebooks can be run on Google Colab and automatically install and download all needed dependencies and data (see links above).
(To run the Mask R-CNN locally, you will need to install the Detectron2 library. For a guide to a Windows installation, see these instructions. You also need to download the trained model file from https://datascience.web.rug.nl/models/yeast-cells/mask-rcnn/v1/model_final.pth)
- Input: Brightfield time-lapse images. The source file is either a tiff stack or multiple tiff files forming the time-series.
- Output: A dataframe with one row for each detection and a (# detections x height x width) numpy.ndarray with the boolean segmentation masks. The masks and the dataframe have the same length, and the mask column refers to the first dimension of the masks array. The dataframe also has the columns frame, x and y to mark the frame of the source image and the centroid of the detection.
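To make the relationship between the two segmentation outputs concrete, here is a minimal sketch with hand-made toy data. The column names frame, x, y and mask follow the description above. The toy values, the variable names, and the convention that x is the column coordinate and y the row coordinate are assumptions for illustration, not output from the actual pipeline.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the segmentation output: 3 detections on 8x8 images.
masks = np.zeros((3, 8, 8), dtype=bool)   # (# detections, height, width)
masks[0, 1:4, 1:4] = True                 # detection 0, seen in frame 0
masks[1, 4:7, 4:8] = True                 # detection 1, seen in frame 0
masks[2, 2:5, 1:4] = True                 # detection 2, seen in frame 1
frames = [0, 0, 1]

rows = []
for i, mask in enumerate(masks):
    ys, xs = np.nonzero(mask)             # pixel coordinates inside the mask
    rows.append({"frame": frames[i], "x": xs.mean(), "y": ys.mean(), "mask": i})
detections = pd.DataFrame(rows)

# One row per detection; the mask column indexes the first axis of masks.
assert len(detections) == masks.shape[0]
first = detections.iloc[0]
first_mask = masks[int(first["mask"])]    # boolean (height, width) array
print(detections)
```

Looking up masks[int(row["mask"])] for any row of the dataframe gives the boolean mask that belongs to that detection.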
- Input: Besides the dataframe and masks from segmentation, tracking needs hyperparameters for the DBSCAN clustering and the maximum frame distance used when determining the distances between detections. You can set a maximum frame distance dmax, and the algorithm will calculate the distances between detections in the current frame and the detections in all frames from frame-dmax to frame+dmax. In other words, it calculates distances between all instances in the current frame and all instances in the following and previous frames, up to dmax frames away. A higher dmax can compensate for intermittent false negatives: if a cell is missed in an adjacent frame but picked up again 2 frames ahead, the cell will still be tracked. However, looking too far ahead also increases the probability of misclassification due to cell growth and movement over time. The min_samples and eps variables are required arguments for the DBSCAN algorithm. For further explanation see sklearn.cluster.DBSCAN.
- Output: The cell column is added to the dataframe of detections. Its value is -1 if the tracking algorithm marked the detection as an outlier and hence did not track it.
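The idea of clustering detections across frames can be sketched as follows. This is not the pipeline's own tracking code: the function name track_sketch, the plain Euclidean centroid distance, and the rule that detections in the same frame are never linked directly are assumptions made for illustration. Only the parameters dmax, eps and min_samples and the cell output column come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

def track_sketch(detections, dmax=1, eps=5.0, min_samples=2):
    """Cluster detections into cells; returns a copy with a 'cell' column."""
    frames = detections["frame"].to_numpy()
    xy = detections[["x", "y"]].to_numpy(dtype=float)

    # Euclidean distance between every pair of centroids.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

    # Assumption: only pairs from different frames, at most dmax apart,
    # may be linked directly.
    frame_gap = np.abs(frames[:, None] - frames[None, :])
    not_linkable = (frame_gap == 0) | (frame_gap > dmax)
    np.fill_diagonal(not_linkable, False)   # keep the zero self-distance
    dist[not_linkable] = 1e9                # large but finite, far above eps

    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return detections.assign(cell=model.fit_predict(dist))

# Two cells drifting slowly over three frames, plus one stray detection.
detections = pd.DataFrame({
    "frame": [0, 0, 1, 1, 1, 2, 2],
    "x": [10, 50, 11, 50, 200, 12, 51],
    "y": [10, 50, 10, 51, 200, 11, 51],
})
tracked = track_sketch(detections)
print(tracked)
```

In this toy example the stray detection at (200, 200) has no neighbour within eps and is labelled -1. Because DBSCAN chains neighbours together, this sketch does not strictly prevent two detections from the same frame from ending up in one cluster through a shared neighbour in another frame.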
Segmented and tracked yeast cells from Mask R-CNN. The frame interval of these time-series images is 180 seconds.
You can visualize the segmentations and tracks in a movie using visualize.create_scene and visualize.show_animation. Further, you can use visualize.select_cell to select a particular cell by label and zoom in on it to observe it better in the movie. With default options, the movie gives each cell a unique color that stays the same throughout the movie if the cell is tracked correctly. You also have the option to display the label number by setting the parameter labelnum to True.
Information and feature extraction
This pipeline allows you to extract information about the detected yeast cells in the time-series. The features.extract_contours function gives the contour points [x,y] for each segmentation. The masks for all detections can be extracted and their areas can be calculated as shown in the example pipeline notebook.
A mother/daughter pair of masks overlaid on the original brightfield image.
Further, if a fluorescent channel is available, the pixel intensity within each cell can also be calculated using the masks segmented on the brightfield images.
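As a rough illustration of these two measurements, the sketch below computes the area and the mean fluorescence per detection with plain NumPy. The helper name mask_features is hypothetical, and the sketch assumes that all masks passed in belong to the same frame as the fluorescence image; the example pipeline notebook remains the reference for how this is done in practice.

```python
import numpy as np

def mask_features(masks, fluorescence=None):
    """Per-detection area and, optionally, mean fluorescence for one frame.

    masks: boolean array of shape (# detections, height, width).
    fluorescence: optional (height, width) image of the same frame.
    """
    areas = masks.sum(axis=(1, 2))            # number of True pixels per mask
    if fluorescence is None:
        return areas, None
    # Sum the intensities under each mask, then divide by the mask area.
    totals = (masks * fluorescence[None, :, :]).sum(axis=(1, 2))
    means = totals / np.maximum(areas, 1)     # guard against empty masks
    return areas, means

masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :2] = True                       # 2 x 2 pixels
masks[1, 2:, 1:] = True                       # 2 x 3 pixels
fluorescence = np.arange(16, dtype=float).reshape(4, 4)

areas, means = mask_features(masks, fluorescence)
print(areas, means)
```

Areas are in pixels; converting them to physical units would require the pixel size of the microscope, which is not part of this sketch.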
Example of Mask R-CNN pipeline output.
We evaluated our pipeline using benchmark data from the Yeast Image Toolkit (YIT) (Versari et al., 2017). On this platform, several existing pipelines have been evaluated for their segmentation and tracking performance. We tested our pipeline and that of YeaZ (Dietler et al., 2020) and YeastNet2 (Salem et al., 2021) on several test sets from this platform.
We chose to compare our pipeline with YeaZ and YeastNet2 because they also use a deep learning CNN, unlike the other pipelines evaluated on YIT.
The YeaZ segmentation and tracking implementation is based on YeaZ-GUI with optimized parameters obtained in this notebook. Additionally, our implementation allows for the use of GPU for the YeaZ pipeline.
The YeastNet2 segmentation and tracking were implemented using the YeaZ-GUI.
We matched the centroids provided in the benchmark ground truth data to the mask outputs for each model. This is slightly different from the way it was done on the evaluation platform of YIT, but comparable, since they matched centroids of the prediction to the centroids of the ground truth using a maximum distance threshold to count a comparison as a true positive (see their EP for more detail). We then calculated precision, recall, accuracy, and the F1-score.
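A minimal sketch of this kind of centroid-to-mask matching for a single frame is shown below. The function names, the greedy one-to-one matching, and the rounding of centroids to pixel positions are assumptions for illustration and may differ from the evaluation notebook. In particular, accuracy is taken here as TP / (TP + FP + FN), which is an assumed definition for a setting without true negatives.

```python
import numpy as np

def match_centroids_to_masks(gt_centroids, masks):
    """Count TP, FP and FN for one frame.

    gt_centroids: list of (x, y) ground-truth positions.
    masks: boolean array of shape (# detections, height, width).
    """
    used = set()
    tp = 0
    for x, y in gt_centroids:
        row, col = int(round(y)), int(round(x))
        if not (0 <= row < masks.shape[1] and 0 <= col < masks.shape[2]):
            continue  # centroid outside the image: ends up as a false negative
        hits = [int(i) for i in np.flatnonzero(masks[:, row, col]) if int(i) not in used]
        if hits:
            used.add(hits[0])   # each mask may explain only one centroid
            tp += 1
    fn = len(gt_centroids) - tp
    fp = masks.shape[0] - len(used)
    return tp, fp, fn

def detection_scores(tp, fp, fn):
    """Precision, recall, accuracy (assumed definition) and F1-score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, accuracy, f1

masks = np.zeros((3, 6, 6), dtype=bool)
masks[0, 0:2, 0:2] = True
masks[1, 3:5, 3:5] = True
masks[2, 0:2, 4:6] = True                     # no ground-truth cell here
gt_centroids = [(1, 1), (4, 3), (2, 5)]       # last one is not covered by any mask

tp, fp, fn = match_centroids_to_masks(gt_centroids, masks)
precision, recall, accuracy, f1 = detection_scores(tp, fp, fn)
print(tp, fp, fn, precision, recall, accuracy, f1)
```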
In the tables below, we report the performance metrics for each test set for our pipeline, YeaZ, and YeastNet2 for comparison.
Segmentation evaluation results from 7 test sets from the YIT. Precision, recall, accuracy, and the F1-score of the performance of our pipeline, YeaZ, and YeastNet2 are reported.
Tracking evaluation results from 7 test sets from the YIT. Precision, recall, accuracy, and the F1-score of the performance of our pipeline, YeaZ, and YeastNet2 are reported.
We further quantitatively evaluated our segmentation accuracy based on IOU and compared it to YeaZ using publicly available annotated ground truth data from the YeaZ group.
Average IOU is calculated for true positives using annotated brightfield images of wild-type cells from the YeaZ dataset.
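For reference, the IOU (intersection over union) of a predicted mask and a ground-truth mask can be computed as in the sketch below. The function name mask_iou and the toy masks are illustrative only; how predictions are paired with ground-truth annotations before averaging is not shown here.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of equal shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection / union) if union else 0.0

predicted = np.zeros((6, 6), dtype=bool)
predicted[1:5, 1:5] = True                    # 16 pixels
ground_truth = np.zeros((6, 6), dtype=bool)
ground_truth[2:6, 2:6] = True                 # 16 pixels, shifted by one pixel

iou = mask_iou(predicted, ground_truth)       # 9 shared pixels / 23 in the union
print(iou)
```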
For our pipeline, we used calibration curves to set the segmentation threshold score, the minimum probability at which the Mask R-CNN accepts an instance as a yeast cell. For tracking, we used them to tune the epsilon of DBSCAN and dmax, the maximum number of frames between two detections that may still be tracked as the same cell.
TP: true positive detections; FP: false positive detections; FN: false negatives.
Calibration curves for tracking performance and hyperparameter tuning (one panel per test set, YIT Test sets 1 to 7).
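One simple way to explore a score threshold like the one described above is a sweep that counts TP, FP and FN at each candidate value. The sketch below is a plain threshold sweep on toy numbers, not the calibration procedure from the hyperparameter tuning notebook. The function name, the F1-based selection, and the simplification that each detection's match to the ground truth is fixed in advance are all assumptions.

```python
import numpy as np

def sweep_score_threshold(scores, is_true_positive, n_ground_truth, thresholds):
    """Return (threshold, tp, fp, fn, f1) for each candidate threshold."""
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_true_positive, dtype=bool)
    results = []
    for threshold in thresholds:
        kept = scores >= threshold            # detections that survive the cut
        tp = int(np.sum(kept & is_tp))
        fp = int(np.sum(kept & ~is_tp))
        fn = n_ground_truth - tp              # ground-truth cells left unmatched
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        results.append((threshold, tp, fp, fn, f1))
    return results

# Toy detections: confidence score and whether each one matches a real cell.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
matched = [True, True, True, False, True, False]

results = sweep_score_threshold(scores, matched, n_ground_truth=4,
                                thresholds=[0.25, 0.5, 0.75])
best = max(results, key=lambda r: r[4])       # threshold with the highest F1
print(results, best)
```

The same pattern could be applied to eps and dmax by re-running tracking for each candidate value and scoring the resulting tracks.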