This experiment compares the performance of active learning (AL) querying against random subset selection. The expected result is that performance increases more rapidly with AL querying than with random selection. The experiment is carried out on generated Gaussian clusters and on images, using Cellpose as the feature extractor.
The diagram below summarizes the input/output structure of the experiment: rectangular nodes are functions, and elliptical nodes are inputs or outputs.
The gp_al algorithm (short for Gaussian Process Active Learning) is at the centre of the experiment. It handles the querying process, which simulates the input received from the oracle by revealing label information to the classifier in stages. For active learning, points are selected based on the entropy of the predictions. The comparison with non-informative querying is made by passing a querying function that simply selects a random subset of the data.
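The difference between the two querying strategies can be illustrated with a minimal, standard-library-only sketch. It is not the gp_al implementation: the function names (`prediction_entropy`, `entropy_query`, `random_query`), their signatures, and the toy probabilities are all assumptions made for illustration. The entropy strategy ranks unlabelled points by the Shannon entropy of their predicted class probabilities, while the baseline ignores the predictions entirely.

```python
import math
import random


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_query(pred_probs, unlabelled, batch_size):
    """Hypothetical AL query: pick the unlabelled indices with the highest entropy."""
    ranked = sorted(
        unlabelled,
        key=lambda i: prediction_entropy(pred_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def random_query(pred_probs, unlabelled, batch_size, seed=0):
    """Hypothetical baseline: ignore the predictions and pick a random subset."""
    rng = random.Random(seed)
    pool = sorted(unlabelled)
    return rng.sample(pool, min(batch_size, len(pool)))


# Toy predicted class probabilities for five points and two classes (made-up values).
pred_probs = [
    [0.99, 0.01],
    [0.50, 0.50],
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
]
unlabelled = {0, 1, 2, 3, 4}

al_picks = entropy_query(pred_probs, unlabelled, batch_size=2)
rand_picks = random_query(pred_probs, unlabelled, batch_size=2)
print("entropy-based picks:", al_picks)
print("random picks:", rand_picks)
```

Because both functions share the same signature, either one can be passed to the same learning loop, which mirrors the way the random baseline is described above as just another querying function.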
The experiment runs in Docker. Edit run_container.sh so that it uses your own directories, then run the script to start the container. Uncomment the Jupyter section of the script if you want to run a notebook on localhost:8008.
To use the fluorescent cells dataset, download and unpack https://zenodo.org/record/6645803 into /data, and use the run_container.sh script to mount this directory.