Multi-annotator Probabilistic Active Learning

Authors: Marek Herde, Daniel Kottke, Denis Huseljic, and Bernhard Sick

The supplementary material (supplementary-material.pdf) contains further details on the experimental setup (i.e., information about the data sets and annotator simulation techniques) and additional results (i.e., learning curves, a table of area under the learning curve values, and a table of run times).

Project Structure

  • data: contains .csv files of the data sets that are not available on OpenML
  • plots: directory where the visualizations of MaPAL will be saved
  • results: directory where all results are stored, including .csv files, learning curves, and ranking statistics
  • src: Python package consisting of several sub-packages
    • base: implementation of the DataSet and QueryStrategy classes (a schematic sketch of how these components interact follows this list)
    • classifier: implementation of the Similarity-based Classifier (SbC), an advancement of the Parzen Window Classifier (PWC)
    • evaluation_scripts: scripts for the experimental setup
    • notebooks: Jupyter notebooks for the investigation of MaPAL, the simulation of annotators, and the illustration of results
    • query_strategies: implementations of all query/AL strategies
    • utils: helper functions
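The sub-packages fit together in a standard pool-based active learning loop: a query strategy repeatedly selects an instance-annotator pair, the queried label is revealed, and the classifier is retrained. The following sketch is purely illustrative; the method names (strategy.select, clf.fit) and their signatures are hypothetical stand-ins, not the actual interfaces defined in src/base, src/classifier, and src/query_strategies.

import numpy as np

def active_learning_loop(X, y_annotators, strategy, clf, n_queries):
    # X: (n_samples, n_features) instance pool
    # y_annotators: (n_samples, n_annotators) matrix of (noisy) annotator labels
    # strategy, clf: hypothetical stand-ins for a query strategy and the SbC
    n_samples, n_annotators = y_annotators.shape
    is_queried = np.zeros((n_samples, n_annotators), dtype=bool)
    for _ in range(n_queries):
        # hypothetical call: pick the most useful instance-annotator pair
        sample_idx, annotator_idx = strategy.select(X, y_annotators, is_queried)
        is_queried[sample_idx, annotator_idx] = True
        # retrain on the labels acquired so far; unqueried entries are masked
        clf.fit(X, np.where(is_queried, y_annotators, np.nan))
    return clf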

How to execute an experiment?

Due to the large number of experiments, we executed them on a computer cluster, which allowed us to run 100 experiments simultaneously.

Without such a cluster, reproducing all results of the article will probably take several days. Nevertheless, the experiments can be executed on a local computer by following the steps below.

  1. Set up the Python environment:
projectpath$ sudo apt-get install python3-pip
projectpath$ pip3 install virtualenv
projectpath$ virtualenv mapal
projectpath$ source mapal/bin/activate
projectpath$ pip3 install -r requirements.txt
  2. Simulate annotators: Start jupyter-notebook and run the notebook projectpath/src/notebooks/simulate_annotators.ipynb. This step must be completed before executing any experiment.
projectpath$ source mapal/bin/activate
projectpath$ jupyter-notebook
  3. Get information about the available hyperparameters (argparse) of the experiments:
projectpath$ source mapal/bin/activate
projectpath$ export PYTHONPATH="${PYTHONPATH}":$(pwd)/src
projectpath$ python3 src/evaluation_scripts/experimental_setup.py -h
  4. Example experiment: To test MaPAL with M_max=2 and beta_0=0.0001 on the data set iris with annotators having instance-dependent performance values and with
    • a budget of 40% of all available annotations,
    • a test ratio of 40%,
    • and using the seed 1,

we have to execute the following commands:

projectpath$ source mapal/bin/activate
projectpath$ export PYTHONPATH="${PYTHONPATH}":$(pwd)/src
projectpath$ python3 src/evaluation_scripts/experimental_setup.py \
  --query_strategy mapal-1-0.0001-2-1-entropy \
  --data_set iris-simulated-x \
  --results_path results/simulated-x/csvs \
  --test_ratio 0.4 \
  --budget 0.4 \
  --seed 1

For this example, the results are saved in the directory projectpath/results/simulated-x/csvs/ as a .csv file.
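To inspect such a result file programmatically, it can be loaded with pandas. This is only a convenience sketch: the exact file name and column layout are not documented here, so the snippet simply opens the first .csv file it finds in the results directory.

import os
import pandas as pd

results_dir = "results/simulated-x/csvs"
# Pick the first .csv file; the actual name depends on the experiment's parameters.
csv_files = sorted(f for f in os.listdir(results_dir) if f.endswith(".csv"))
df = pd.read_csv(os.path.join(results_dir, csv_files[0]))
print(df.columns.tolist())  # inspect which metrics were recorded
print(df.head())            # first entries of the learning curve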

The names of the possible data sets are given in the following files:

  • projectpath/data/data-set-names-real-world.csv: contains the names of the data sets with real-world annotators (the data set grid is not available because it contains confidential data),
  • projectpath/data/data-set-names-simulated-o.csv: contains the names of the data sets with simulated annotators having uniform performance values,
  • projectpath/data/data-set-names-simulated-y.csv: contains the names of the data sets with simulated annotators having class-dependent performance values,
  • projectpath/data/data-set-names-simulated-x.csv: contains the names of the data sets with simulated annotators having instance-dependent performance values.
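To list the names that can be passed as the --data_set argument, these files can be read directly. The snippet below assumes each file lists one data set name per row without a header; adjust the header argument if the files contain one.

import pandas as pd

for annotator_type in ["real-world", "simulated-o", "simulated-y", "simulated-x"]:
    # Assumption: one data set name per row, no header row.
    names = pd.read_csv(f"data/data-set-names-{annotator_type}.csv", header=None)
    print(annotator_type, names[0].tolist())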

To create the ranking statistics, there must be at least one run for each strategy on a data set. The AL strategies that can be used as the --query_strategy argument are listed below:

  • MaPAL: mapal-1-0.0001-2-1-entropy,
  • IEThresh: ie-thresh,
  • IEAdjCost: ie-adj-cost,
  • CEAL: ceal,
  • ALIO: alio,
  • Proactive: proactive,
  • Random: random.
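Since the ranking statistics require at least one run per strategy, a run of every strategy on the same data set can also be launched from Python. This is only a convenience sketch around the documented command-line interface; it reuses the arguments of the example experiment above and assumes PYTHONPATH has been set as shown there.

import subprocess

strategies = ["mapal-1-0.0001-2-1-entropy", "ie-thresh", "ie-adj-cost",
              "ceal", "alio", "proactive", "random"]
for qs in strategies:
    # Same arguments as in the example experiment above.
    subprocess.run(["python3", "src/evaluation_scripts/experimental_setup.py",
                    "--query_strategy", qs,
                    "--data_set", "iris-simulated-x",
                    "--results_path", "results/simulated-x/csvs",
                    "--test_ratio", "0.4",
                    "--budget", "0.4",
                    "--seed", "1"],
                   check=True)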

To conduct the experiments on data sets with real-world annotators in accordance with the article, execute the following command:

projectpath$ bash src/evaluation_scripts/evaluate_real-world-local.sh 5

The argument 5 is an example and specifies the maximum number of runs executed in parallel. You can change this number.

To conduct the experiments on data sets with simulated annotators having uniform performance values in accordance with the article, execute the following command:

projectpath$ bash src/evaluation_scripts/evaluate_simulated-o-local.sh 5

To conduct the experiments on data sets with simulated annotators having class-dependent performance values in accordance with the article, execute the following command:

projectpath$ bash src/evaluation_scripts/evaluate_simulated-y-local.sh 5

To conduct the experiments on data sets with simulated annotators having instance-dependent performance values in accordance with the article, execute the following command:

projectpath$ bash src/evaluation_scripts/evaluate_simulated-x-local.sh 5

How to illustrate the experimental results?

Start jupyter-notebook and run the notebook projectpath/src/notebooks/experimental_results.ipynb. Remark: the ranking plots can only be created when the same number of experiments has been executed for each data set and each strategy.

projectpath$ source mapal/bin/activate
projectpath$ jupyter-notebook
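Before creating the ranking plots, it can help to verify that every strategy has the same number of result files. The check below is only a sketch: it assumes each result file name contains the name of its query strategy, which may not match the actual naming scheme.

import os

results_dir = "results/simulated-x/csvs"
strategies = ["mapal-1-0.0001-2-1-entropy", "ie-thresh", "ie-adj-cost",
              "ceal", "alio", "proactive", "random"]
csv_files = [f for f in os.listdir(results_dir) if f.endswith(".csv")]
# Assumption: each result file name contains its strategy's name.
counts = {qs: sum(qs in f for f in csv_files) for qs in strategies}
print(counts)  # all counts should be equal before plotting the rankings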

How to reproduce the annotation performance and instance utility plots?

Start jupyter-notebook and run the notebook projectpath/src/notebooks/visualization.ipynb.

projectpath$ source mapal/bin/activate
projectpath$ jupyter-notebook

How to reproduce the study on hyperparameters?

Run the experiments on the toy data set by executing the following command:

projectpath$ bash src/evaluation_scripts/evaluate_toy-data.sh 5

As before, the argument 5 specifies the maximum number of runs executed in parallel and can be changed.

Start jupyter-notebook and run the notebook projectpath/src/notebooks/hyperparameters.ipynb.

projectpath$ source mapal/bin/activate
projectpath$ jupyter-notebook
