Showing 304 changed files with 21,337 additions and 1,450 deletions.
New file (`@@ -0,0 +1,2 @@`):

```
# include data
recursive-include zdata *
```
Updated file (`@@ -1,95 +1,88 @@`):

# DomainLab: modular python package for training domain invariant neural networks

![GH Actions CI](https://github.com/marrlab/DomainLab/actions/workflows/ci.yml/badge.svg?branch=master)
[![codecov](https://codecov.io/gh/marrlab/DomainLab/branch/master/graph/badge.svg)](https://app.codecov.io/gh/marrlab/DomainLab)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/bc22a1f9afb742efb02b87284e04dc86)](https://www.codacy.com/gh/marrlab/DomainLab/dashboard)
[![Documentation](https://img.shields.io/badge/Documentation-Here)](https://marrlab.github.io/DomainLab/)
[![pages-build-deployment](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment)
## Distribution shifts, domain generalization and DomainLab

Neural networks trained on data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new, unseen domains (distributions).
DomainLab is a software platform with state-of-the-art domain generalization algorithms implemented, designed with maximal decoupling of its software components to maximize code reuse.

As input to the software, the user needs to provide
- the neural network to be trained for the task (e.g. classification)
- a task specification which contains dataset(s) from domain(s).

DomainLab decouples the following concepts or objects:
- neural network: a map from the input data to the feature space and output.
- model: structural risk in the form of $\ell() + \mu R()$, where $\ell()$ is the task-specific empirical loss (e.g. cross entropy for a classification task) and $R()$ is the penalty loss for inter-domain alignment (domain invariant regularization).
- trainer: an object that guides the data flow to the model and appends further domain invariant losses.

DomainLab makes it possible to combine models with models, trainers with models, and trainers with trainers in a decorator pattern like `Trainer A(Trainer B(Model C(Model D(network E), network F)))`, which corresponds to $\ell() + \mu_a R_a() + \mu_b R_b() + \mu_c R_c() + \mu_d R_d()$.
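The decorator composition above can be sketched in a few lines of plain Python. This is a hypothetical minimal sketch, not DomainLab's actual API: the class name `LossTerm` and the `batch` dictionary layout are invented for illustration; each wrapper simply adds its own weighted penalty $\mu R()$ on top of the loss of the object it decorates.

```python
class LossTerm:
    """One decorator layer: a penalty R() with weight mu, wrapping an inner term."""
    def __init__(self, name, mu, inner=None):
        self.name, self.mu, self.inner = name, mu, inner

    def loss(self, batch):
        # innermost layer contributes the task loss ell(); each wrapper
        # adds its own weighted penalty mu * R()
        inner_loss = self.inner.loss(batch) if self.inner else batch["ell"]
        return inner_loss + self.mu * batch[self.name]

# Trainer A(Trainer B(Model C(Model D(...)))) corresponds to
# ell() + mu_a*R_a() + mu_b*R_b() + mu_c*R_c() + mu_d*R_d()
composed = LossTerm("R_a", 2.0,
           LossTerm("R_b", 3.0,
           LossTerm("R_c", 4.0,
           LossTerm("R_d", 5.0))))

batch = {"ell": 1.0, "R_a": 0.1, "R_b": 0.1, "R_c": 0.1, "R_d": 0.1}
print(composed.loss(batch))  # 1.0 + 2*0.1 + 3*0.1 + 4*0.1 + 5*0.1 = 2.4
```

Because every layer exposes the same `loss` interface, any combination of wrappers yields a valid objective, which is what makes the decorator pattern a natural fit here.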
## Getting started

### Installation
For the development version on GitHub, see [Installation and Dependencies handling](./docs/doc_intall.md)

We also offer a PyPI version at https://pypi.org/project/domainlab/, which can be installed via `pip install domainlab`; it is recommended to create a virtual environment for it.
### Task specification
In DomainLab, a task is a container for datasets from different domains. See details in
[Task Specification](./docs/doc_tasks.md)
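The idea of a task as a container for per-domain datasets can be sketched as follows. This is a hypothetical illustration, not DomainLab's real task API (see the task specification documentation for that): the `Task` class, its fields, and the `split` method are invented here, with plain lists standing in for datasets.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Hypothetical sketch: a task holds one dataset per domain."""
    dim_y: int                                   # number of classes
    domains: dict = field(default_factory=dict)  # domain name -> dataset

    def add_domain(self, name, dataset):
        self.domains[name] = dataset

    def split(self, test_domain):
        """Leave one domain out: train on the rest, test on `test_domain`."""
        train = {k: v for k, v in self.domains.items() if k != test_domain}
        test = {test_domain: self.domains[test_domain]}
        return train, test

# toy usage: four VLCS-style domains, lists standing in for datasets
task = Task(dim_y=5)
for name in ["caltech", "labelme", "sun", "voc"]:
    task.add_domain(name, [f"{name}_img_{i}" for i in range(3)])
train, test = task.split("caltech")
print(sorted(train))  # ['labelme', 'sun', 'voc']
```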
### Example and usage

#### Either clone this repo and use the command line
See details in [Command line usage](./docs/doc_usage_cmd.md)

#### or program against the DomainLab API
As a user, you need to define the neural network you want to train. As an example, here we define a vision transformer for classification:

```python
from torch import nn
from torchvision.models import vit_b_16
from torchvision.models.feature_extraction import create_feature_extractor


class VIT(nn.Module):
    def __init__(self, num_cls, freeze=True,
                 list_str_last_layer=['getitem_5'],
                 len_last_layer=768):
        super().__init__()
        self.nets = vit_b_16(pretrained=True)
        if freeze:
            # freeze the pretrained backbone; only the linear head is trained
            for param in self.nets.parameters():
                param.requires_grad = False
        self.features_vit_flatten = create_feature_extractor(
            self.nets, return_nodes=list_str_last_layer)
        self.fc = nn.Linear(len_last_layer, num_cls)

    def forward(self, tensor_x):
        """compute logits for prediction"""
        x = self.features_vit_flatten(tensor_x)['getitem_5']
        out = self.fc(x)
        return out
```

Then we plug this neural network into our model:
```python
from domainlab.mk_exp import mk_exp
from domainlab.tasks import get_task
from domainlab.models.model_deep_all import mk_deepall

task = get_task("mini_vlcs")
nn = VIT(num_cls=task.dim_y, freeze=True)
model = mk_deepall()(nn)
# use trainers MLDG and DIAL
exp = mk_exp(task, model, trainer="mldg,dial",  # combine two trainers
             test_domain="caltech", batchsize=2, nocu=True)
exp.execute(num_epochs=2)
```
### Benchmark different methods
DomainLab provides a powerful benchmark functionality.
To benchmark several algorithms, a single-line command along with a benchmark configuration file is sufficient. See details in [Benchmarks](./docs/doc_benchmark.md)
New file (`@@ -0,0 +1,2 @@`):

# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The Journal of Machine Learning Research 17.1 (2016): 2096-2030.
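The core trick in DANN is a gradient reversal layer inserted between the feature extractor and the domain classifier: the forward pass is the identity, while the backward pass multiplies the gradient by -λ, so the features are trained to fool the domain classifier. A minimal PyTorch sketch of that layer (the class and function names here are ours, not DomainLab's implementation):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity on forward; multiplies the gradient by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reverse (and scale) the gradient flowing back into the features
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# sanity check: d/dx of sum(grad_reverse(x)) is -lambda instead of +1
x = torch.ones(3, requires_grad=True)
grad_reverse(x, lamb=0.5).sum().backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```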
New file (`@@ -0,0 +1,27 @@`):

# Installation of DomainLab

## Create a virtual environment for DomainLab (strongly recommended)

`conda create --name domainlab_py39 python=3.9`

then

`conda activate domainlab_py39`

### Install Development version via GitHub

Suppose you have cloned the repository and have changed directory to the cloned repository.

```norun
pip install -r requirements.txt
```

then

`python setup.py install`

#### Dependencies management
- Use [python-poetry](https://python-poetry.org/) with the configuration file `pyproject.toml` in this repository.

### Install Release
It is strongly recommended to create a virtual environment first, then
- Install via `pip install domainlab`
New file (`@@ -0,0 +1,40 @@`):

### Basic usage
DomainLab comes with a minimal toy dataset to test its basic functionality; see [A minimal subsample of the VLCS dataset](./data/vlcs_mini) and [an example configuration file for vlcs_mini](./examples/tasks/task_vlcs.py)

Suppose you have cloned the repository and have the dependencies ready; change directory to this repository.

To train a domain invariant model on the vlcs_mini task:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```
where `--tpath` specifies the path of a user-specified python file which defines the domain generalization task [see here](./examples/tasks/task_vlcs.py), `--te_d` specifies the test domain name (or index starting from 0), and `--config` specifies the configuration of the domain generalization algorithm, [see here](./examples/yaml/demo_config_single_run_diva.yaml)
#### Further usage
Alternatively, in a verbose mode without using the algorithm configuration file:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --debug --bs=2 --aname=diva --gamma_y=7e5 --gamma_d=1e5 --nname=alexnet --nname_dom=conv_bn_pool_2
```

where `--aname` specifies which algorithm to use (see [Available Algorithms](./docs/doc_algos.md)), `--bs` specifies the batch size, and `--debug` restricts the run to 2 epochs and saves results with the prefix 'debug'. For DIVA, the hyperparameters include `--gamma_y=7e5`, the relative weight of the ERM loss compared to the ELBO loss, and `--gamma_d=1e5`, the relative weight of the domain classification loss compared to the ELBO loss.
`--nname` specifies which neural network to use for feature extraction for classification; `--nname_dom` specifies which neural network to use for feature extraction of domains.
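The role of these weights can be seen in a simplified version of such a composite objective. This is a schematic sketch with made-up loss values, not DIVA's actual ELBO computation: it only shows how `gamma_y` and `gamma_d` scale the two classification losses relative to the ELBO term.

```python
def diva_style_loss(elbo, ce_class, ce_domain, gamma_y=7e5, gamma_d=1e5):
    """Schematic composite loss: ELBO plus weighted classification losses.

    gamma_y weighs the class-label (ERM) loss and gamma_d the
    domain-classification loss, both relative to the ELBO term.
    """
    return elbo + gamma_y * ce_class + gamma_d * ce_domain

# with made-up loss values
total = diva_style_loss(elbo=1.2, ce_class=0.9, ce_domain=0.4)
print(total)  # 1.2 + 7e5*0.9 + 1e5*0.4 = 670001.2
```

With weights this large, the classification terms dominate the ELBO, which is why the two gammas are the main tuning knobs.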
For the usage of other arguments, check with

```shell
python main_out.py --help
```

See also [Examples](./docs/doc_examples.md).

### Custom Neural network

This library decouples the concept of algorithm (model) and neural network architecture, so the user can plug in different neural network architectures for the same algorithm. See
[Specify Custom Neural Networks for an algorithm](./docs/doc_custom_nn.md)

### Output structure (results storage) and Performance Measure
[Output structure and Performance Measure](./docs/doc_output.md)

## Software Design Pattern, Extend or Contribution
[Extend or Contribute](./docs/doc_extend_contribute.md)
Updated file (`@@ -1,2 +1,2 @@`):

# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The Journal of Machine Learning Research 17.1 (2016): 2096-2030.