merge master confclit

smilesun committed Dec 13, 2023
2 parents 972195d + 6dc3118 commit e99f498
Showing 304 changed files with 21,337 additions and 1,450 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -37,4 +37,4 @@ jobs:
- name: test if examples in markdown works
run: bash -x -v ci_run_examples.sh
- name: test if benchmark works
run: bash -x -v run_benchmark_standalone.sh examples/benchmark/benchmark_mnist_shared_hyper_grid.yaml
run: pip install snakemake && bash -x -v run_benchmark_standalone.sh examples/benchmark/benchmark_mnist_shared_hyper_grid.yaml
2 changes: 2 additions & 0 deletions MANIFEST.IN
@@ -0,0 +1,2 @@
# include data
recursive-include zdata *
131 changes: 62 additions & 69 deletions README.md
@@ -1,95 +1,88 @@
# DomainLab: train robust neural networks using domain generalization algorithms on your data
# DomainLab: modular Python package for training domain-invariant neural networks

![GH Actions CI ](https://github.com/marrlab/DomainLab/actions/workflows/ci.yml/badge.svg?branch=master)
[![codecov](https://codecov.io/gh/marrlab/DomainLab/branch/master/graph/badge.svg)](https://app.codecov.io/gh/marrlab/DomainLab)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/bc22a1f9afb742efb02b87284e04dc86)](https://www.codacy.com/gh/marrlab/DomainLab/dashboard)
[![Documentation](https://img.shields.io/badge/Documentation-Here)](https://marrlab.github.io/DomainLab/)
[![pages-build-deployment](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment)
## Domain Generalization and DomainLab

Domain Generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new unseen domains.
## Distribution shifts, domain generalization and DomainLab

Neural networks trained on data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new, unseen domains (distributions).

DomainLab is a software platform that implements state-of-the-art domain generalization algorithms. It is designed around maximal decoupling of its software components, which enhances code reuse.

As input to the software, the user needs to provide
- the neural network to be trained for the task (e.g. classification)
- a task specification containing the dataset(s) from the domain(s).

DomainLab decouples the following concepts or objects:
- neural network: a map from the input data to the feature space and output.
- model: structural risk in the form of $\ell() + \mu R()$ where $\ell()$ is the task-specific empirical loss (e.g. cross entropy for a classification task) and $R()$ is the penalty loss for inter-domain alignment (domain-invariant regularization).
- trainer: an object that guides the data flow to the model and appends further domain-invariant losses.

DomainLab makes it possible to combine models with models, trainers with models, and trainers with trainers in a decorator pattern, e.g. `Trainer A(Trainer B(Model C(Model D(network E), network F)))`, which corresponds to $\ell() + \mu_a R_a() + \mu_b R_b() + \mu_c R_c() + \mu_d R_d()$, as sketched below.
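
To make this composition concrete, here is a minimal, purely illustrative sketch (the function name, penalty values and weights below are placeholders, not DomainLab identifiers): each wrapper in the nested chain contributes one weighted penalty on top of the task loss of the object it wraps.

```python
# Illustrative sketch only: the structural risk of
# Trainer A(Trainer B(Model C(Model D(net E), net F)))
# is the task loss plus one weighted penalty per wrapper.
def structural_risk(ell, penalties, weights):
    """ell: task-specific loss; penalties/weights: one entry per wrapper (D, C, B, A)."""
    return ell + sum(mu * r for mu, r in zip(weights, penalties))


# hypothetical numbers for the task loss, the penalties R_d..R_a and the weights mu_d..mu_a
total = structural_risk(ell=0.7,
                        penalties=[0.10, 0.20, 0.05, 0.30],
                        weights=[1.0, 0.5, 2.0, 1.0])
print(total)  # 0.7 + 1.0*0.10 + 0.5*0.20 + 2.0*0.05 + 1.0*0.30
```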

## Getting started

### Installation
#### Create a virtual environment for DomainLab (strongly recommended)
For the development version on GitHub, see [Installation and Dependencies handling](./docs/doc_install.md)

`conda create --name domainlab_py39 python=3.9`
We also offer a PyPI version at https://pypi.org/project/domainlab/ which can be installed via `pip install domainlab`; it is recommended to create a virtual environment for it.

then
### Task specification
In DomainLab, a task is a container for datasets from different domains. See details in
[Task Specification](./docs/doc_tasks.md)
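
As a minimal sketch (it reuses `get_task` and `task.dim_y` from the API example further down this page), a built-in toy task can be fetched and inspected before a custom network is plugged in:

```python
# Fetch the built-in toy task "mini_vlcs" and inspect it; task.dim_y
# (the number of classes) is later used to size the classifier head.
from domainlab.tasks import get_task

task = get_task("mini_vlcs")
print(task.dim_y)
```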

`conda activate domainlab_py39`
### Example and usage

#### Install Development version (recommended)
#### Either clone this repo and use the command line
See details in [Command line usage](./docs/doc_usage_cmd.md)

Suppose you have cloned the repository and have changed directory to the cloned repository.
#### Or program against the DomainLab API

```norun
pip install -r requirements.txt
```

As a user, you need to define the neural network you want to train. As an example, here we define a transformer neural network for classification in the following code.
then

`python setup.py install`

#### Windows installation details

To install DomainLab on Windows, please remove the `snakemake` and `datrie` dependencies from the `requirements.txt` file.
Benchmarking is currently not supported on Windows due to the dependency on Snakemake.
One could, however, try to install a minimal Snakemake via
`mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal`

#### Dependencies management
- Use [python-poetry](https://python-poetry.org/) with the configuration file `pyproject.toml` in this repository.

#### Release
- Install via `pip install domainlab`

### Basic usage
DomainLab comes with some minimal toy datasets to test its basic functionality; see [a minimal subsample of the VLCS dataset](./data/vlcs_mini) and [an example task definition for vlcs_mini](./examples/tasks/task_vlcs.py)

Suppose you have cloned the repository and have the dependencies ready; change directory into the repository:

To train a domain invariant model on the vlcs_mini task

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```

```python
from torch import nn
from torchvision.models import vit_b_16
from torchvision.models.feature_extraction import create_feature_extractor


class VIT(nn.Module):
    def __init__(self, num_cls, freeze=True,
                 list_str_last_layer=['getitem_5'],
                 len_last_layer=768):
        super().__init__()
        self.nets = vit_b_16(pretrained=True)
        if freeze:
            for param in self.nets.parameters():
                param.requires_grad = False
        self.features_vit_flatten = create_feature_extractor(
            self.nets, return_nodes=list_str_last_layer)
        self.fc = nn.Linear(len_last_layer, num_cls)

    def forward(self, tensor_x):
        """
        compute logits for prediction
        """
        x = self.features_vit_flatten(tensor_x)['getitem_5']
        out = self.fc(x)
        return out
```
where `--tpath` specifies the path to a user-specified Python file defining the domain generalization task ([see here](./examples/tasks/task_vlcs.py)), `--te_d` specifies the test domain name (or index starting from 0), and `--config` specifies the configuration of the domain generalization algorithm ([see here](./examples/yaml/demo_config_single_run_diva.yaml)).

#### Further usage
Alternatively, in a verbose mode without using the algorithm configuration file:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --debug --bs=2 --aname=diva --gamma_y=7e5 --gamma_d=1e5 --nname=alexnet --nname_dom=conv_bn_pool_2
```

Then we plug this neural network into our model:

where `--aname` specifies which algorithm to use (see [Available Algorithms](./docs/doc_algos.md)), `--bs` specifies the batch size, and `--debug` restricts the run to 2 epochs and saves results with the prefix 'debug'. For DIVA, the hyper-parameters include `--gamma_y=7e5`, the relative weight of the ERM loss compared to the ELBO loss, and `--gamma_d=1e5`, the relative weight of the domain classification loss compared to the ELBO loss.
`--nname` specifies which neural network to use for feature extraction for classification, and `--nname_dom` specifies which neural network to use for feature extraction of domains; the sketch below shows schematically how the `--gamma_*` weights enter the DIVA objective.
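
Schematically (a simplified view based on the description above, not the exact implementation), these weights enter the DIVA training objective as

$$\mathcal{L}_{\text{DIVA}} \approx \mathcal{L}_{\text{ELBO}} + \gamma_y \, \ell_{\text{class}}(y) + \gamma_d \, \ell_{\text{domain}}(d),$$

so a larger `--gamma_y` emphasizes class-label prediction and a larger `--gamma_d` emphasizes domain prediction, both relative to the ELBO term.
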
For usage of other arguments, check with

```shell
python main_out.py --help
```

```python
from domainlab.mk_exp import mk_exp
from domainlab.tasks import get_task
from domainlab.models.model_deep_all import mk_deepall

task = get_task("mini_vlcs")
nn = VIT(num_cls=task.dim_y, freeze=True)
model = mk_deepall()(nn)
# use trainer MLDG, DIAL
exp = mk_exp(task, model, trainer="mldg,dial",  # combine two trainers
             test_domain="caltech", batchsize=2, nocu=True)
exp.execute(num_epochs=2)
```

See also [Examples](./docs/doc_examples.md).

### Output structure (results storage) and Performance Measure
[Output structure and Performance Measure](./docs/doc_output.md)

## Custom Usage

To benchmark several algorithms on your dataset, a single-line command along with a benchmark configuration file is sufficient. See [Benchmarks](./docs/doc_benchmark.md)

### Define your task
Do you have your own data that comes from different domains? Create a task for your data and benchmark different domain generalization algorithms according to the following example. See
[Task Specification](./docs/doc_tasks.md)

### Custom Neural network
This library decouples the concept of algorithm (model) from the neural network architecture, so the user can plug in different neural network architectures for the same algorithm. See
[Specify Custom Neural Networks for an algorithm](./docs/doc_custom_nn.md)

## Software Design Pattern, Extend or Contribute
[Extend or Contribute](./docs/doc_extend_contribute.md)
### Benchmark different methods
DomainLab provides a powerful benchmark functionality.
To benchmark several algorithms, a single-line command along with a benchmark configuration file is sufficient, as shown below. See details in [Benchmarks](./docs/doc_benchmark.md)
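
For example, the standalone benchmark script used in this repository's CI can be launched with a single command plus a benchmark configuration yaml (Snakemake needs to be installed first, e.g. `pip install snakemake`):

```shell
bash run_benchmark_standalone.sh examples/benchmark/benchmark_mnist_shared_hyper_grid.yaml
```
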
3 changes: 2 additions & 1 deletion docs/build/html/docHDUVA.md
@@ -1,4 +1,5 @@
# HDUVA: HIERARCHICAL VARIATIONAL AUTO-ENCODING FOR UNSUPERVISED DOMAIN GENERALIZATION
# Model HDUVA
## HDUVA: HIERARCHICAL VARIATIONAL AUTO-ENCODING FOR UNSUPERVISED DOMAIN GENERALIZATION

HDUVA builds on a generative approach within the framework of variational autoencoders to facilitate generalization to new domains without supervision. HDUVA learns representations that disentangle domain-specific information from class-label specific information even in complex settings where domain structure is not observed during training.

18 changes: 18 additions & 0 deletions docs/build/html/doc_benchmark.md
@@ -11,6 +11,24 @@ Within each benchmark, two aspects are considered:
2. Sensitivity to selected hyperparameters: by sampling hyperparameters randomly,
the performance with respect to different hyperparameter choices is investigated.

## Dependencies installation

DomainLab relies on `Snakemake` for its benchmark functionality.

### Unix installation

```
pip install snakemake
```

### Windows installation details

Benchmarking is currently not tested on Windows due to the dependency on `Snakemake` and `datrie`.
One could, however, try to install a minimal Snakemake via
`mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal`
to see if the benchmark functionality still works.


## Setting up a benchmark
The benchmark is configured in a yaml file. We refer to [doc_benchmark_yaml.md](https://github.com/marrlab/DomainLab/blob/master/docs/doc_benchmark_yaml.md) for a documented
example.
2 changes: 2 additions & 0 deletions docs/build/html/doc_dann.md
@@ -0,0 +1,2 @@
# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The journal of machine learning research 17.1 (2016): 2096-2030.
27 changes: 27 additions & 0 deletions docs/build/html/doc_install.md
@@ -0,0 +1,27 @@
# Installation of DomainLab

## Create a virtual environment for DomainLab (strongly recommended)

`conda create --name domainlab_py39 python=3.9`

then

`conda activate domainlab_py39`

### Install Development version via github

Suppose you have cloned the repository and have changed directory to the cloned repository.

```norun
pip install -r requirements.txt
```
then

`python setup.py install`

#### Dependencies management
- Use [python-poetry](https://python-poetry.org/) with the configuration file `pyproject.toml` in this repository.

### Install Release
It is strongly recommended to create a virtual environment first, then
- Install via `pip install domainlab`
40 changes: 40 additions & 0 deletions docs/build/html/doc_usage_cmd.md
@@ -0,0 +1,40 @@
### Basic usage
DomainLab comes with some minimal toy datasets to test its basic functionality; see [a minimal subsample of the VLCS dataset](./data/vlcs_mini) and [an example task definition for vlcs_mini](./examples/tasks/task_vlcs.py)

Suppose you have cloned the repository and have the dependencies ready; change directory into the repository:

To train a domain invariant model on the vlcs_mini task

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```
where `--tpath` specifies the path to a user-specified Python file defining the domain generalization task ([see here](./examples/tasks/task_vlcs.py)), `--te_d` specifies the test domain name (or index starting from 0), and `--config` specifies the configuration of the domain generalization algorithm ([see here](./examples/yaml/demo_config_single_run_diva.yaml)).

#### Further usage
Alternatively, in a verbose mode without using the algorithm configuration file:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --debug --bs=2 --aname=diva --gamma_y=7e5 --gamma_d=1e5 --nname=alexnet --nname_dom=conv_bn_pool_2
```

where `--aname` specifies which algorithm to use (see [Available Algorithms](./docs/doc_algos.md)), `--bs` specifies the batch size, and `--debug` restricts the run to 2 epochs and saves results with the prefix 'debug'. For DIVA, the hyper-parameters include `--gamma_y=7e5`, the relative weight of the ERM loss compared to the ELBO loss, and `--gamma_d=1e5`, the relative weight of the domain classification loss compared to the ELBO loss.
`--nname` specifies which neural network to use for feature extraction for classification, and `--nname_dom` specifies which neural network to use for feature extraction of domains.
For usage of other arguments, check with

```shell
python main_out.py --help
```

See also [Examples](./docs/doc_examples.md).

### Custom Neural network

The user can plug in different neural network architectures for the same algorithm. See
[Specify Custom Neural Networks for an algorithm](./docs/doc_custom_nn.md)

### Output structure (results storage) and Performance Measure
[Output structure and Performance Measure](./docs/doc_output.md)


## Software Design Pattern, Extend or Contribute
[Extend or Contribute](./docs/doc_extend_contribute.md)
3 changes: 2 additions & 1 deletion docs/docHDUVA.md
@@ -1,4 +1,5 @@
# HDUVA: HIERARCHICAL VARIATIONAL AUTO-ENCODING FOR UNSUPERVISED DOMAIN GENERALIZATION
# Model HDUVA
## HDUVA: HIERARCHICAL VARIATIONAL AUTO-ENCODING FOR UNSUPERVISED DOMAIN GENERALIZATION

HDUVA builds on a generative approach within the framework of variational autoencoders to facilitate generalization to new domains without supervision. HDUVA learns representations that disentangle domain-specific information from class-label specific information even in complex settings where domain structure is not observed during training.

18 changes: 18 additions & 0 deletions docs/doc_benchmark.md
@@ -11,6 +11,24 @@ Within each benchmark, two aspects are considered:
2. Sensitivity to selected hyperparameters: by sampling hyperparameters randomly,
the performance with respect to different hyperparameter choices is investigated.

## Dependencies installation

DomainLab relies on `Snakemake` for its benchmark functionality.

### Unix installation

```
pip install snakemake
```

### Windows installation details

Benchmarking is currently not tested on Windows due to the dependency on `Snakemake` and `datrie`.
One could, however, try to install a minimal Snakemake via
`mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal`
to see if the benchmark functionality still works.


## Setting up a benchmark
The benchmark is configured in a yaml file. We refer to [doc_benchmark_yaml.md](https://github.com/marrlab/DomainLab/blob/master/docs/doc_benchmark_yaml.md) for a documented
example.
2 changes: 1 addition & 1 deletion docs/doc_dann.md
@@ -1,2 +1,2 @@
# Model "dann":
# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The journal of machine learning research 17.1 (2016): 2096-2030.
27 changes: 27 additions & 0 deletions docs/doc_install.md
@@ -0,0 +1,27 @@
# Installation of DomainLab

## Create a virtual environment for DomainLab (strongly recommended)

`conda create --name domainlab_py39 python=3.9`

then

`conda activate domainlab_py39`

### Install Development version via github

Suppose you have cloned the repository and have changed directory to the cloned repository.

```norun
pip install -r requirements.txt
```
then

`python setup.py install`

#### Dependencies management
- Use [python-poetry](https://python-poetry.org/) with the configuration file `pyproject.toml` in this repository.

### Install Release
It is strongly recommended to create a virtual environment first, then
- Install via `pip install domainlab`
40 changes: 40 additions & 0 deletions docs/doc_usage_cmd.md
@@ -0,0 +1,40 @@
### Basic usage
DomainLab comes with some minimal toy datasets to test its basic functionality; see [a minimal subsample of the VLCS dataset](./data/vlcs_mini) and [an example task definition for vlcs_mini](./examples/tasks/task_vlcs.py)

Suppose you have cloned the repository and have the dependencies ready; change directory into the repository:

To train a domain invariant model on the vlcs_mini task

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```
where `--tpath` specifies the path to a user-specified Python file defining the domain generalization task ([see here](./examples/tasks/task_vlcs.py)), `--te_d` specifies the test domain name (or index starting from 0), and `--config` specifies the configuration of the domain generalization algorithm ([see here](./examples/yaml/demo_config_single_run_diva.yaml)).

#### Further usage
Alternatively, in a verbose mode without using the algorithm configuration file:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --debug --bs=2 --aname=diva --gamma_y=7e5 --gamma_d=1e5 --nname=alexnet --nname_dom=conv_bn_pool_2
```

where `--aname` specifies which algorithm to use (see [Available Algorithms](./docs/doc_algos.md)), `--bs` specifies the batch size, and `--debug` restricts the run to 2 epochs and saves results with the prefix 'debug'. For DIVA, the hyper-parameters include `--gamma_y=7e5`, the relative weight of the ERM loss compared to the ELBO loss, and `--gamma_d=1e5`, the relative weight of the domain classification loss compared to the ELBO loss.
`--nname` specifies which neural network to use for feature extraction for classification, and `--nname_dom` specifies which neural network to use for feature extraction of domains; the sketch below shows schematically how the `--gamma_*` weights enter the DIVA objective.
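
Schematically (a simplified view based on the description above, not the exact implementation), these weights enter the DIVA training objective as

$$\mathcal{L}_{\text{DIVA}} \approx \mathcal{L}_{\text{ELBO}} + \gamma_y \, \ell_{\text{class}}(y) + \gamma_d \, \ell_{\text{domain}}(d),$$

so a larger `--gamma_y` emphasizes class-label prediction and a larger `--gamma_d` emphasizes domain prediction, both relative to the ELBO term.
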
For usage of other arguments, check with

```shell
python main_out.py --help
```

See also [Examples](./docs/doc_examples.md).

### Custom Neural network

The user can plug in different neural network architectures for the same algorithm. See
[Specify Custom Neural Networks for an algorithm](./docs/doc_custom_nn.md)

### Output structure (results storage) and Performance Measure
[Output structure and Performance Measure](./docs/doc_output.md)


## Software Design Pattern, Extend or Contribute
[Extend or Contribute](./docs/doc_extend_contribute.md)