Showing 304 changed files with 21,337 additions and 1,450 deletions.
New file (`@@ -0,0 +1,2 @@`):

```
# include data
recursive-include zdata *
```
Updated file (`@@ -1,95 +1,88 @@`):

# DomainLab: modular python package for training domain invariant neural networks

![GH Actions CI](https://github.com/marrlab/DomainLab/actions/workflows/ci.yml/badge.svg?branch=master)
[![codecov](https://codecov.io/gh/marrlab/DomainLab/branch/master/graph/badge.svg)](https://app.codecov.io/gh/marrlab/DomainLab)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/bc22a1f9afb742efb02b87284e04dc86)](https://www.codacy.com/gh/marrlab/DomainLab/dashboard)
[![Documentation](https://img.shields.io/badge/Documentation-Here)](https://marrlab.github.io/DomainLab/)
[![pages-build-deployment](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/marrlab/DomainLab/actions/workflows/pages/pages-build-deployment)
## Distribution shifts, domain generalization and DomainLab

Neural networks trained on data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new, unseen domains (distributions).
DomainLab is a software platform with state-of-the-art domain generalization algorithms implemented, designed with maximal decoupling of its software components to maximize code reuse.

As input to the software, the user needs to provide
- the neural network to be trained for the task (e.g. classification)
- a task specification which contains dataset(s) from domain(s).

DomainLab decouples the following concepts or objects:
- neural network: a map from the input data to the feature space and output.
- model: structural risk in the form of $\ell() + \mu R()$, where $\ell()$ is the task-specific empirical loss (e.g. cross entropy for a classification task) and $R()$ is the penalty loss for inter-domain alignment (domain invariant regularization).
- trainer: an object that guides the data flow to the model and appends further domain invariant losses.

DomainLab makes it possible to combine models with models, trainers with models, and trainers with trainers in a decorator pattern like `Trainer A(Trainer B(Model C(Model D(network E), network F)))`, which corresponds to $\ell() + \mu_a R_a() + \mu_b R_b() + \mu_c R_c() + \mu_d R_d()$.
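The decorator composition above can be sketched in a few lines of plain Python. This is a hypothetical minimal sketch, not DomainLab's actual API: the class name `LossTerm` and the `batch` dictionary layout are invented for illustration; each wrapper simply adds its own weighted penalty $\mu R()$ on top of the loss of the object it decorates.

```python
class LossTerm:
    """One decorator layer: a penalty R() with weight mu, wrapping an inner term."""
    def __init__(self, name, mu, inner=None):
        self.name, self.mu, self.inner = name, mu, inner

    def loss(self, batch):
        # innermost layer contributes the task loss ell(); each wrapper
        # adds its own weighted penalty mu * R()
        inner_loss = self.inner.loss(batch) if self.inner else batch["ell"]
        return inner_loss + self.mu * batch[self.name]

# Trainer A(Trainer B(Model C(Model D(...)))) corresponds to
# ell() + mu_a*R_a() + mu_b*R_b() + mu_c*R_c() + mu_d*R_d()
composed = LossTerm("R_a", 2.0,
           LossTerm("R_b", 3.0,
           LossTerm("R_c", 4.0,
           LossTerm("R_d", 5.0))))

batch = {"ell": 1.0, "R_a": 0.1, "R_b": 0.1, "R_c": 0.1, "R_d": 0.1}
print(composed.loss(batch))  # 1.0 + 2*0.1 + 3*0.1 + 4*0.1 + 5*0.1 = 2.4
```

Because every layer exposes the same `loss` interface, any combination of wrappers yields a valid objective, which is what makes the decorator pattern a natural fit here.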
## Getting started

### Installation
For the development version on GitHub, see [Installation and Dependencies handling](./docs/doc_intall.md)

We also offer a PyPI version at https://pypi.org/project/domainlab/, which can be installed via `pip install domainlab`; it is recommended to create a virtual environment for it.
### Task specification
In DomainLab, a task is a container for datasets from different domains. See details in
[Task Specification](./docs/doc_tasks.md)
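The idea of a task as a container for per-domain datasets can be sketched as follows. This is a hypothetical illustration, not DomainLab's real task API (see the task specification documentation for that): the `Task` class, its fields, and the `split` method are invented here, with plain lists standing in for datasets.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Hypothetical sketch: a task holds one dataset per domain."""
    dim_y: int                                   # number of classes
    domains: dict = field(default_factory=dict)  # domain name -> dataset

    def add_domain(self, name, dataset):
        self.domains[name] = dataset

    def split(self, test_domain):
        """Leave one domain out: train on the rest, test on `test_domain`."""
        train = {k: v for k, v in self.domains.items() if k != test_domain}
        test = {test_domain: self.domains[test_domain]}
        return train, test

# toy usage: four VLCS-style domains, lists standing in for datasets
task = Task(dim_y=5)
for name in ["caltech", "labelme", "sun", "voc"]:
    task.add_domain(name, [f"{name}_img_{i}" for i in range(3)])
train, test = task.split("caltech")
print(sorted(train))  # ['labelme', 'sun', 'voc']
```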
### Example and usage

#### Either clone this repo and use the command line
See details in [Command line usage](./docs/doc_usage_cmd.md)

#### or program against the DomainLab API
As a user, you need to define the neural network you want to train. As an example, here we define a vision transformer for classification:

```python
from torch import nn
from torchvision.models import vit_b_16
from torchvision.models.feature_extraction import create_feature_extractor


class VIT(nn.Module):
    def __init__(self, num_cls, freeze=True,
                 list_str_last_layer=['getitem_5'],
                 len_last_layer=768):
        super().__init__()
        self.nets = vit_b_16(pretrained=True)
        if freeze:
            # freeze the pretrained backbone; only the linear head is trained
            for param in self.nets.parameters():
                param.requires_grad = False
        self.features_vit_flatten = create_feature_extractor(
            self.nets, return_nodes=list_str_last_layer)
        self.fc = nn.Linear(len_last_layer, num_cls)

    def forward(self, tensor_x):
        """compute logits for prediction"""
        x = self.features_vit_flatten(tensor_x)['getitem_5']
        out = self.fc(x)
        return out
```

Then we plug this neural network into our model:
```python
from domainlab.mk_exp import mk_exp
from domainlab.tasks import get_task
from domainlab.models.model_deep_all import mk_deepall

task = get_task("mini_vlcs")
nn = VIT(num_cls=task.dim_y, freeze=True)
model = mk_deepall()(nn)
# use trainers MLDG and DIAL
exp = mk_exp(task, model, trainer="mldg,dial",  # combine two trainers
             test_domain="caltech", batchsize=2, nocu=True)
exp.execute(num_epochs=2)
```
### Benchmark different methods
DomainLab provides a powerful benchmark functionality.
To benchmark several algorithms, a single-line command along with a benchmark configuration file is sufficient. See details in [Benchmarks](./docs/doc_benchmark.md)
New file (`@@ -0,0 +1,2 @@`):

# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The Journal of Machine Learning Research 17.1 (2016): 2096-2030.
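The core trick in DANN is a gradient reversal layer inserted between the feature extractor and the domain classifier: the forward pass is the identity, while the backward pass multiplies the gradient by -λ, so the features are trained to fool the domain classifier. A minimal PyTorch sketch of that layer (the class and function names here are ours, not DomainLab's implementation):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity on forward; multiplies the gradient by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reverse (and scale) the gradient flowing back into the features
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# sanity check: d/dx of sum(grad_reverse(x)) is -lambda instead of +1
x = torch.ones(3, requires_grad=True)
grad_reverse(x, lamb=0.5).sum().backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```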
New file (`@@ -0,0 +1,27 @@`):

# Installation of DomainLab

## Create a virtual environment for DomainLab (strongly recommended)

`conda create --name domainlab_py39 python=3.9`

then

`conda activate domainlab_py39`

### Install Development version via GitHub

Suppose you have cloned the repository and have changed directory to the cloned repository.

```norun
pip install -r requirements.txt
```

then

`python setup.py install`

#### Dependencies management
- Use [python-poetry](https://python-poetry.org/) with the configuration file `pyproject.toml` in this repository.

### Install Release
It is strongly recommended to create a virtual environment first, then
- Install via `pip install domainlab`
New file (`@@ -0,0 +1,40 @@`):

### Basic usage
DomainLab comes with a minimal toy dataset to test its basic functionality; see [A minimal subsample of the VLCS dataset](./data/vlcs_mini) and [an example configuration file for vlcs_mini](./examples/tasks/task_vlcs.py)

Suppose you have cloned the repository and have the dependencies ready; change directory to this repository.

To train a domain invariant model on the vlcs_mini task:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```
where `--tpath` specifies the path of a user-specified python file which defines the domain generalization task [see here](./examples/tasks/task_vlcs.py), `--te_d` specifies the test domain name (or index starting from 0), and `--config` specifies the configuration of the domain generalization algorithm, [see here](./examples/yaml/demo_config_single_run_diva.yaml)
#### Further usage
Alternatively, in a verbose mode without using the algorithm configuration file:

```shell
python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --debug --bs=2 --aname=diva --gamma_y=7e5 --gamma_d=1e5 --nname=alexnet --nname_dom=conv_bn_pool_2
```

where `--aname` specifies which algorithm to use (see [Available Algorithms](./docs/doc_algos.md)), `--bs` specifies the batch size, and `--debug` restricts the run to 2 epochs and saves results with the prefix 'debug'. For DIVA, the hyperparameters include `--gamma_y=7e5`, the relative weight of the ERM loss compared to the ELBO loss, and `--gamma_d=1e5`, the relative weight of the domain classification loss compared to the ELBO loss.
`--nname` specifies which neural network to use for feature extraction for classification; `--nname_dom` specifies which neural network to use for feature extraction of domains.
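The role of these weights can be seen in a simplified version of such a composite objective. This is a schematic sketch with made-up loss values, not DIVA's actual ELBO computation: it only shows how `gamma_y` and `gamma_d` scale the two classification losses relative to the ELBO term.

```python
def diva_style_loss(elbo, ce_class, ce_domain, gamma_y=7e5, gamma_d=1e5):
    """Schematic composite loss: ELBO plus weighted classification losses.

    gamma_y weighs the class-label (ERM) loss and gamma_d the
    domain-classification loss, both relative to the ELBO term.
    """
    return elbo + gamma_y * ce_class + gamma_d * ce_domain

# with made-up loss values
total = diva_style_loss(elbo=1.2, ce_class=0.9, ce_domain=0.4)
print(total)  # 1.2 + 7e5*0.9 + 1e5*0.4 = 670001.2
```

With weights this large, the classification terms dominate the ELBO, which is why the two gammas are the main tuning knobs.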
For the usage of other arguments, check with

```shell
python main_out.py --help
```

See also [Examples](./docs/doc_examples.md).

### Custom Neural network

This library decouples the concept of algorithm (model) and neural network architecture, so the user can plug in different neural network architectures for the same algorithm. See
[Specify Custom Neural Networks for an algorithm](./docs/doc_custom_nn.md)

### Output structure (results storage) and Performance Measure
[Output structure and Performance Measure](./docs/doc_output.md)

## Software Design Pattern, Extend or Contribution
[Extend or Contribute](./docs/doc_extend_contribute.md)
Updated file (`@@ -1,2 +1,2 @@`):

# Model DANN:
Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The Journal of Machine Learning Research 17.1 (2016): 2096-2030.