
Self-adaptive translation mode for Marian (runtime domain adaptation). #887

Open · wants to merge 143 commits into master
Conversation


@rihardsk commented Oct 27, 2021

Description

This PR implements self-adaptive translation, a.k.a. runtime domain adaptation, in Marian. It enables training the model on a set of context sentence pairs (source and target) prior to translation to adapt it to a new domain during runtime. The model is reset before the next translation.

This is useful because it enables one to have a single generic NMT model that is fine-tuned on the fly to better suit any number of domains for which context sentences can be provided. Typically these context sentences would be fetched from a translation memory (out of scope of this PR) based on their similarity to the sentence being translated. Moreover, the translation quality can be improved over time without retraining and redeploying the model, simply by adding sentences to the translation memory.

The PR is based on earlier work by @snukky in c63aa8f, but the mechanism for transferring model parameters from the training graph to the translation graph has been revised so that it's based on the swappable infrastructure from kpu#2.

Self-adaptive translation can be run either in server mode, where source sentences and context sentence pairs are supplied via JSON, or in CLI mode, where they're supplied in separate files.
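To make the two modes concrete, here is a rough usage sketch. The flag names, server port, and JSON field names below are illustrative placeholders rather than the finalized interface of marian-adaptive:

    # CLI mode (illustrative): translate input.src, adapting to context sentence
    # pairs supplied in separate files (one context set per input sentence).
    ./marian-adaptive -c config.yml < input.src > output.trg

    # Server mode (illustrative): each request carries the text to translate plus
    # the context sentence pairs used for on-the-fly adaptation.
    curl -s -X POST http://localhost:8080/ -d '{
      "input": "Sentence to translate.",
      "context": [
        {"source": "A similar source sentence from the translation memory.",
         "target": "Its reference translation."}
      ]
    }'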

List of changes:

  • a new marian-adaptive executable
  • a new CMake flag to enable building the new executable – -DCOMPILE_ADAPTIVE=ON
  • changes to config_parser.cpp to allow marian-adaptive to accept options for both translation and training
  • extracting the code that translates model parameter names from the Nematus or Amun naming convention, so that it can be re-used for model loading in self-adaptive Marian
  • a fix in corpus_base.cpp to enable stdin handling
  • fixes for some other small issues encountered along the way

Added dependencies: none

How to test

Run the regression tests located in the tests/_self-adaptive directory from this PR: marian-nmt/marian-regression-tests#81
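Roughly, the runner in that repository can be pointed at the new directory directly (the script name and the MARIAN variable below are assumptions; check that repository's README for the exact invocation):

    # inside a checkout of marian-regression-tests with this PR's tests applied
    MARIAN=/path/to/marian-dev/build ./run_mrt.sh tests/_self-adaptive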

I've tested things on Ubuntu 18.04. To enable building marian-adaptive, you must use cmake .. -DCOMPILE_ADAPTIVE=ON.
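For completeness, a typical out-of-source build with the flag enabled (other Marian CMake options, e.g. CUDA or CPU settings, left at their defaults) would look something like this:

    # from a checkout of this PR's branch
    mkdir -p build && cd build
    cmake .. -DCOMPILE_ADAPTIVE=ON
    make -j$(nproc)
    # this should produce the marian-adaptive executable next to the usual binaries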

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

XapaJIaMnu and others added 30 commits March 24, 2021 09:13
SwappableSlot: add GPU-to-GPU reset feature
This doesn't work yet, though, because a lot of options are missing: we initialize them manually instead of using the config parser.

rihardsk commented Dec 6, 2021

@snukky I think I've implemented all the requested changes. I've also added a test using a transformer model to marian-nmt/marian-regression-tests#81

CI is failing, but that doesn't seem to be related to my changes.


snukky commented Dec 6, 2021

@rihardsk Thanks, I will take a look again. The GitHub check called "Documentation" is optional and will not make the CI fail; the rest should pass (excluding the already disabled "Ubuntu 16.04"). Did you run all regression tests from marian-regression-tests locally?

@rihardsk

@snukky I had run the regression tests previously, but I reran them again. This time there were some 40 failing tests. I checked some of the logs and it seems that the outputs are off by a small fraction, but I didn't check them all. Here's the summary:

 Skipped:
  - tests/examples/mnist/test_mnist_ffnn.sh
  - tests/examples/unit-tests/test_unit_tests.sh
  - tests/examples/iris/test_iris.sh
  - tests/training/restoring/optimizer/test_adam_params_async.sh
  - tests/training/restoring/optimizer/test_adam_params_sync.sh
  - tests/training/restoring/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/restoring/multi-gpu/test_adam_sync.sh
  - tests/training/restoring/multi-gpu/test_async.sh
  - tests/training/restoring/multi-gpu/test_sync.sh
  - tests/training/features/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/multi-gpu/test_async_sgd_runs.sh
  - tests/training/multi-gpu/test_sync_sgd.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed16.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed8.sh
Failed:
  - tests/decoder/wmt16/test_nbest.sh
  - tests/decoder/wmt17/test_nbest.sh
  - tests/decoder/shortlist/test_shortlist_rnn_gpu.sh
  - tests/decoder/shortlist/test_shortlist_server.sh
  - tests/decoder/word-scores/test_word_scores_batch.sh
  - tests/decoder/align/test_align_nbest.sh
  - tests/decoder/align/test_soft_align.sh
  - tests/decoder/align/test_soft_align_nbest.sh
  - tests/decoder/intgemm/test_intgemm_16bit.sh
  - tests/decoder/intgemm/test_intgemm_16bit_avx2.sh
  - tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
  - tests/decoder/intgemm/test_intgemm_8bit.sh
  - tests/decoder/intgemm/test_intgemm_8bit_avx2.sh
  - tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
  - tests/scorer/nbest/test_custom_feature_name.sh
  - tests/scorer/nbest/test_score_nbest_list.sh
  - tests/scorer/align/test_scorer_align_nbest.sh
  - tests/scorer/lm/test_lm_scores.sh
  - tests/scorer/scores/test_compare_with_decoder_scores.sh
  - tests/scorer/scores/test_scores.sh
  - tests/scorer/scores/test_scores_normalized.sh
  - tests/scorer/scores/test_word_scores.sh
  - tests/scorer/scores/test_word_scores_mini_batch_1.sh
  - tests/scorer/scores/test_word_scores_nbest.sh
  - tests/scorer/scores/test_word_scores_normalized.sh
  - tests/factors/test_factors_decoder.sh
  - tests/factors/test_factors_decoder_concat.sh
  - tests/training/validation/test_compare_decoding_with_transscript_output.sh
  - tests/training/restoring/validation/test_adding_validator_for_finetuning.sh
  - tests/training/restoring/validation/test_restoring_newbest_validators.sh
  - tests/training/restoring/validation/test_valid_reset_stalled.sh
  - tests/training/restoring/optimizer/test_adam_params.sh
  - tests/training/features/quantized-model/test_quant_centers.sh
  - tests/training/features/quantized-model/test_quantmodel.sh
  - tests/training/features/quantized-model/test_quantmodel_log.sh
  - tests/training/features/quantized-model/test_quantmodel_with_bias.sh
  - tests/models/ape/test_nbest.sh
  - tests/models/transformer/test_nbest.sh
  - tests/models/transformer/test_soft_aligns.sh
  - tests/interface/input/test_score_with_blank_lines.sh
  - tests/interface/input-tsv/test_tsv_score.sh
  - tests/interface/input-tsv/test_tsv_score_assume_stdin.sh
  - tests/interface/input-tsv/test_tsv_score_assume_tsv.sh
  - tests/interface/input-tsv/test_tsv_score_dual_source.sh
  - tests/interface/input-tsv/test_tsv_score_lm.sh
  - tests/interface/input-tsv/test_tsv_score_lm_stdin.sh
  - tests/interface/input-tsv/test_tsv_score_stdin.sh
  - tests/interface/input-tsv/test_tsv_train_with_align.sh
  - tests/interface/input-tsv/test_tsv_train_with_align_pos0.sh
Timed out:
  - tests/decoder/wmt16/test_ende_cpu.sh
  - tests/training/restoring/multi-gpu/test_adam_sync_cpu.sh
  - tests/interface/input-tsv/test_tsv_train_mini_batch_fit_stdin.sh
  - tests/interface/input-tsv/test_tsv_train_stdin_2_epochs.sh
  - tests/interface/input-tsv/test_tsv_train_with_align_stdin.sh

I think I've resolved all of your other suggestions.


snukky commented Mar 22, 2022

During the last Marian meeting, we decided that @emjotde will comment on whether src/translator/swappable.h can be done in an easier way and review the changes in src/graph/*. One of our suggestions was that the new feature could be separated from the rest of the code by moving it into its own subdirectory instead of being scattered across the existing directories.

@snukky requested a review from @emjotde on March 22, 2022
@stanBienaives

@rihardsk Thank you for this work. It is really helpful for what we are trying to achieve.

I have a question though: how can I control the intensity of the adaptation? I need to control globally how much the model "adapts" to the given context.
Basically:

High intensity => the model adapts "a lot" during runtime
Low intensity => the model does not adapt very much during runtime

I found the data-weighting option, but I am not sure how it will behave during runtime domain adaptation. In your opinion, will it have the desired effect if I assign the same weight to every sentence in the context?

Thanks a lot for your answer.

PS: I am not sure if this is the right place to ask. Feel free to point me to a more suitable one.


rihardsk commented Jul 14, 2022

@stanBienaives you should be able to use the regular training options for that – use, e.g., --after-epochs to set the number of times that training passes over a given set of context sentences. --after-batches and --after should also work, afaik.
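For example (the config file name and redirections here are placeholders; only the training options themselves are the standard Marian ones mentioned above), something along these lines should control how long adaptation runs on each context set:

    # stronger adaptation: more passes over each set of context sentences
    ./marian-adaptive -c adaptive.yml --after-epochs 5 < input.src > output.trg

    # weaker adaptation: a single pass (or cap by batches/time via --after-batches / --after)
    ./marian-adaptive -c adaptive.yml --after-epochs 1 < input.src > output.trg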

If you're interested in what other options are available for self-adaptive training, take a look here https://github.com/marian-cef/marian-dev/blob/a274dfbe0f356294ee092315ebd9a9df4dd16c5e/src/common/config_parser.cpp#L423 or just see the help output of the self-adaptive executable. There might still be some less-used options that haven't been tested, and they might not work as expected, but for the most part those options should work.

BTW, I'm no longer working on this pull request and am not concerned with getting it merged, because I've recently changed employers. Hopefully, though, someone will step in to get it over the line, because it seemed that very little remained to be done.
