
Self-adaptive translation mode for Marian (runtime domain adaptation). #887

Open · wants to merge 143 commits into master
Conversation


@rihardsk commented Oct 27, 2021

Description

This PR implements self-adaptive translation, a.k.a. runtime domain adaptation, in Marian. It enables training the model on a set of context sentence pairs (source and target) prior to translation to adapt it to a new domain during runtime. The model is reset before the next translation.

This is useful because it enables one to have a single generic NMT model that is fine-tuned on the fly to better suit any number of domains for which context sentences can be provided. Typically these context sentences would be fetched from a translation memory (out of scope of this PR) based on their similarity to the sentence being translated. Moreover, the translation quality can be improved over time without retraining and redeploying the model, simply by adding sentences to the translation memory.

The PR is based on earlier work by @snukky in c63aa8f, but the mechanism for transferring model parameters from the training graph to the translation graph has been revised so that it's based on the swappable infrastructure from kpu#2.

Self-adaptive translation can be run either in server mode, where source sentences and context sentence pairs are supplied via JSON, or in CLI mode, where they're supplied in separate files.
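To make the two modes concrete, here is a rough usage sketch. The flag names, server port, and JSON field names below are illustrative placeholders rather than the finalized interface of marian-adaptive:

    # CLI mode (illustrative): translate input.src, adapting to context sentence
    # pairs supplied in separate files (one context set per input sentence).
    ./marian-adaptive -c config.yml < input.src > output.trg

    # Server mode (illustrative): each request carries the text to translate plus
    # the context sentence pairs used for on-the-fly adaptation.
    curl -s -X POST http://localhost:8080/ -d '{
      "input": "Sentence to translate.",
      "context": [
        {"source": "A similar source sentence from the translation memory.",
         "target": "Its reference translation."}
      ]
    }'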

List of changes:

  • a new marian-adaptive executable
  • a new CMake flag to enable building the new executable – -DCOMPILE_ADAPTIVE=ON
  • changes to config_parser.cpp to allow marian-adaptive to accept options for both translation and training
  • extracting the code that translates model parameter names from the Nematus or Amun naming convention, so that it can be re-used for model loading in self-adaptive Marian
  • a fix in corpus_base.cpp to enable stdin handling
  • fixes for some other small issues encountered along the way

Added dependencies: none

How to test

Run the regression tests located in the tests/_self-adaptive directory from this PR: marian-nmt/marian-regression-tests#81
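Roughly, the runner in that repository can be pointed at the new directory directly (the script name and the MARIAN variable below are assumptions; check that repository's README for the exact invocation):

    # inside a checkout of marian-regression-tests with this PR's tests applied
    MARIAN=/path/to/marian-dev/build ./run_mrt.sh tests/_self-adaptive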

I've tested things on Ubuntu 18.04. To enable building marian-adaptive, you must use cmake .. -DCOMPILE_ADAPTIVE=ON.
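For completeness, a typical out-of-source build with the flag enabled (other Marian CMake options, e.g. CUDA or CPU settings, left at their defaults) would look something like this:

    # from a checkout of this PR's branch
    mkdir -p build && cd build
    cmake .. -DCOMPILE_ADAPTIVE=ON
    make -j$(nproc)
    # this should produce the marian-adaptive executable next to the usual binaries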

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

XapaJIaMnu and others added 30 commits March 24, 2021 09:13
SwappableSlot: add GPU-to-GPU reset feature
This doesn't work yet, though, because a lot of options are missing: we initialize them manually instead of using the config parser.

rihardsk commented Dec 6, 2021

@snukky I think I've implemented all the requested changes. I've also added a test using a transformer model to marian-nmt/marian-regression-tests#81

CI is failing, but that doesn't seem to be related to my changes.


snukky commented Dec 6, 2021

@rihardsk Thanks, I will take a look again. The GitHub check called "Documentation" is optional and will not make the CI fail; the rest should pass (excluding the already disabled "Ubuntu 16.04"). Did you run all regression tests from marian-regression-tests locally?

@rihardsk

@snukky I had run the regression tests previously, but I reran them again. This time there were some 40 failing tests. I checked some of the logs and it seems that the outputs are off by a small fraction, but I didn't check them all. Here's the summary:

 Skipped:
  - tests/examples/mnist/test_mnist_ffnn.sh
  - tests/examples/unit-tests/test_unit_tests.sh
  - tests/examples/iris/test_iris.sh
  - tests/training/restoring/optimizer/test_adam_params_async.sh
  - tests/training/restoring/optimizer/test_adam_params_sync.sh
  - tests/training/restoring/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/restoring/multi-gpu/test_adam_sync.sh
  - tests/training/restoring/multi-gpu/test_async.sh
  - tests/training/restoring/multi-gpu/test_sync.sh
  - tests/training/features/exp-smoothing/test_expsmooth_sync.sh
  - tests/training/multi-gpu/test_async_sgd_runs.sh
  - tests/training/multi-gpu/test_sync_sgd.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed16.sh
  - tests/models/wngt19/test_model_base_fbgemm_packed8.sh
Failed:
  - tests/decoder/wmt16/test_nbest.sh
  - tests/decoder/wmt17/test_nbest.sh
  - tests/decoder/shortlist/test_shortlist_rnn_gpu.sh
  - tests/decoder/shortlist/test_shortlist_server.sh
  - tests/decoder/word-scores/test_word_scores_batch.sh
  - tests/decoder/align/test_align_nbest.sh
  - tests/decoder/align/test_soft_align.sh
  - tests/decoder/align/test_soft_align_nbest.sh
  - tests/decoder/intgemm/test_intgemm_16bit.sh
  - tests/decoder/intgemm/test_intgemm_16bit_avx2.sh
  - tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
  - tests/decoder/intgemm/test_intgemm_8bit.sh
  - tests/decoder/intgemm/test_intgemm_8bit_avx2.sh
  - tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
  - tests/scorer/nbest/test_custom_feature_name.sh
  - tests/scorer/nbest/test_score_nbest_list.sh
  - tests/scorer/align/test_scorer_align_nbest.sh
  - tests/scorer/lm/test_lm_scores.sh
  - tests/scorer/scores/test_compare_with_decoder_scores.sh
  - tests/scorer/scores/test_scores.sh
  - tests/scorer/scores/test_scores_normalized.sh
  - tests/scorer/scores/test_word_scores.sh
  - tests/scorer/scores/test_word_scores_mini_batch_1.sh
  - tests/scorer/scores/test_word_scores_nbest.sh
  - tests/scorer/scores/test_word_scores_normalized.sh
  - tests/factors/test_factors_decoder.sh
  - tests/factors/test_factors_decoder_concat.sh
  - tests/training/validation/test_compare_decoding_with_transscript_output.sh
  - tests/training/restoring/validation/test_adding_validator_for_finetuning.sh
  - tests/training/restoring/validation/test_restoring_newbest_validators.sh
  - tests/training/restoring/validation/test_valid_reset_stalled.sh
  - tests/training/restoring/optimizer/test_adam_params.sh
  - tests/training/features/quantized-model/test_quant_centers.sh
  - tests/training/features/quantized-model/test_quantmodel.sh
  - tests/training/features/quantized-model/test_quantmodel_log.sh
  - tests/training/features/quantized-model/test_quantmodel_with_bias.sh
  - tests/models/ape/test_nbest.sh
  - tests/models/transformer/test_nbest.sh
  - tests/models/transformer/test_soft_aligns.sh
  - tests/interface/input/test_score_with_blank_lines.sh
  - tests/interface/input-tsv/test_tsv_score.sh
  - tests/interface/input-tsv/test_tsv_score_assume_stdin.sh
  - tests/interface/input-tsv/test_tsv_score_assume_tsv.sh
  - tests/interface/input-tsv/test_tsv_score_dual_source.sh
  - tests/interface/input-tsv/test_tsv_score_lm.sh
  - tests/interface/input-tsv/test_tsv_score_lm_stdin.sh
  - tests/interface/input-tsv/test_tsv_score_stdin.sh
  - tests/interface/input-tsv/test_tsv_train_with_align.sh
  - tests/interface/input-tsv/test_tsv_train_with_align_pos0.sh
Timed out:
  - tests/decoder/wmt16/test_ende_cpu.sh
  - tests/training/restoring/multi-gpu/test_adam_sync_cpu.sh
  - tests/interface/input-tsv/test_tsv_train_mini_batch_fit_stdin.sh
  - tests/interface/input-tsv/test_tsv_train_stdin_2_epochs.sh
  - tests/interface/input-tsv/test_tsv_train_with_align_stdin.sh

I think I've resolved all of your other suggestions.


snukky commented Mar 22, 2022

During the last Marian meeting, we decided that @emjotde will comment on whether src/translator/swappable.h can be done in an easier way and review the changes in src/graph/*. One of our suggestions was that the new feature could be separated from the rest of the code by moving it into its own subdirectory instead of being scattered across the existing directories.

@snukky requested a review from @emjotde on March 22, 2022
@stanBienaives

@rihardsk Thank you for this work. It is really helpful for what we are trying to achieve.

I have a question though: how can I control the intensity of the adaptation? I need to control globally how much the model "adapts" to the given context.
Basically:

High intensity => the model adapts "a lot" during runtime
Low intensity => the model does not adapt very much during runtime

I found the data-weighting option, but I am not sure how it will behave during runtime domain adaptation. In your opinion, will it have the desired effect if I assign the same weight to every sentence in the context?

Thanks a lot for your answer.

PS: I am not sure if this is the right place to ask. Feel free to point me to a more suitable one.


rihardsk commented Jul 14, 2022

@stanBienaives you should be able to use the regular training options for that – use, e.g., --after-epochs to set the number of times that training passes over a given set of context sentences. --after-batches and --after should also work, afaik.
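For example (the config file name and redirections here are placeholders; only the training options themselves are the standard Marian ones mentioned above), something along these lines should control how long adaptation runs on each context set:

    # stronger adaptation: more passes over each set of context sentences
    ./marian-adaptive -c adaptive.yml --after-epochs 5 < input.src > output.trg

    # weaker adaptation: a single pass (or cap by batches/time via --after-batches / --after)
    ./marian-adaptive -c adaptive.yml --after-epochs 1 < input.src > output.trg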

If you're interested in what other options are available for self-adaptive training, take a look here https://github.com/marian-cef/marian-dev/blob/a274dfbe0f356294ee092315ebd9a9df4dd16c5e/src/common/config_parser.cpp#L423 or just see the help output of the self-adaptive executable. There might still be some less-used options that haven't been tested, and they might not work as expected, but for the most part those options should work.

BTW, I'm no longer working on this pull request and am not concerned with getting it merged, because I've recently changed employers. Hopefully, though, someone will step in to get it over the line, because it seemed that very little remained to be done.
