In this PR I add a per-language validation stage! There are some comments in the code itself, but the most relevant things to be aware of are:

- `_update_dataloader_based_on_training_stages` (for training) and `_prepare_dataloader_for_validation_stage` (for validation). After carefully analysing the mechanism, I have to contact them because it's not properly deleting the DataLoaders from the previous stages. Is it something we should be worried about? No, everything works, but I copied this logic for the validation stage and it's kind of useless and messy.
- To use the validation stage you just need to set `dataset.validation_folder` for each data stage, plus `tokens.val_check_interval`. Be aware that, since the training and validation metrics are logged together, the validation interval must be a multiple of the logging interval (i.e. the validation stage has to run on a training step in which the metrics are logged).
- You can check some wandb logs here. Keep in mind that wandb logs each metric separately, so to merge the different language losses and the global loss into a single plot you need to "edit panel" (✏️) and set "validation_loss" in the `*` option. I recommend trying this with a single wandb run rather than with all the runs in the project.
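To illustrate the interval constraint mentioned above, here is a minimal sketch, not the code from this PR. `val_check_interval` mirrors the `tokens.val_check_interval` setting; `log_interval` and both function names are hypothetical, and the sketch assumes that logging and validation both fire on steps divisible by their respective intervals.

```python
def check_validation_interval(val_check_interval: int, log_interval: int) -> None:
    """Raise if validation could run on a step where metrics are not logged.

    `log_interval` is a hypothetical name for the logging interval;
    it is not taken from the PR.
    """
    if val_check_interval <= 0 or log_interval <= 0:
        raise ValueError("Both intervals must be positive integers.")
    if val_check_interval % log_interval != 0:
        raise ValueError(
            f"val_check_interval ({val_check_interval}) must be a multiple "
            f"of the logging interval ({log_interval})."
        )


def is_validation_step(step: int, val_check_interval: int) -> bool:
    """Assumed rule: validate on every step divisible by the interval."""
    return step > 0 and step % val_check_interval == 0


def is_logging_step(step: int, log_interval: int) -> bool:
    """Assumed rule: log on every step divisible by the logging interval."""
    return step > 0 and step % log_interval == 0
```

With this check in place, any step that triggers validation is also a logging step, so the validation losses can be written in the same log call as the training metrics.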