-
We have an internal issue tracker, and the oldest issue I opened on it that is still open (before we even open-sourced refiners) is: "Find a way to avoid changing the state dict when we add blocks without weights" :) So yes, this is something we have been discussing a lot internally. We don't really have a perfect solution for this right now, but there are always workarounds.
But if you have ideas, that would be great! Ideally I'd like the solution to also work when we e.g. insert a chain in a model for clarity / easier targeting, and to keep somewhat semantic keys (i.e. not just ordered keys named 0001, 0002, etc., which works but has other issues).
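To make the issue concrete, here is a minimal sketch, assuming refiners' `fl.Chain` naming scheme where children are keyed by class name (with `_1`, `_2` suffixes for duplicates); exact key names may vary across versions:

```python
import refiners.fluxion.layers as fl

flat = fl.Chain(fl.Linear(8, 8), fl.Linear(8, 8))
print(list(flat.state_dict().keys()))
# ['Linear_1.weight', 'Linear_1.bias', 'Linear_2.weight', 'Linear_2.bias']

# Grouping the same two layers under a sub-chain, purely for clarity or
# easier targeting, adds no weights, yet every key gets a new prefix:
grouped = fl.Chain(fl.Chain(fl.Linear(8, 8), fl.Linear(8, 8)))
print(list(grouped.state_dict().keys()))
# ['Chain.Linear_1.weight', 'Chain.Linear_1.bias', ...]
```

A checkpoint saved from `flat` then fails to load into `grouped`, even though the tensors are identical.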
-
The big fundamental question is: should the keys of the `state_dict` be human-readable?
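For contrast, a small sketch of the two options (torch's ordinal keys vs refiners' semantic keys; the exact formats are assumptions and may differ by version):

```python
import torch.nn as nn
import refiners.fluxion.layers as fl

# Ordinal keys (torch Sequential): compact but opaque; they still shift
# if you insert a module earlier in the list.
print(list(nn.Sequential(nn.Linear(4, 4), nn.ReLU()).state_dict().keys()))
# ['0.weight', '0.bias']

# Semantic keys (refiners Chain): readable and targetable, but tied to
# the shape of the module tree, so structural refactors shift them.
print(list(fl.Chain(fl.Linear(4, 4), fl.ReLU()).state_dict().keys()))
# ['Linear.weight', 'Linear.bias']
```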
-
Hello refiners,

I'm experimenting with the trainer, and in particular I'm facing a problem loading/saving model weights.

The sequence of the trainer is:

1. `trainer.prepare_models` loads the checkpoint on a non-injected model
2. `on_train_begin` injects the `dropout_adapter`
3. `on_checkpoint_save` saves the checkpoint (using `model.state_dict()`)

The names of the Dropout-impacted layers are changed in step 2. As a result, the model saved in `on_checkpoint_save` is not compatible with the loading in `trainer.prepare_models`, and I cannot smoothly save/load the model.

**Toy example**
The injection of the dropout adapter changes the keys of the weights in `state_dict()`, as the sketch below illustrates.
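A minimal sketch of the effect, with `fl.Identity` standing in for the weightless dropout layer (the exact key names are an assumption about Chain's naming scheme):

```python
import refiners.fluxion.layers as fl

model = fl.Chain(fl.Linear(8, 8))
print(list(model.state_dict().keys()))
# before injection: ['Linear.weight', 'Linear.bias']

# The injection wraps the target Linear inside a new Chain node:
linear = model.ensure_find(fl.Linear)
model.replace(linear, fl.Chain(linear, fl.Identity()))
print(list(model.state_dict().keys()))
# after injection: ['Chain.Linear.weight', 'Chain.Linear.bias']
```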
What I'm not clear about is the target behavior:

A. should `.inject(parent)` change the names of the weights, and we should fix the save/load sequence in the trainer?
B. should `.inject(parent)` not change the names of the weights in `state_dict()` when the adapter is not injecting new weights?
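If B is not desirable, one workaround I could imagine for the current behavior is remapping keys at load time. A hypothetical sketch (the `Chain.` prefix is an assumption about how the adapter nests its target):

```python
import torch

def strip_adapter_prefix(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Drop the extra "Chain." level that the weightless adapter introduced,
    # so a checkpoint saved after injection loads into a non-injected model.
    return {key.removeprefix("Chain."): value for key, value in state_dict.items()}
```

Alternatively, the trainer could `eject()` the adapter before saving and re-`inject()` it afterwards, so the checkpoint keeps the pre-injection keys.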
I can help on this if needed.