add num_layers % pp == 0 assertion
Signed-off-by: dimapihtar <[email protected]>
dimapihtar committed May 15, 2024
1 parent 6cb618a commit 5352a60
Showing 1 changed file with 6 additions and 0 deletions.
@@ -322,6 +322,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer):
'Expert parallelism is currently not supporting Apex distributed optimizer, use Mcore distributed optimizer instead'
)

if self.cfg.get('num_layers', 12) % self.cfg.get('pipeline_model_parallel_size', 1) != 0:
    raise ValueError(
        f"num_layers ({self.cfg.get('num_layers', 12)}) should be divisible by "
        f"pipeline_model_parallel_size ({self.cfg.get('pipeline_model_parallel_size', 1)})"
    )

self.transformer_engine = cfg.get('transformer_engine', False)
if self.megatron_amp_O2 and not self.transformer_engine:
    logging.warning('megatron_amp_O2 is enabled but transformer-engine is not.')
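
For context only (this is not part of the commit), below is a minimal standalone sketch of the new validation, with a plain dict standing in for the model's OmegaConf cfg and hypothetical values chosen so the check fails:

# Minimal sketch of the divisibility check added above; `cfg` is a plain dict
# standing in for the Hydra/OmegaConf config, with hypothetical example values.
cfg = {"num_layers": 12, "pipeline_model_parallel_size": 5}

num_layers = cfg.get("num_layers", 12)
pp_size = cfg.get("pipeline_model_parallel_size", 1)

# 12 layers cannot be split evenly across 5 pipeline stages, so this raises ValueError.
if num_layers % pp_size != 0:
    raise ValueError(
        f"num_layers ({num_layers}) should be divisible by "
        f"pipeline_model_parallel_size ({pp_size})"
    )

With the defaults shown in the diff (num_layers=12, pipeline_model_parallel_size=1) the condition passes; the error is raised only when the layer count does not split evenly across the pipeline stages.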
