
Context Parallel SFT Support for dataset in THD format #10688

Open · wants to merge 29 commits into base: main

Conversation

@tomlifu (Contributor) commented Sep 30, 2024

What does this PR do ?

This PR adds context parallelism (CP) support for the THD data format and is compatible with cu_seqlen_padded in the latest cuDNN fused attention.
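To illustrate the metadata involved, here is a minimal, hypothetical sketch of THD-style sequence offsets. The lengths, the `cu_seqlens` naming, and the pad-to-a-multiple-of-`2 * cp_size` rule are illustrative assumptions, not the PR's actual implementation:

```python
import numpy as np

# Three example sequences of lengths 5, 3, and 6 are concatenated
# ("packed") into one token stream, as in the THD layout.
seq_lens = [5, 3, 6]

# cu_seqlens holds the cumulative offsets delimiting each sequence.
cu_seqlens = np.cumsum([0] + seq_lens)  # [0, 5, 8, 14]

# Assumption for illustration: with context parallelism, each sequence is
# padded up to a multiple of 2 * cp_size so it can be split evenly across
# CP ranks; a second offset array tracks positions after padding.
cp_size = 2
pad_to = 2 * cp_size
padded_lens = [((l + pad_to - 1) // pad_to) * pad_to for l in seq_lens]
cu_seqlens_padded = np.cumsum([0] + padded_lens)  # [0, 8, 12, 20]

print(cu_seqlens.tolist(), cu_seqlens_padded.tolist())
```

The fused-attention kernel needs both arrays so it can locate real tokens inside the padded stream.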

Steps to run SFT + CP + THD format:

  1. Prepare a packed dataset in THD format: run scripts/nlp_language_modeling/prepare_packed_ft_dataset.py to pack the dataset into THD format at the desired sequence length. For example:
python <NeMo_top_dir>/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
        model.data.train_ds.file_names=[<dataset_top_dir>/squad/1_squad_train.jsonl] \
        model.data.train_ds.max_seq_length=4096 \
        +model.context_parallel_size=2 \
        +tokenizer_path=<tokenizer_path> \
        +output_dir=<output_dir> +pack_sizes=[4096]
  2. Run SFT on the packed dataset in THD format with the same CP size specified in the previous step.
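For intuition on why the pack step must know the CP size, here is a hedged sketch of the interleaved chunking commonly used by Megatron-style context parallelism for causal attention. The `cp_shard` helper and its chunk assignment are illustrative assumptions; the actual NeMo/Transformer Engine code may differ:

```python
def cp_shard(tokens, cp_size, rank):
    """Hypothetical sketch: split one (padded) sequence into 2 * cp_size
    chunks and give rank i the chunks i and (2 * cp_size - 1 - i), which
    balances causal-attention work across CP ranks."""
    num_chunks = 2 * cp_size
    chunk = len(tokens) // num_chunks  # requires len divisible by 2 * cp_size
    first = tokens[rank * chunk:(rank + 1) * chunk]
    mirror = num_chunks - 1 - rank
    second = tokens[mirror * chunk:(mirror + 1) * chunk]
    return first + second

tokens = list(range(8))        # one padded sequence of length 8, cp_size = 2
print(cp_shard(tokens, 2, 0))  # rank 0 gets chunks 0 and 3 -> [0, 1, 6, 7]
print(cp_shard(tokens, 2, 1))  # rank 1 gets chunks 1 and 2 -> [2, 3, 4, 5]
```

Because each packed sequence must divide evenly into 2 * cp_size chunks, the packing script takes +model.context_parallel_size so padding comes out right, and SFT must then run with that same CP size.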

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

@github-actions github-actions bot added the NLP label Sep 30, 2024
@tomlifu tomlifu changed the title Draft: Context Parallel SFT Support for dataset in THD format Context Parallel SFT Support for dataset in THD format Oct 1, 2024
@xrennvidia xrennvidia self-requested a review October 2, 2024 17:44
@xrennvidia (Collaborator)

Please fix DCO also.

@xrennvidia xrennvidia removed the audio label Oct 25, 2024
@switiz commented Nov 20, 2024

Could you let me know when this will be completed? I've been really looking forward to this feature. It works in pretraining, but it's strange that it doesn't work in SFT.

Contributor

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_dataset
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:70:0: C0301: Line too long (353/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:72:0: C0301: Line too long (173/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:73:0: C0301: Line too long (156/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:79:0: C0301: Line too long (157/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:82:0: C0301: Line too long (147/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:83:0: C0301: Line too long (178/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:84:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:85:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:87:0: C0301: Line too long (144/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:90:0: C0301: Line too long (247/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:165:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:174:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:244:0: C0301: Line too long (137/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:247:0: C0301: Line too long (133/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:272:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:277:0: C0301: Line too long (153/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:278:0: C0301: Line too long (155/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:300:0: C0301: Line too long (127/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:389:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:655:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:36:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py:526:0: C0115: Missing class docstring (missing-class-docstring)
************* Module nemo.collections.nlp.models.language_modeling.megatron_gpt_model
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:328:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:368:0: C0301: Line too long (136/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:414:0: C0301: Line too long (126/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:461:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:792:0: C0301: Line too long (131/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1107:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1128:0: C0301: Line too long (168/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1372:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1429:0: C0301: Line too long (140/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1610:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1611:0: C0301: Line too long (136/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1613:0: C0301: Line too long (159/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1784:0: C0301: Line too long (128/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1804:0: C0301: Line too long (140/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1812:0: C0301: Line too long (155/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1833:0: C0301: Line too long (141/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1903:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1930:0: C0301: Line too long (134/119) (line-too-long)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:141:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:154:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:180:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:199:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:246:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:284:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:288:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:300:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:308:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:470:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:473:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:702:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:706:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:788:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1105:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1178:12: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1227:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1256:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1442:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1577:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1585:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1594:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:1878:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:2031:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:2038:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:2044:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:20:0: W0611: Unused fields imported from dataclasses (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:29:0: W0611: Unused _DataFetcherWrapper imported from lightning.pytorch.loops.fetchers (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:31:0: W0611: Unused OmegaConf imported from omegaconf (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:73:0: W0611: Unused activation_to_func imported from nemo.collections.nlp.parts.utils_funcs (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:82:4: W0611: Unused megatron.core imported as core (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:83:4: W0611: Unused tensor_parallel imported from megatron.core (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:102:4: W0611: Unused init_method_normal imported from megatron.core.utils (unused-import)
nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py:102:4: W0611: Unused scaled_init_method_normal imported from megatron.core.utils (unused-import)
************* Module nemo.utils.sequence_packing_utils
nemo/utils/sequence_packing_utils.py:53:0: C0301: Line too long (125/119) (line-too-long)
nemo/utils/sequence_packing_utils.py:112:0: C0301: Line too long (127/119) (line-too-long)
nemo/utils/sequence_packing_utils.py:121:0: C0301: Line too long (122/119) (line-too-long)
nemo/utils/sequence_packing_utils.py:122:0: C0301: Line too long (139/119) (line-too-long)
************* Module scripts.nlp_language_modeling.prepare_packed_ft_dataset
scripts/nlp_language_modeling/prepare_packed_ft_dataset.py:206:0: C0301: Line too long (157/119) (line-too-long)
scripts/nlp_language_modeling/prepare_packed_ft_dataset.py:169:0: C0115: Missing class docstring (missing-class-docstring)
scripts/nlp_language_modeling/prepare_packed_ft_dataset.py:175:4: C0116: Missing function or method docstring (missing-function-docstring)
scripts/nlp_language_modeling/prepare_packed_ft_dataset.py:188:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.46/10

Thank you for improving NeMo's documentation!
