[NeMo-UX] Use single instance of loss reductions in GPTModel #9801

hemildesai · 2024-07-18T22:18:54Z

What does this PR do?

Uses a singleton for training_loss_reduction and validation_loss_reduction in GPTModel.
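
As a rough illustration of the idea, the sketch below has a model object build each loss reduction once and return the same object on every later access. `MaskedTokenLossReduction` here is a simplified stand-in for the NeMo class, and `GPTModelSketch` and the private `_training_loss_reduction` / `_validation_loss_reduction` attributes are hypothetical names chosen for the example, not necessarily the code in this PR.

```python
from typing import Optional


class MaskedTokenLossReduction:
    """Simplified stand-in for the NeMo class; only the constructor flag is modeled."""

    def __init__(self, validation_step: bool = False) -> None:
        self.validation_step = validation_step


class GPTModelSketch:
    """Hypothetical model that reuses one loss-reduction object per mode."""

    def __init__(self) -> None:
        # Nothing is built until the first access.
        self._training_loss_reduction: Optional[MaskedTokenLossReduction] = None
        self._validation_loss_reduction: Optional[MaskedTokenLossReduction] = None

    @property
    def training_loss_reduction(self) -> MaskedTokenLossReduction:
        # Create the object on first access, then keep returning the same one.
        if self._training_loss_reduction is None:
            self._training_loss_reduction = MaskedTokenLossReduction()
        return self._training_loss_reduction

    @property
    def validation_loss_reduction(self) -> MaskedTokenLossReduction:
        if self._validation_loss_reduction is None:
            self._validation_loss_reduction = MaskedTokenLossReduction(validation_step=True)
        return self._validation_loss_reduction


model = GPTModelSketch()
# Repeated accesses yield the very same object rather than a fresh one.
same_object = model.training_loss_reduction is model.training_loss_reduction
```

With this shape, repeated property accesses during training do not allocate new reduction objects; each model instance still owns its own pair.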

Collection: llm

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

ShriyaPalsamudram

Functionality works

ericharper · 2024-07-19T18:38:30Z

nemo/collections/llm/gpt/model/base.py

+TRAIN_LOSS_REDUCTION = MaskedTokenLossReduction()
+VALIDATION_LOSS_REDUCTION = MaskedTokenLossReduction(validation_step=True)


Let's add a comment noting that this is a temporary workaround (WAR).

Or remove the globals.

akoumpa

LGTM.

Alternatively, we could decorate training_loss_reduction and validation_loss_reduction with @cache, but I have no preference. Please proceed with the merge.

Thanks.
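
The `@cache` alternative mentioned in this comment could look roughly like the sketch below, assuming Python 3.9+ for `functools.cache`. `MaskedTokenLossReduction` is again a simplified stand-in and `CachedGPTModelSketch` is a hypothetical name; stacking `@property` on top of `@cache` is one possible arrangement, not something the comment specifies.

```python
from functools import cache


class MaskedTokenLossReduction:
    """Simplified stand-in for the NeMo class; only the constructor flag is modeled."""

    def __init__(self, validation_step: bool = False) -> None:
        self.validation_step = validation_step


class CachedGPTModelSketch:
    """Hypothetical model whose loss-reduction properties are memoized with @cache."""

    @property
    @cache
    def training_loss_reduction(self) -> MaskedTokenLossReduction:
        # The body runs once per model instance; later accesses hit the cache.
        return MaskedTokenLossReduction()

    @property
    @cache
    def validation_loss_reduction(self) -> MaskedTokenLossReduction:
        return MaskedTokenLossReduction(validation_step=True)


first = CachedGPTModelSketch()
second = CachedGPTModelSketch()
# Same object for one model, different objects across models.
reused = first.training_loss_reduction is first.training_loss_reduction
distinct = first.training_loss_reduction is not second.training_loss_reduction
```

One trade-off of this variant: `functools.cache` keys on `self`, so the cache keeps a strong reference to every model instance it has seen, which matters if models are created and discarded repeatedly.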

* Use single instance of loss reductions
  Signed-off-by: Hemil Desai <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: hemildesai <[email protected]>
* Refactor
  Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
Signed-off-by: hemildesai <[email protected]>
Co-authored-by: hemildesai <[email protected]>

…9861) * Use single instance of loss reductions * Apply isort and black reformatting * Refactor --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]>

…9801) (NVIDIA#9861) * Use single instance of loss reductions * Apply isort and black reformatting * Refactor --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Signed-off-by: adityavavre <[email protected]>

…9801) (NVIDIA#9861) * Use single instance of loss reductions * Apply isort and black reformatting * Refactor --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Signed-off-by: Hainan Xu <[email protected]>

hemildesai changed the title ~~Use single instance of loss reductions~~ [NeMo-UX] Use single instance of loss reductions Jul 18, 2024

hemildesai changed the title ~~[NeMo-UX] Use single instance of loss reductions~~ [NeMo-UX] Use single instance of loss reductions in GPTModel Jul 18, 2024

hemildesai force-pushed the hemil/potential-memory-fix branch from b66e35d to 6306228 Compare July 18, 2024 22:22

ShriyaPalsamudram self-assigned this Jul 19, 2024

ShriyaPalsamudram approved these changes Jul 19, 2024

hemildesai marked this pull request as ready for review July 19, 2024 18:36

ShriyaPalsamudram previously approved these changes Jul 19, 2024

ericharper reviewed Jul 19, 2024

ericharper added the 2.0.0rc1 label Jul 19, 2024

hemildesai dismissed ShriyaPalsamudram’s stale review via 61f36a3 July 19, 2024 21:41

hemildesai requested a review from akoumpa July 22, 2024 17:39

akoumpa approved these changes Jul 22, 2024

ericharper added the Run CICD label Jul 22, 2024

hemildesai force-pushed the hemil/potential-memory-fix branch 2 times, most recently from 6d35351 to 537259b Compare July 23, 2024 16:14

ericharper added Run CICD and removed Run CICD labels Jul 23, 2024

hemildesai and others added 3 commits July 23, 2024 12:14

Use single instance of loss reductions

f776d7f

Signed-off-by: Hemil Desai <[email protected]>

Apply isort and black reformatting

09d493e

Signed-off-by: hemildesai <[email protected]>

Refactor

eded322

Signed-off-by: Hemil Desai <[email protected]>

hemildesai force-pushed the hemil/potential-memory-fix branch from 537259b to eded322 Compare July 23, 2024 19:14

hemildesai added Run CICD and removed Run CICD labels Jul 23, 2024

hemildesai merged commit 0ba1991 into r2.0.0rc1 Jul 24, 2024
184 of 225 checks passed

hemildesai deleted the hemil/potential-memory-fix branch July 24, 2024 07:35
