
support branch parallel for evoformer #14

Open · wants to merge 2 commits into base: main

Conversation

@GuoxiaWang commented Nov 10, 2022

@guolinke (Member) commented:

Thank you, I will review this over the weekend.

@@ -49,6 +49,7 @@ def main(args) -> None:
), "Must specify batch size either with --batch-size"
metrics.reset()

args.seed += args.dp_rank
Member:

Is this change needed?

Author (@GuoxiaWang):

When using a hybrid distributed parallel strategy such as DP + BP, the parameters and data within the same BP group must be identical, so the ranks in that group need the same seed.
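For illustration only, a minimal sketch of what the seed offset does; the rank layout and helper name below are assumptions for this sketch, not code from the PR:

    # Hypothetical layout (assumed): world_size = dp_degree * bp_degree, with the
    # bp_degree ranks of each BP group contiguous in the global rank ordering.
    def seed_for_rank(base_seed: int, global_rank: int, bp_degree: int) -> int:
        dp_rank = global_rank // bp_degree   # identical for every rank in one BP group
        return base_seed + dp_rank           # mirrors the added line: args.seed += args.dp_rank

    # Example with bp_degree=2 and base_seed=1:
    #   ranks 0,1 -> seed 1,  ranks 2,3 -> seed 2
    # so the branch-parallel ranks of one group draw the same data and initialise the
    # same parameters, while different DP ranks still get different seeds.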

@@ -137,6 +139,9 @@ def distributed_init(args):
if torch.cuda.is_available():
dist.all_reduce(torch.zeros(1).cuda())

scg.init_group(bp_degree=args.bp_degree, dap_degree=1)
Member:

Will this affect the normal c10d / no_c10d modes?
Can we make "bp" a choice, like the current c10d and no_c10d options?

Author (@GuoxiaWang):

I'm not quite sure about this. This PR is just to show how to use BP; it is not meant to be merged into UniCore as-is.

Member:

Sorry, I may be missing some context.
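For readers missing that context as well, here is a rough sketch of the kind of process-group setup a call like scg.init_group(bp_degree=...) performs; the grouping logic and function name are assumptions based on the arguments shown in the diff, not the actual scg implementation:

    import torch.distributed as dist

    def init_bp_groups(bp_degree: int):
        """Sketch: carve the world into branch-parallel (BP) groups.

        Assumes dist.init_process_group() has already run and that the
        bp_degree ranks of each BP group are contiguous in the global ordering.
        """
        world_size = dist.get_world_size()
        rank = dist.get_rank()
        assert world_size % bp_degree == 0
        bp_group = None
        for start in range(0, world_size, bp_degree):
            ranks = list(range(start, start + bp_degree))
            group = dist.new_group(ranks)  # every rank must call new_group for every group
            if rank in ranks:
                bp_group = group
        return bp_group

Guarding such a call (for example, skipping it when bp_degree == 1) would presumably leave the existing c10d / no_c10d training paths untouched, which seems to be what the reviewer is asking about.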


return outer_grad.clone(), msa_grad.clone(), pair_grad.clone()

def sync_evoformer_results(outer, msa, pair, training):
Member:

I feel like the functions in this file would be better placed in Uni-Fold.

Author (@GuoxiaWang):

Same issue as above. The code needs to be designed jointly and then merged into Uni-Fold and UniCore respectively.
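As a rough illustration of what a helper with this signature might do, here is a minimal sketch; the broadcast-from-group-leader strategy, the bp_group handle, and the src_rank parameter are assumptions, not the PR's actual implementation:

    import torch.distributed as dist

    def sync_evoformer_results(outer, msa, pair, training, bp_group=None, src_rank=0):
        """Sketch: make the three Evoformer outputs identical across a BP group.

        Broadcasts the tensors held by src_rank (the global rank of the group
        leader) to the other branch-parallel ranks, so modules that are not
        branch-parallel see consistent activations. The `training` flag is kept
        only to mirror the signature shown in the diff.
        """
        if bp_group is None:
            return outer, msa, pair
        for t in (outer, msa, pair):
            dist.broadcast(t, src=src_rank, group=bp_group)
        return outer, msa, pair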

