
Add tensor parallelism related mappings #15360

Closed · wants to merge 20 commits

Conversation

@hyunwoongko (Contributor) commented Jan 26, 2022

Discussed in #13690

In addition, I also created some helper functions.
cc @RezaYazdaniAminabadi @lucasleesw

Reviewers

@stas00 @jaketae @siddk

@HuggingFaceDocBuilder commented Jan 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@hyunwoongko (Contributor, Author) commented Jan 27, 2022

In my opinion, mapping + module replacement is the easiest and most extensible way to implement tensor parallelism right now. That's why we need this mapping. However, I found three models (ProphetNet, BigBird, SqueezeBert) to which this approach could not be applied. See: https://github.com/tunib-ai/parallelformers/blob/main/FAQ.md#q-why-are-some-models-not-supported

The documentation also points out EncoderDecoderModel and RAG, but that is because they do not match parallelformers' implementation; it has nothing to do with these mappings.
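
For readers unfamiliar with the approach, here is a minimal single-process sketch of the mapping + module replacement idea. This is not the PR's code: the mapping contents and helper names are illustrative, and the communication collectives (all-gather/all-reduce) that a real implementation inserts around the sharded layers are omitted.

```python
import torch
import torch.nn as nn

# Hypothetical per-architecture mapping: suffixes of nn.Linear modules whose
# weights are split along the output dim ("column") or the input dim ("row").
TP_MAPPING = {
    "BertModel": {
        "column": ("query", "key", "value", "intermediate.dense"),
        "row": ("attention.output.dense", "output.dense"),
    }
}


def shard_linear(linear: nn.Linear, rank: int, world_size: int, dim: int) -> nn.Linear:
    """Return a new Linear holding this rank's slice of the weight along `dim`."""
    weight = linear.weight.chunk(world_size, dim=dim)[rank]
    sharded = nn.Linear(weight.shape[1], weight.shape[0], bias=linear.bias is not None)
    with torch.no_grad():
        sharded.weight.copy_(weight)
        if linear.bias is not None:
            # Column parallelism splits the bias with the output dim; row
            # parallelism keeps it whole (applied once after the all-reduce).
            sharded.bias.copy_(linear.bias.chunk(world_size)[rank] if dim == 0 else linear.bias)
    return sharded


def parallelize(model: nn.Module, rank: int, world_size: int, mapping=TP_MAPPING):
    """Walk the model and swap in sharded replacements per the mapping."""
    policy = mapping[model.__class__.__name__]
    for name, module in list(model.named_modules()):
        if not isinstance(module, nn.Linear):
            continue
        if name.endswith(policy["column"]):
            dim = 0
        elif name.endswith(policy["row"]):
            dim = 1
        else:
            continue
        parent_name, _, child = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child, shard_linear(module, rank, world_size, dim))
```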

@stas00 (Contributor) commented Jan 27, 2022

I think it's perfectly fine to design a way to automate most of the derivations if it works in most cases, and leave the special cases to be just that: special cases, which can be taken care of later.

If you have ever used Perl, its motto was "make the easy things easy, and the hard things possible", and it lived up to that lofty goal. If we can follow a similar principle here, that'd be great!

That is: let's not replicate data that doesn't need to be replicated if it can be derived automatically, and make it possible to overcome exceptions by providing additional data when it can't be auto-derived.

Does it make sense?

@hyunwoongko (Contributor, Author) commented Jan 27, 2022

@stas00 Yes, that will be fine. Let's proceed by adding mappings for the exception cases. Most models match well, and if a few models have exceptions, the easiest solution is to construct an additional dictionary for only those exception cases.

The REVERSED PARAMS and FUSED ATTENTION maps I wrote are examples; these cases are not difficult to define because there are only two of them across Transformers. @lucasleesw So let's design it to support as many models as possible with as little effort as possible. That would be best.
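
As a rough illustration of the "defaults plus exception tables" idea (the table names and contents below are illustrative, not the PR's actual data):

```python
# Most architectures follow common conventions; only deviating models
# need an entry in the exception tables below.

# GPT2's Conv1D stores its weight transposed relative to nn.Linear.
REVERSED_PARAMS = {"GPT2"}

# GPT2's c_attn packs Q, K, and V into one projection, i.e. fusion degree 3.
FUSED_ATTENTION = {"GPT2": 3}


def is_reversed(model_name: str) -> bool:
    return model_name in REVERSED_PARAMS


def fusion_degree(model_name: str) -> int:
    # Degree 1 means an ordinary, unfused projection.
    return FUSED_ATTENTION.get(model_name, 1)
```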

@hyunwoongko (Contributor, Author) commented Jan 28, 2022

```
ValueError: The following files have docstrings written in rst:
- src/transformers/utils/tensor_parallel_utils.py
To fix this run `doc-builder convert path_to_py_file` after installing `doc-builder`
(`pip install git+https://github.com/huggingface/doc-builder`)
make: *** [Makefile:40: repo-consistency] Error 1
```

@stas00 @jaketae @siddk Can you help me? I think it's about documentation.

```
hyunwoongko ~/Github/transformers (master ±) $ doc-builder convert path_to_py_file
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/doc_builder_cli.py", line 39, in main
    args.func(args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/convert_doc_file.py", line 114, in convert_command
    raise ValueError(f"This script only converts rst files. Got {source_file}.")
ValueError: This script only converts rst files. Got /Users/hyunwoongko/Github/transformers/path_to_py_file.
```

Do I have to create a new rst file?

@stas00 (Contributor) commented Jan 28, 2022


We switched all docs from .rst and .md to .mdx some weeks back. Please see the updated docs tree on master, and the README: https://github.com/huggingface/transformers/tree/master/docs#readme

@hyunwoongko (Contributor, Author) commented Jan 29, 2022

@stas00 @lucasleesw
I modified the tensor parallel mapping. I am confident this is the best structure.

In conclusion, a list of strings is not a suitable structure to represent this mapping, because parameters have various states, such as fusion and reversal, and those states come in many combinations. (Fusion here means fused QKV parameters, like linear(3 * dim, dim), not kernel fusion.)

"col_no_fuse", "col_no_fuse_no_reverse", "col_no_replace_no_fuse", ... is not what we want. For example, a parameter may or may not be reversed while also being fused. It is therefore better to represent the states as objects, and if we set the default values of the attributes to the most common case, we can reduce the time required to create a mapping.

And I added many utility functions to query these states (a minimal sketch follows the list), like:

  • mapping.get_fusion_degree(model, parameter)
  • mapping.is_reversed(model, parameter)
  • mapping.is_fused(model, parameter)
  • mapping.is_column_parallel(model, parameter)
  • mapping.is_row_parallel(model, parameter)
  • mapping.update_attributes(model)
  • ...
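
A condensed sketch of that object-based design (the class and helper names follow the description above, but the bodies and mapping contents are simplified for illustration, not taken from the PR):

```python
class TensorParallelismInfo:
    # Defaults are set to the most common case, so most entries stay short.
    def __init__(self, *names, combined_qkv: bool = False, reverse: bool = False):
        self.names = names
        self.combined_qkv = combined_qkv  # fused QKV, e.g. linear(3 * dim, dim)
        self.reverse = reverse            # weight stored transposed (e.g. Conv1D)


class Column(TensorParallelismInfo):
    pass


class Row(TensorParallelismInfo):
    pass


# Illustrative entries; the real mapping covers many more architectures.
TP_MAPPING = {
    "Bert": [Column("query", "key", "value", "intermediate.dense"), Row("output.dense")],
    "GPT2": [Column("c_attn", combined_qkv=True, reverse=True), Row("c_proj", reverse=True)],
}


def is_fused(model_name: str, param_name: str) -> bool:
    infos = TP_MAPPING.get(model_name, [])
    return any(info.combined_qkv and param_name in info.names for info in infos)
```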

@jaketae (Contributor) commented Jan 29, 2022

@hyunwoongko Apologies for the delayed review. I only have a few minor comments, and I can work on them if you'd like. Overall, I agree with the design choices you made. Thanks again for the PR!

@jaketae (Contributor) left a review comment

I left committable review comments. Feel free to accept, reject, or modify them as you see fit; they are mostly stylistic nits. LGTM!

Review comment on src/transformers/utils/model_parallel_utils.py (outdated, resolved)
@jaketae (Contributor) commented Feb 5, 2022

Hello @stas00, this PR is ready for review. Could you kindly take a look when time permits?

For context, this PR provides a mapping between model names and parameters that will enable tensor parallelism. While this PR was introduced as part of efforts to integrate OSLO-enabled 3D parallelism into transformers, the map was constructed in a way that is framework agnostic, i.e., it could be used by OSLO, DeepSpeed, and the like.
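
To make the framework-agnostic point concrete: a downstream framework would only need to query the mapping's helpers. The helper names follow the list earlier in this thread; `plan_sharding` itself and the parameter-name convention are hypothetical.

```python
def plan_sharding(model, mapping):
    """Build a name -> (split direction, fusion degree) plan from the mapping."""
    plan = {}
    for name, _ in model.named_parameters():
        if mapping.is_column_parallel(model, name):
            plan[name] = ("column", mapping.get_fusion_degree(model, name))
        elif mapping.is_row_parallel(model, name):
            plan[name] = ("row", 1)
    return plan
```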

cc @LysandreJik @sgugger

@hyunwoongko (Contributor, Author) commented

@l-yohai If it works well after adding it to OSLO and testing it, let's keep opening PRs to add models here. I hope we can support almost all models as soon as possible.

@sgugger (Collaborator) left a review comment

Thanks for working on this. I've left a couple of suggestions regarding naming, as we usually go for descriptive variable names in Transformers.

We would also need docstrings in the classes introduced and their methods, or comments, before we merge this, so the code is documented and we can easily maintain it.

```python
from math import ceil


class TPInfo(object):
```
sgugger (Collaborator) commented:

We need a docstring here explaining what this class does.

Also, the name is unclear to someone who is not familiar with all the model parallelism jargon. Let's expand it to TensorParallelismInfo, as we usually use descriptive names in Transformers.

Comment on lines +22 to +27
```python
    def __init__(
        self,
        *name,
        combined_qkv: bool = False,
        reverse: bool = False,
    ):
```
sgugger (Collaborator) commented:

Suggested change:

```diff
-    def __init__(
-        self,
-        *name,
-        combined_qkv: bool = False,
-        reverse: bool = False,
-    ):
+    def __init__(self, *name, combined_qkv: bool = False, reverse: bool = False):
```

This fits in one line (our char limit is 119).

```python
    return self.__str__()


Col = type("COLUMN", (TPInfo,), {"code": "Col"})
```
sgugger (Collaborator) commented:

Same remark as above about naming. Let's go for the full Column here; there is no point in sparing three characters :-)

```python
Update = type("UPDATE", (TPInfo,), {"code": "Update"})


class TPMapping(object):
```
sgugger (Collaborator) commented:

And here TensorParallelismMapping. We also need a docstring to explain what this does and how to expand it.

Comment on lines +117 to +119
```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    cls = getattr(transformers, f"{model_name}PretrainedModel", None)
```
sgugger (Collaborator) commented:

The second test should not be necessary. We don't have any PretrainedModel without the capital T (except for Bart, but the class without the capital is deprecated and we have a class with the capital).

@hyunwoongko (Contributor, Author) commented Feb 8, 2022:

@sgugger Can you let me know which version this was applied from? Users on older transformers versions will still need this in OSLO.

To add a little more: transformers will have this mapping, but OSLO will also keep it internally, because this mapping class does not exist for users of older transformers versions. Users on older versions will use the mapping inside OSLO, and users on newer versions will use the mapping from transformers. So this check is not required in transformers, but it is still required in OSLO.
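
In other words, a downstream library can prefer the mapping shipped with transformers and fall back to its own copy on older releases. Roughly (the OSLO module path here is hypothetical; the transformers path and class name are the ones discussed in this PR):

```python
try:
    # Available only in transformers versions that include this PR.
    from transformers.utils.tensor_parallel_utils import TensorParallelismMapping
except ImportError:
    # Older transformers: use OSLO's internal copy (hypothetical path).
    from oslo.parallelism.mapping import TensorParallelismMapping
```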

```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    cls = getattr(transformers, f"{model_name}PretrainedModel", None)
assert cls is not None, f"Can not import the model named {cls}."
```
sgugger (Collaborator) commented:

We prefer going for a test and raising an error in Transformers:

Suggested change:

```diff
-assert cls is not None, f"Can not import the model named {cls}."
+if cls is not None:
+    raise ValueError(f"Can not import the model named {cls}.")
```

@hyunwoongko (Contributor, Author) commented Feb 8, 2022:

Thanks for the correction! I think `if cls is None` is more correct, though. I'll apply your suggestion.
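
With the test inverted, the check would read as follows (using `model_name` in the message, since `cls` is None on that branch):

```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    raise ValueError(f"Cannot import the model named {model_name}.")
```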

@hyunwoongko (Contributor, Author) commented Feb 7, 2022

@sgugger Thanks for the fixes. I'll apply these suggestions to the code.

@jaketae I'll apply some of the suggestions; could you help me add docstrings?

@stas00 (Contributor) commented Feb 7, 2022

I think that any such new mapping should come with an application and tests, so I don't think this is a good addition to transformers as presented at the moment, since it could lead to dead code.

Typically, new features come with an application, tests, and documentation. This PR has none of that.

@sgugger sgugger self-requested a review February 7, 2022 16:38
@sgugger (Collaborator) left a review comment

Oh, I'm sorry, I thought this was part of a larger plan you had vetted together, @stas00 and @jaketae. So indeed, we want to make sure there is a tested application before merging.

@hyunwoongko (Contributor, Author) commented

Then, I think I will have to add this when I add OSLO.

@hyunwoongko (Contributor, Author) commented

Closing the PR.

@hyunwoongko hyunwoongko closed this Feb 7, 2022
@stas00 (Contributor) commented Feb 7, 2022

Do you not want to first integrate the suggestions by Sylvain, so that when this is added in the new PR it won't need to be done a second time?

@jaketae (Contributor) commented Feb 7, 2022

Hi all, apologies for causing confusion. In future PRs, I'll make sure to communicate with Stas before requesting a final round of review.

On a separate note, I'll continue coordinating with Kevin to complete the remaining steps (i.e., docstrings, better documentation, and reflecting the feedback already received) before opening a new PR.

@hyunwoongko (Contributor, Author) commented

@stas00

> Do you not want to first integrate the suggestions by Sylvain, so that when this is added in the new PR it won't need to be done a second time?

I will.
