-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tensor parallelism related mappings #15360
Conversation
The documentation is not available anymore as the PR was closed or merged. |
In my opinion, this mapping + module replacement is the easiest and the most extensible way to implement tensor parallelism now. That's why we need this mapping. However, I've found there were three models (ProphetNet, BigBird, SqueezeBert) that could not be applied in this way. see: https://github.com/tunib-ai/parallelformers/blob/main/FAQ.md#q-why-are-some-models-not-supported The documentation also points out EncoderDecoderModel and RAG, but this is because it does not match the implementation of parallelformers. It has nothing to do with these mappings. |
I think it's perfectly fine to design a way to automate most of the derivations if it works in most cases and leave the special cases to be that - special cases. Which can be taken care of later. If you have ever used Perl, its motto: "make the easy things easy, and the hard things possible" and it lived up to that lofty goal. So if we can follow a similar principal here, then it'd be great! that's is let's not replicate data that doesn't need to be replicated if it can be derived automatically, and make it possible to overcome exceptions by providing additional data when it can't be auto-derived. Does it make sense? |
@stas00 Yes. That will be fine. Let's proceed by adding mappings for exception cases. For example, most models match well, but if a few models have exceptions, it is easiest to solve the problem by constructing an additional dictionary for only those exception cases. The |
ValueError: The following files have docstrings written in rst:
- src/transformers/utils/tensor_parallel_utils.py
To fix this run `doc-builder convert path_to_py_file` after installing `doc-builder`
(`pip install git+https://github.com/huggingface/doc-builder`)
make: *** [Makefile:40: repo-consistency] Error 1 @stas00 @jaketae @siddk Can you help me? I think it's about documentation. hyunwoongko ~/Github/transformers master ± doc-builder convert path_to_py_file
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/bin/doc-builder", line 8, in <module>
sys.exit(main())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/doc_builder_cli.py", line 39, in main
args.func(args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/convert_doc_file.py", line 114, in convert_command
raise ValueError(f"This script only converts rst files. Got {source_file}.")
ValueError: This script only converts rst files. Got /Users/hyunwoongko/Github/transformers/path_to_py_file. Do I have to create a new rst file? |
we switched all docs from .rst and .md to .mdx some weeks back. Please See the updated docs tree in master. and README: |
@stas00 @lucasleesw In conclusion, the list of string is not a suitable structure to represent this mapping because various parameter states exist, such as fusion and reverse, and they have many combinations. (fusion means the fused QKV parameters like linear(3 * dim, dim), not the kernel fusion) “col_no_fuse”, “col_no_fuse_no_reverse”, “col_no_replace_no_fuse”, … I think this is not we want. And I added many utility functions to search these states like:
|
@hyunwoongko Apologies for the delayed review. I only have a few minor comments, and I can work on them if you'd like. Overall, I agree with the design choices you made. Thanks again for the PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left committable review comments. Feel free to accept/reject/modify as you see fit, as they are more stylistic/nit. LGTM!
Co-authored-by: Jake Tae <[email protected]>
Co-authored-by: Jake Tae <[email protected]>
Co-authored-by: Jake Tae <[email protected]>
Co-authored-by: Jake Tae <[email protected]>
Co-authored-by: Jake Tae <[email protected]>
Hello @stas00, this PR is ready for review. Could you kindly take a look when time permits? For context, this PR provides a mapping between model names and parameters that will enable tensor parallelism. While this PR was introduced as part of efforts to integrate Oslo-enabled 3D parallelism to transformers, the map was constructed in a way that is framework agnostic, i.e., it could be used by Oslo, DeepSpeed, and the likes. |
@l-yohai If it works well after adding it to OSLO and testing it, let's continue to PR and add models here. I hope to support almost all models as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I've left a couple of suggestions regarding naming, as we usually go for descriptive variable names in Transformers.
We would also need docstrings in the classes introduced and their methods, or comments, before we merge this, so the code is documented and we can easily maintain it.
from math import ceil | ||
|
||
|
||
class TPInfo(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a docstring here explaining what this class does.
Also, the name is unclear to someone who is not familiar with all the model parallelism jargon. Let's expand to TensorParallelismInfo
as we usually use descriptive names in Transformers.
def __init__( | ||
self, | ||
*name, | ||
combined_qkv: bool = False, | ||
reverse: bool = False, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__( | |
self, | |
*name, | |
combined_qkv: bool = False, | |
reverse: bool = False, | |
): | |
def __init__(self, *name, combined_qkv: bool = False, reverse: bool = False): |
This fits in one line (our char limit is 119).
return self.__str__() | ||
|
||
|
||
Col = type("COLUMN", (TPInfo,), {"code": "Col"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark as above about naming. Let's go for the full Column
here, there is no point sparing three characters :-)
Update = type("UPDATE", (TPInfo,), {"code": "Update"}) | ||
|
||
|
||
class TPMapping(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here TensorParallelismMapping
. We also need a docstring to explain what this does and how to expand it.
cls = getattr(transformers, f"{model_name}PreTrainedModel", None) | ||
if cls is None: | ||
cls = getattr(transformers, f"{model_name}PretrainedModel", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second test should not be necessary. We don't have any PretrainedModel
without the capital T (except for bart, but the class without the capital is deprecated and we have a class with the capital).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sgugger Can you let me know which version this was applied from? The users of the lower transformers version still need this in the OSLO.
To add a little more, the transformers will also have this mapping, but the OSLO will have it internally as well. This is because this mapping class does not exist for users of lower versions of the transformers. Users with lower versions will use the mapping inside OSLO and users with higher versions will use the mapping from transformers. So this check is not required in transformers, but still required in OSLO.
cls = getattr(transformers, f"{model_name}PreTrainedModel", None) | ||
if cls is None: | ||
cls = getattr(transformers, f"{model_name}PretrainedModel", None) | ||
assert cls is not None, f"Can not import the model named {cls}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer going for a test and raising an error in Transformers:
assert cls is not None, f"Can not import the model named {cls}." | |
if cls is not None: | |
raise ValueError(f"Can not import the model named {cls}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for correction! I think if cls is None
is more correct. I'll apply your suggestion.
I think that any such new mappings should have an application and tests with it, so I don't think this is a good addition to transformers the way it's presented at the moment as it could lead to dead code. Typically any new features always come with application, tests and documentation. This PR has none of that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, I think I will have to add this when I add OSLO. |
closing the PR. |
do you not want to first integrate the suggestions by Sylvain so that when added in the new PR it won't need to be done second time? |
Hi all, apologies for causing confusion. In future PRs, I'll make sure to communicate with Stas before requesting a final round of review. On a separate note, I'll continue coordinating with Kevin to complete the remaining steps (i.e. docstrings, better documentation, reflecting feedback already received) to open a new PR. |
I will. |
Discussed in #13690
In addition, I also created some helper functions.
cc @RezaYazdaniAminabadi @lucasleesw
Reviewers
@stas00 @jaketae @siddk