
Add tensor parallelism related mappings #15360

Closed · wants to merge 20 commits

Conversation

@hyunwoongko (Contributor) commented Jan 26, 2022

Discussed in #13690

In addition, I also created some helper functions.
cc @RezaYazdaniAminabadi @lucasleesw

Reviewers

@stas00 @jaketae @siddk

@HuggingFaceDocBuilder commented Jan 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@hyunwoongko (Contributor, Author) commented Jan 27, 2022

In my opinion, mapping + module replacement is the easiest and most extensible way to implement tensor parallelism right now. That's why we need this mapping. However, I found three models (ProphetNet, BigBird, SqueezeBert) to which this approach could not be applied. See: https://github.com/tunib-ai/parallelformers/blob/main/FAQ.md#q-why-are-some-models-not-supported

The documentation also points out EncoderDecoderModel and RAG, but that is because they do not match parallelformers' implementation; it has nothing to do with these mappings.
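
For readers unfamiliar with the approach, here is a minimal single-process sketch of the mapping + module replacement idea. This is not the PR's code: the mapping contents and helper names are illustrative, and the communication collectives (all-gather/all-reduce) that a real implementation inserts around the sharded layers are omitted.

```python
import torch
import torch.nn as nn

# Hypothetical per-architecture mapping: suffixes of nn.Linear modules whose
# weights are split along the output dim ("column") or the input dim ("row").
TP_MAPPING = {
    "BertModel": {
        "column": ("query", "key", "value", "intermediate.dense"),
        "row": ("attention.output.dense", "output.dense"),
    }
}


def shard_linear(linear: nn.Linear, rank: int, world_size: int, dim: int) -> nn.Linear:
    """Return a new Linear holding this rank's slice of the weight along `dim`."""
    weight = linear.weight.chunk(world_size, dim=dim)[rank]
    sharded = nn.Linear(weight.shape[1], weight.shape[0], bias=linear.bias is not None)
    with torch.no_grad():
        sharded.weight.copy_(weight)
        if linear.bias is not None:
            # Column parallelism splits the bias with the output dim; row
            # parallelism keeps it whole (applied once after the all-reduce).
            sharded.bias.copy_(linear.bias.chunk(world_size)[rank] if dim == 0 else linear.bias)
    return sharded


def parallelize(model: nn.Module, rank: int, world_size: int, mapping=TP_MAPPING):
    """Walk the model and swap in sharded replacements per the mapping."""
    policy = mapping[model.__class__.__name__]
    for name, module in list(model.named_modules()):
        if not isinstance(module, nn.Linear):
            continue
        if name.endswith(policy["column"]):
            dim = 0
        elif name.endswith(policy["row"]):
            dim = 1
        else:
            continue
        parent_name, _, child = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child, shard_linear(module, rank, world_size, dim))
```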

@stas00 (Contributor) commented Jan 27, 2022

I think it's perfectly fine to design a way to automate most of the derivations if it works in most cases, and leave the special cases to be just that: special cases, which can be taken care of later.

If you have ever used Perl, its motto was "make the easy things easy, and the hard things possible", and it lived up to that lofty goal. If we can follow a similar principle here, that'd be great!

That is: let's not replicate data that doesn't need to be replicated if it can be derived automatically, and make it possible to overcome exceptions by providing additional data when it can't be auto-derived.

Does it make sense?

@hyunwoongko (Contributor, Author) commented Jan 27, 2022

@stas00 Yes, that will be fine. Let's proceed by adding mappings for the exception cases. Most models match well, and if a few models have exceptions, the easiest solution is to construct an additional dictionary for only those exception cases.

The REVERSED PARAMS and FUSED ATTENTION maps I wrote are examples; these cases are not difficult to define because there are only two of them across Transformers. @lucasleesw So let's design it to support as many models as possible with as little effort as possible. That would be best.
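
As a rough illustration of the "defaults plus exception tables" idea (the table names and contents below are illustrative, not the PR's actual data):

```python
# Most architectures follow common conventions; only deviating models
# need an entry in the exception tables below.

# GPT2's Conv1D stores its weight transposed relative to nn.Linear.
REVERSED_PARAMS = {"GPT2"}

# GPT2's c_attn packs Q, K, and V into one projection, i.e. fusion degree 3.
FUSED_ATTENTION = {"GPT2": 3}


def is_reversed(model_name: str) -> bool:
    return model_name in REVERSED_PARAMS


def fusion_degree(model_name: str) -> int:
    # Degree 1 means an ordinary, unfused projection.
    return FUSED_ATTENTION.get(model_name, 1)
```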

@hyunwoongko (Contributor, Author) commented Jan 28, 2022

```
ValueError: The following files have docstrings written in rst:
- src/transformers/utils/tensor_parallel_utils.py
To fix this run `doc-builder convert path_to_py_file` after installing `doc-builder`
(`pip install git+https://github.com/huggingface/doc-builder`)
make: *** [Makefile:40: repo-consistency] Error 1
```

@stas00 @jaketae @siddk Can you help me? I think it's about documentation.

```
hyunwoongko ~/Github/transformers (master ±) $ doc-builder convert path_to_py_file
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/doc_builder_cli.py", line 39, in main
    args.func(args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/doc_builder/commands/convert_doc_file.py", line 114, in convert_command
    raise ValueError(f"This script only converts rst files. Got {source_file}.")
ValueError: This script only converts rst files. Got /Users/hyunwoongko/Github/transformers/path_to_py_file.
```

Do I have to create a new rst file?

@stas00 (Contributor) commented Jan 28, 2022


We switched all docs from .rst and .md to .mdx some weeks back. Please see the updated docs tree on master, and the README: https://github.com/huggingface/transformers/tree/master/docs#readme

@hyunwoongko (Contributor, Author) commented Jan 29, 2022

@stas00 @lucasleesw
I modified the tensor parallel mapping. I am confident this is the best structure.

In conclusion, a list of strings is not a suitable structure to represent this mapping, because parameters have various states, such as fusion and reversal, and those states come in many combinations. (Fusion here means fused QKV parameters, like linear(3 * dim, dim), not kernel fusion.)

"col_no_fuse", "col_no_fuse_no_reverse", "col_no_replace_no_fuse", ... is not what we want. For example, a parameter may or may not be reversed while also being fused. It is therefore better to represent the states as objects, and if we set the default values of the attributes to the most common case, we can reduce the time required to create a mapping.

And I added many utility functions to query these states (a minimal sketch follows the list), like:

  • mapping.get_fusion_degree(model, parameter)
  • mapping.is_reversed(model, parameter)
  • mapping.is_fused(model, parameter)
  • mapping.is_column_parallel(model, parameter)
  • mapping.is_row_parallel(model, parameter)
  • mapping.update_attributes(model)
  • ...
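
A condensed sketch of that object-based design (the class and helper names follow the description above, but the bodies and mapping contents are simplified for illustration, not taken from the PR):

```python
class TensorParallelismInfo:
    # Defaults are set to the most common case, so most entries stay short.
    def __init__(self, *names, combined_qkv: bool = False, reverse: bool = False):
        self.names = names
        self.combined_qkv = combined_qkv  # fused QKV, e.g. linear(3 * dim, dim)
        self.reverse = reverse            # weight stored transposed (e.g. Conv1D)


class Column(TensorParallelismInfo):
    pass


class Row(TensorParallelismInfo):
    pass


# Illustrative entries; the real mapping covers many more architectures.
TP_MAPPING = {
    "Bert": [Column("query", "key", "value", "intermediate.dense"), Row("output.dense")],
    "GPT2": [Column("c_attn", combined_qkv=True, reverse=True), Row("c_proj", reverse=True)],
}


def is_fused(model_name: str, param_name: str) -> bool:
    infos = TP_MAPPING.get(model_name, [])
    return any(info.combined_qkv and param_name in info.names for info in infos)
```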

@jaketae (Contributor) commented Jan 29, 2022

@hyunwoongko Apologies for the delayed review. I only have a few minor comments, and I can work on them if you'd like. Overall, I agree with the design choices you made. Thanks again for the PR!

@jaketae (Contributor) left a review comment

I left committable review comments. Feel free to accept, reject, or modify them as you see fit; they are mostly stylistic nits. LGTM!

Review comment on src/transformers/utils/model_parallel_utils.py (outdated, resolved)
@jaketae (Contributor) commented Feb 5, 2022

Hello @stas00, this PR is ready for review. Could you kindly take a look when time permits?

For context, this PR provides a mapping between model names and parameters that will enable tensor parallelism. While this PR was introduced as part of efforts to integrate OSLO-enabled 3D parallelism into transformers, the map was constructed in a way that is framework agnostic, i.e., it could be used by OSLO, DeepSpeed, and the like.
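
To make the framework-agnostic point concrete: a downstream framework would only need to query the mapping's helpers. The helper names follow the list earlier in this thread; `plan_sharding` itself and the parameter-name convention are hypothetical.

```python
def plan_sharding(model, mapping):
    """Build a name -> (split direction, fusion degree) plan from the mapping."""
    plan = {}
    for name, _ in model.named_parameters():
        if mapping.is_column_parallel(model, name):
            plan[name] = ("column", mapping.get_fusion_degree(model, name))
        elif mapping.is_row_parallel(model, name):
            plan[name] = ("row", 1)
    return plan
```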

cc @LysandreJik @sgugger

@hyunwoongko (Contributor, Author) commented

@l-yohai If it works well after adding it to OSLO and testing it, let's keep opening PRs to add models here. I hope we can support almost all models as soon as possible.

@sgugger (Collaborator) left a review comment

Thanks for working on this. I've left a couple of suggestions regarding naming, as we usually go for descriptive variable names in Transformers.

We would also need docstrings in the classes introduced and their methods, or comments, before we merge this, so the code is documented and we can easily maintain it.

```python
from math import ceil


class TPInfo(object):
```
sgugger (Collaborator) commented:

We need a docstring here explaining what this class does.

Also, the name is unclear to someone who is not familiar with all the model parallelism jargon. Let's expand it to TensorParallelismInfo, as we usually use descriptive names in Transformers.

Comment on lines +22 to +27
```python
    def __init__(
        self,
        *name,
        combined_qkv: bool = False,
        reverse: bool = False,
    ):
```
sgugger (Collaborator) commented:

Suggested change:

```diff
-    def __init__(
-        self,
-        *name,
-        combined_qkv: bool = False,
-        reverse: bool = False,
-    ):
+    def __init__(self, *name, combined_qkv: bool = False, reverse: bool = False):
```

This fits in one line (our char limit is 119).

```python
    return self.__str__()


Col = type("COLUMN", (TPInfo,), {"code": "Col"})
```
sgugger (Collaborator) commented:

Same remark as above about naming. Let's go for the full Column here; there is no point in sparing three characters :-)

```python
Update = type("UPDATE", (TPInfo,), {"code": "Update"})


class TPMapping(object):
```
sgugger (Collaborator) commented:

And here TensorParallelismMapping. We also need a docstring to explain what this does and how to expand it.

Comment on lines +117 to +119
```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    cls = getattr(transformers, f"{model_name}PretrainedModel", None)
```
sgugger (Collaborator) commented:

The second test should not be necessary. We don't have any PretrainedModel without the capital T (except for Bart, but the class without the capital is deprecated and we have a class with the capital).

@hyunwoongko (Contributor, Author) commented Feb 8, 2022:

@sgugger Can you let me know which version this was applied from? Users on older transformers versions will still need this in OSLO.

To add a little more: transformers will have this mapping, but OSLO will also keep it internally, because this mapping class does not exist for users of older transformers versions. Users on older versions will use the mapping inside OSLO, and users on newer versions will use the mapping from transformers. So this check is not required in transformers, but it is still required in OSLO.
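
In other words, a downstream library can prefer the mapping shipped with transformers and fall back to its own copy on older releases. Roughly (the OSLO module path here is hypothetical; the transformers path and class name are the ones discussed in this PR):

```python
try:
    # Available only in transformers versions that include this PR.
    from transformers.utils.tensor_parallel_utils import TensorParallelismMapping
except ImportError:
    # Older transformers: use OSLO's internal copy (hypothetical path).
    from oslo.parallelism.mapping import TensorParallelismMapping
```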

```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    cls = getattr(transformers, f"{model_name}PretrainedModel", None)
assert cls is not None, f"Can not import the model named {cls}."
```
sgugger (Collaborator) commented:

We prefer going for a test and raising an error in Transformers:

Suggested change:

```diff
-assert cls is not None, f"Can not import the model named {cls}."
+if cls is not None:
+    raise ValueError(f"Can not import the model named {cls}.")
```

@hyunwoongko (Contributor, Author) commented Feb 8, 2022:

Thanks for the correction! I think `if cls is None` is more correct, though. I'll apply your suggestion.
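
With the test inverted, the check would read as follows (using `model_name` in the message, since `cls` is None on that branch):

```python
cls = getattr(transformers, f"{model_name}PreTrainedModel", None)
if cls is None:
    raise ValueError(f"Cannot import the model named {model_name}.")
```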

@hyunwoongko (Contributor, Author) commented Feb 7, 2022

@sgugger Thanks for the fixes. I'll apply these suggestions to the code.

@jaketae I'll apply some of the suggestions; could you help me add docstrings?

@stas00 (Contributor) commented Feb 7, 2022

I think that any such new mapping should come with an application and tests, so I don't think this is a good addition to transformers as presented at the moment, since it could lead to dead code.

Typically, new features come with an application, tests, and documentation. This PR has none of that.

@sgugger sgugger self-requested a review February 7, 2022 16:38
@sgugger (Collaborator) left a review comment

Oh, I'm sorry, I thought this was part of a larger plan you had vetted together, @stas00 and @jaketae. So indeed, we want to make sure there is a tested application before merging.

@hyunwoongko (Contributor, Author) commented

Then, I think I will have to add this when I add OSLO.

@hyunwoongko (Contributor, Author) commented

Closing the PR.

@hyunwoongko hyunwoongko closed this Feb 7, 2022
@stas00 (Contributor) commented Feb 7, 2022

Do you not want to first integrate the suggestions by Sylvain, so that when this is added in the new PR it won't need to be done a second time?

@jaketae (Contributor) commented Feb 7, 2022

Hi all, apologies for causing confusion. In future PRs, I'll make sure to communicate with Stas before requesting a final round of review.

On a separate note, I'll continue coordinating with Kevin to complete the remaining steps (i.e., docstrings, better documentation, and reflecting the feedback already received) before opening a new PR.

@hyunwoongko (Contributor, Author) commented

@stas00

> Do you not want to first integrate the suggestions by Sylvain, so that when this is added in the new PR it won't need to be done a second time?

I will.
