RFC: Establish SIG OpenXLA #419
We propose to create SIG OpenXLA to facilitate development of an open, state-of-the-art ML compiler, built collaboratively with ML hardware & framework developers, using the best of XLA & MLIR.

## Objective

OpenXLA will be a community-driven and modular open source compiler. It will enable efficient lowering, optimization and deployment of ML models from most major frameworks to any hardware backend, notably CPUs, GPUs, and ML ASICs. This work will be done collaboratively with major ML frameworks and hardware vendors.

SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project. SIG discussions will facilitate coordination around roadmap, design evolution, and new workflows to be created in OpenXLA.

## Goals

* Accelerate industry collaboration around XLA and build a vibrant OSS community.
* Share and receive feedback on the technical direction for OpenXLA and ensure it meets the needs of major users and contributors.
* Set up a new XLA repository or organization with independent build/test, with infra to more easily accept PRs, and that is hardware and framework independent.
* Ensure the extraction of XLA from TensorFlow is minimally disruptive to existing users and contributors.
* Create a product identity with its own brand, website, docs, and communication channels.
* Discuss establishment of governance outside TensorFlow.
---

> * Set up a new XLA repository or organization with independent build/test, with infra to more easily accept PRs, and that is hardware and framework independent.

I suggest evaluating the pros and cons of using an independent GitHub org, also in light of the Keras migration experience. One of the main issues:

- No ticket migration between different GitHub orgs
---

What is this all about?

---

It would be nice if the new SIGs like this one could adopt, and eventually improve, the README.md and CONTRIBUTING.md templates.
---

> SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project.

Do you already have an idea of which TF folders will be involved in this process?
---

Also, I hope that we are not going to just mirror folders from TF as with the MHLO repo, and that we can clearly isolate the components.
---

The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, like MLIR was before it moved to LLVM (see the sketch after the folder list below).

Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious how to do it at the C++ level.

MHLO isn't the same setup: it lives inside TensorFlow primarily and the standalone repo is a read-only mirror (basically the opposite of vendoring).

The folders involved are:

- tensorflow/compiler/xla -> will be the new OpenXLA repository root
- tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
- A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/ for example, but also some of the profiler runtime).
---

> Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious how to do it at the C++ level.

I think that this is the most important part. With just a monolithic approach in third_party, we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

> The folders involved are:
> ....

I've recently contributed to TF2XLA with many frictions (tensorflow/build#122) between OSS and the internal infra. As this folder is not included in your list, are these contributions still going to be made in the TF main repo?
---

> I think that this is the most important part. With just a monolithic approach in third_party, we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

Yes, absolutely: this is just an entirely different track of work, with a different motivation than what motivates OpenXLA right now.

Also, on the topic of build invalidation, LLVM/MLIR will continue to be used in TensorFlow independently of XLA and this won't change. The build invalidation problem will remain an issue there. I'm not sure what we can do about it though?

> I've recently contributed to TF2XLA with many frictions (tensorflow/build#122) between OSS and the internal infra.

Ouch... this kind of difference between Bazel and the internal Google checks seems really annoying; we should be able to align this though?

> As this folder is not included in your list, are these contributions still going to be made in the TF main repo?

OpenXLA won't have any dependency on TensorFlow, so the TF/XLA bridge will naturally continue to be part of TensorFlow moving forward.

(Regardless of where the code goes: the kind of problems you refer to will exist and we should address them!)
---

> Also, on the topic of build invalidation, LLVM/MLIR will continue to be used in TensorFlow independently of XLA and this won't change. The build invalidation problem will remain an issue there. I'm not sure what we can do about it though?

This really depends on your vision for the productization roadmap.

If TF master/nightly relies on OpenXLA "rolling sha" commits, which in turn rely on LLVM "rolling sha" commits, then we are really not relying on releases, API versioning, etc. I think that would be quite weak modularization, and not something that improves the current status quo (see the sketch below for what such a pin looks like).

Some positive side effects could be gained by disentangling the targets' dependency graph:
#238

But I think that the main impact still relates to OpenXLA's own roadmap/vision.
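To make "rolling sha" concrete, here is a minimal, hypothetical sketch of what a commit-level pin looks like in a Bazel WORKSPACE (the repository name and hash are illustrative, not TensorFlow's actual setup):

```python
# Hypothetical "rolling sha" pin: the dependency is a raw commit archive that is
# bumped frequently, rather than a versioned release with stability guarantees.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

LLVM_COMMIT = "0123456789abcdef0123456789abcdef01234567"  # illustrative hash

http_archive(
    name = "llvm-raw",
    urls = ["https://github.com/llvm/llvm-project/archive/%s.tar.gz" % LLVM_COMMIT],
    strip_prefix = "llvm-project-" + LLVM_COMMIT,
)
```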
---

> The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, like MLIR was before it moved to LLVM. Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious how to do it at the C++ level.
>
> MHLO isn't the same setup: it lives inside TensorFlow primarily and the standalone repo is a read-only mirror (basically the opposite of vendoring).
>
> The folders involved are:
>
> - tensorflow/compiler/xla -> will be the new OpenXLA repository root
> - tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
> - A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/ for example, but also some of the profiler runtime).

Hi, I have a question about the plan for the existing XLA compiler (currently based on the HLO IR) and MHLO (based on MLIR). I see you mentioned that MHLO will also be moved into OpenXLA. What's the relationship between the XLA compiler and MHLO in the future? Will the XLA compiler be re-implemented based on MHLO?
---

> Hi, I have a question about the plan for the existing XLA compiler (currently based on the HLO IR) and MHLO (based on MLIR). I see you mentioned that MHLO will also be moved into OpenXLA. What's the relationship between the XLA compiler and MHLO in the future? Will the XLA compiler be re-implemented based on MHLO?

Mostly yes: HLO isn't going to go away anytime soon, but for the current targets publicly supported by XLA (CPU/GPU) we're pledging to use MLIR (and MHLO) end-to-end in the long term, and for this to be the preferred way to add new high-level optimizations to XLA. We're also planning to continue developing most of the codegen inside MLIR/LLVM itself (Linalg in particular) and use it inside XLA. This offers opportunities to share large parts of it with other projects, like IREE for example.
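For readers who haven't looked at HLO before, a minimal sketch (using TF 2.x's experimental compiler-IR API, which is not part of this RFC) of how the HLO that XLA compiles a function to can be inspected today:

```python
# Sketch: dump the HLO that XLA produces for a jit-compiled tf.function.
# experimental_get_compiler_ir is a public but experimental TF 2.x API.
import tensorflow as tf

@tf.function(jit_compile=True)
def f(x):
    return tf.tanh(x) * 2.0

x = tf.ones((4,))
print(f.experimental_get_compiler_ir(x)(stage="hlo"))
```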
---

Excited to see this develop!

---

Super exciting! Will OpenXLA be under open governance (i.e. similar to the LLVM model)? Or will it be governed under the TensorFlow / Google umbrella?

---

We touched on this in the RFC, see this section: https://github.com/tensorflow/community/blob/master/rfcs/20220713-sig-open-xla.md#collaboration--governance

We aim to evolve toward a model as open as LLVM's in terms of governance. It'll be a gradual process and we want to consult with the members/contributors to help us define good governance for the project. This will be an important aspect of the SIG.
---

@sanjoy Other than this, another governance point that was discussed was the related sub-governance of MHLO:
---

As far as MHLO goes, we've been internally working on something called StableHLO - a version of HLO/MHLO that will provide stability guarantees, a specification, a test suite and a reference implementation. In the near future, StableHLO will be switching to a GitHub-first development process - the code will be developed via pull requests, there will be a GitHub-based test suite, GitHub Issues will be used to track the work, and GitHub Discussions / Discord will be used for discussions. We're in the final stages of approvals for all this, and I expect that we'll be able to tell (and show) more shortly.

The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers. HLO/MHLO provide a good foundation, but there are a lot of good ideas beyond that, and I can't wait to start working this all out together.
---

Then what is the relationship between OpenXLA and StableHLO? @burmako @joker-eph @theadactyl
---

What about JAX? Will the XLA part also be extracted out?
---

@fortianyou "Then what is the relationship between OpenXLA and StableHLO?" There is a plan for StableHLO to be used as input for XLA, and StableHLO has its roots in HLO which comes from XLA, so I expect that OpenXLA and StableHLO will have a close relationship.

That said, our goal with StableHLO is to build a portability layer between ML frameworks and ML compilers, which means that we will avoid coupling StableHLO with particular compilers, e.g. XLA, so that other compilers could pick it up as well if they are interested. As we bootstrap StableHLO in the near future, we'll be reviewing which parts of HLO/MHLO can become part of StableHLO right away and which parts are XLA-specific (and should stay internal to XLA or should be generalized before being included in StableHLO). E.g. should ops like ... become part of StableHLO right away?

We believe that OpenXLA will be a great forum for these discussions, so we decided that we will be open-sourcing StableHLO under OpenXLA's GitHub organization and will be using OpenXLA's Discord server to chat about StableHLO. Hopefully, this answers your question!
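As a concrete illustration of that portability layer, a minimal JAX sketch (JAX already lowers through XLA; whether the printed module is MHLO or StableHLO depends on the JAX version, so treat the output dialect as an assumption):

```python
# Sketch: jax.jit(...).lower(...) produces the (M/Stable)HLO module that a
# backend compiler such as XLA consumes; as_text() prints the portable IR.
import jax
import jax.numpy as jnp

def f(x):
    return jnp.dot(x, x)

print(jax.jit(f).lower(jnp.ones((2, 2))).as_text())
```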
---

@wchao1115 FYI.
---

Just to follow up, feel free to subscribe to this repo: https://github.com/openxla/community

We're using GitHub Discussions right now; see the announcement for the first public meeting (next Tuesday): openxla/community#5