RFC: Establish SIG OpenXLA #419

Merged (4 commits into master) on Jul 25, 2022
Conversation

@theadactyl (Contributor) commented Jul 13, 2022:

RFC: Establish SIG OpenXLA

Status: Accepted
RFC #: 419
Author(s): Thea Lamkin ([email protected]), Mehdi Amini ([email protected])
Sponsor: Thea Lamkin ([email protected])
Updated: 2022-07-13

We propose to create SIG OpenXLA to facilitate development of an open, state-of-the-art ML compiler, built collaboratively with ML hardware & framework developers, using the best of XLA & MLIR.

Objective

OpenXLA will be a community-driven and modular open source compiler. It will enable efficient lowering, optimization, and deployment of ML models from most major frameworks to any hardware backend, notably CPUs, GPUs, and ML ASICs. This work will be done collaboratively with major ML frameworks and hardware vendors.

SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project. SIG discussions will facilitate coordination around roadmap, design evolution, and new workflows to be created in OpenXLA.

Goals

  • Accelerate industry collaboration around XLA and build a vibrant OSS community.
  • Share and receive feedback on the technical direction for OpenXLA and ensure it meets the needs of major users and contributors.
  • Set up a new XLA repository or organization that is hardware- and framework-independent, with its own build and test infrastructure and tooling to accept PRs more easily.
  • Ensure the extraction of XLA from TensorFlow is minimally disruptive to existing users and contributors.
  • Create a product identity with its own brand, website, docs, and communication channels.
  • Discuss establishment of governance outside TensorFlow.

@theadactyl changed the title from "Establish SIG OpenXLA" to "RFC: Establish SIG OpenXLA" on Jul 13, 2022.

A contributor commented on the goal below:

> Set up a new XLA repository or organization that is hardware- and framework-independent, with its own build and test infrastructure and tooling to accept PRs more easily.

I suggest evaluating the pros and cons of using an independent GitHub org, also in light of the Keras migration experience.

One of the main issues:

Another contributor replied:

What is this all about?

@bhack (Contributor) commented Jul 20, 2022:

It would be nice if new SIGs like this one could adopt, and eventually improve, the README.md and CONTRIBUTING.md templates.


A contributor commented on the passage below:

> SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project.

Do you already have an idea of which TF folders will be involved in this process?

The same thread continued:

Also, I hope we are not going to just mirror folders from TF, as with the MHLO repo, and that we can clearly isolate the components.

@joker-eph (Contributor) replied Jul 21, 2022:

The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, like MLIR was before it moved to LLVM.
Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious how to do at the C++ level.

MHLO isn't the same setup: it lives primarily inside TensorFlow, and the standalone repo is a read-only mirror (basically the opposite of vendoring).

The folders involved are:

  • tensorflow/compiler/xla -> will be the new OpenXLA repository root
  • tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
  • A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/ for example, but also some of the profiler runtime).

A contributor commented:

> Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious how to do at the C++ level.

I think this is the most important part. With a purely monolithic approach in third_party we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

> The folders involved are:
> ....

I've recently contributed to TF2XLA, with many frictions between OSS and the internal infra (see tensorflow/build#122).
Since this folder is not included in your list, are these contributions still going to be made in the TF main repo?

A contributor replied:

> I think this is the most important part. With a purely monolithic approach in third_party we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

Yes, absolutely: this is an entirely different track of work, with a different motivation than what motivates OpenXLA right now.
Also, on the topic of build invalidation: LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change, so the build invalidation problem will remain an issue there. I'm not sure what we can do about it, though.

> I've recently contributed to TF2XLA, with many frictions between OSS and the internal infra (see tensorflow/build#122).

Ouch... this kind of difference between Bazel and the internal Google checks seems really annoying; we should be able to align this, though?

> Since this folder is not included in your list, are these contributions still going to be made in the TF main repo?

OpenXLA won't have any dependency on TensorFlow, so the TF/XLA bridge will naturally continue to be part of TensorFlow moving forward.
(Regardless of where the code goes: the kinds of problems you refer to will exist, and we should address them!)
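As an illustration of the TF/XLA bridge under discussion, here is a minimal sketch of how user code exercises it today via the public `jit_compile` flag (nothing OpenXLA-specific is assumed; the function and tensor shapes are made up for the example):

```python
import tensorflow as tf

# jit_compile=True asks TensorFlow to compile this function with XLA,
# routing the TF graph through the TF2XLA bridge discussed above.
@tf.function(jit_compile=True)
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([8, 16])
w = tf.random.normal([16, 4])
b = tf.zeros([4])
print(dense_relu(x, w, b).shape)  # (8, 4), computed by an XLA-compiled cluster
```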

@bhack (Contributor) replied Jul 21, 2022:

> Also, on the topic of build invalidation: LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change, so the build invalidation problem will remain an issue there. I'm not sure what we can do about it, though.

This really depends on your vision for the productization roadmap.
If TF master/nightly relies on OpenXLA "rolling sha" commits, which in turn rely on LLVM "rolling sha" commits, then we are not really relying on releases, API versioning, etc. I think that would be a rather weak modularization, and not something that improves much on the current status quo.

Some positive side effects could be gained by disentangling the targets' dependency graph:
#238

But I think the main impact still comes down to OpenXLA's own roadmap/vision.

Another commenter, quoting the folder plan above, asked:

Hi, I have a question about the plan for the existing XLA compiler (currently based on HLO IR) and MHLO (based on MLIR).

I see you mentioned that MHLO will also be moved into OpenXLA. What is the relationship between the XLA compiler and MHLO going to be in the future? Will the XLA compiler be re-implemented on top of MHLO?

A contributor replied:

> What is the relationship between the XLA compiler and MHLO going to be in the future? Will the XLA compiler be re-implemented on top of MHLO?

Mostly yes: HLO isn't going away anytime soon, but for the targets currently supported publicly by XLA (CPU/GPU) we're pledging to use MLIR (and MHLO) end-to-end in the long term, and this will be the preferred way to add new high-level optimizations to XLA. We're also planning to continue developing most of the codegen inside MLIR/LLVM itself (Linalg in particular) and to use it inside XLA. This offers opportunities to share large parts of it with other projects, such as IREE.
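To see the MLIR side of this concretely, JAX (which sits on top of XLA) can print the module it hands to the compiler. A minimal sketch, assuming a reasonably recent JAX install; the dialect in the output (MHLO vs. StableHLO) depends on the JAX version:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x) + 1.0

# Lower without compiling, then print the MLIR module that XLA receives.
lowered = jax.jit(f).lower(jnp.ones((4,), dtype=jnp.float32))
print(lowered.as_text())
```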

@tdb-alcorn commented:

Excited to see this develop!

@ematejska merged commit 1c34d56 into master on Jul 25, 2022.
@mihaimaruseac deleted the theadactyl-patch-11 branch on Jul 25, 2022 at 17:58.
@sanjoy (Contributor) commented Jul 25, 2022:

Super exciting!

Will OpenXLA be under open governance (i.e. similar to the LLVM model)? Or will it be governed under the TensorFlow / Google umbrella?

@joker-eph (Contributor) replied:
We touched on this in the RFC, see this section: https://github.com/tensorflow/community/blob/master/rfcs/20220713-sig-open-xla.md#collaboration--governance

We aim to evolve toward a model as open as LLVM's in terms of governance. It'll be a gradual process, and we want to consult with the members/contributors to help us define a good governance model for the project. This will be an important aspect of the SIG.

@bhack (Contributor) commented Jul 25, 2022:

@sanjoy Other than this, another governance point that was discussed was the related sub-governance of MHLO:
llvm/torch-mlir#999 (comment)

@burmako commented Jul 25, 2022:

As far as MHLO goes, we've been working internally on something called StableHLO: a version of HLO/MHLO that will provide stability guarantees, a specification, a test suite, and a reference implementation.

In the near future, StableHLO will be switching to a GitHub-first development process: the code will be developed via pull requests, there will be a GitHub-based test suite, GitHub Issues will be used to track the work, and GitHub Discussions / Discord will be used for discussions. We're in the final stages of approvals for all this, and I expect we'll be able to tell (and show) more shortly.

The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers. HLO/MHLO provide a good foundation, but there are a lot of good ideas beyond that, and I can't wait to start working this all out together.

@tanyokwok asked:

> The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers.

Then what is the relationship between OpenXLA and StableHLO? @burmako @joker-eph @theadactyl

@JamesTheZ asked:

What about JAX? Will the XLA part also be extracted from it?

@burmako replied Jul 26, 2022:

@fortianyou "Then what is the relationship between OpenXLA and StableHLO?" There is a plan for StableHLO to be used as input for XLA, and StableHLO has its roots in HLO, which comes from XLA, so I expect that OpenXLA and StableHLO will have a close relationship.

That said, our goal with StableHLO is to build a portability layer between ML frameworks and ML compilers, which means that we will avoid coupling StableHLO with particular compilers, e.g. XLA, so that other compilers could pick it up as well if they are interested.

As we bootstrap StableHLO in the near future, we'll be reviewing which parts of HLO/MHLO can become part of StableHLO right away and which parts are XLA-specific (and should stay internal to XLA or should be generalized before being included in StableHLO).

E.g., should ops like mhlo.fusion be in StableHLO? What about advanced functionality like bounded dynamism: should we make it part of the compiler interface, or should that be an implementation detail of XLA? We have done an internal review, so we already have some thoughts on all this, but the whole point of StableHLO is to build a community, so let's discuss together! (Let's just wait until StableHLO is open-sourced, which I expect to happen by next week at the latest.)

We believe that OpenXLA will be a great forum for these discussions, so we decided that we will be open-sourcing StableHLO under OpenXLA's GitHub organization and will be using OpenXLA's Discord server to chat about StableHLO. Hopefully this answers your question!
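To make the "portability layer" idea concrete: a framework emits StableHLO, and any compiler that understands the dialect can consume it. A hedged sketch using JAX as the emitting framework (the `dialect="stablehlo"` option only exists in JAX releases that postdate StableHLO's open-sourcing; the model function is made up for the example):

```python
import jax
import jax.numpy as jnp

@jax.jit
def layer(x, w):
    return jax.nn.relu(x @ w)

lowered = layer.lower(jnp.ones((2, 8)), jnp.ones((8, 4)))
# Request the portable dialect explicitly; the resulting module is the
# framework/compiler interface that StableHLO aims to stabilize.
print(lowered.compiler_ir(dialect="stablehlo"))
```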

@penpornk (Member) commented:
@wchao1115 FYI.

@joker-eph (Contributor) commented:
Just to follow up, feel free to subscribe to this repo: https://github.com/openxla/community

We're using GitHub Discussions right now; see the announcement for the first public meeting (next Tuesday): openxla/community#5
