RFC: Establish SIG OpenXLA #419

Merged Jul 25, 2022 (4 commits)

rfcs/20220713-sig-open-xla.md (82 additions)

# RFC: Establish SIG OpenXLA

| Status        | Accepted |
| :------------ | :-------------------------------------------------------- |
| **RFC #**     | [419](https://github.com/tensorflow/community/pull/419)   |
| **Author(s)** | Thea Lamkin (thealamkin@google.com), Mehdi Amini (aminim@google.com) |
| **Sponsor**   | Thea Lamkin (thealamkin@google.com)                        |
| **Updated**   | 2022-07-13                                                 |

## Objective

OpenXLA will be a community-driven, modular open source compiler. It will enable efficient lowering, optimization, and deployment of ML models from most major frameworks to any hardware backend, notably CPUs, GPUs, and ML ASICs. This work will be done collaboratively with major ML frameworks and hardware vendors.
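
For a sense of the end-to-end workflow this enables, here is a minimal sketch using JAX, one of the frameworks that already lowers to XLA (the model function is a toy placeholder):

```python
import jax
import jax.numpy as jnp

@jax.jit
def predict(w, x):
    # Toy stand-in for a model: any traceable framework function works.
    return jnp.tanh(x @ w)

w = jnp.ones((8, 8))
x = jnp.ones((4, 8))

lowered = predict.lower(w, x)   # framework program -> XLA HLO
print(lowered.as_text()[:300])  # the IR handed to the XLA compiler
compiled = lowered.compile()    # HLO -> executable for the local backend
print(compiled(w, x).shape)     # (4, 8)
```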

SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project. SIG discussions will facilitate coordination around roadmap, design evolution, and new workflows to be created in OpenXLA.

**Comment (Contributor):** Do you already have an idea of which TF folders will be involved in this process?

**Comment (Contributor):** Also, I hope we are not going to just mirror folders from TF as with the MHLO repo, and that we can clearly isolate the components.

**Comment (@joker-eph, Jul 21, 2022):** The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, like MLIR was before it moved to LLVM.
Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious to do at the C++ level.

MHLO isn't the same setup: it lives primarily inside TensorFlow, and the standalone repo is a read-only mirror (basically the opposite of vendoring).

The folders involved are:

* tensorflow/compiler/xla -> will be the new OpenXLA repository root
* tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
* A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/, for example, but also some of the profiler runtime).
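
For concreteness, a rough sketch of what such vendoring could look like in Bazel terms, in the spirit of how TensorFlow pins LLVM today; the repository name, URL, and pinned commit below are hypothetical (WORKSPACE files are written in Starlark, a Python dialect):

```python
# Hypothetical WORKSPACE excerpt: TensorFlow vendoring OpenXLA as an external
# repository pinned to a specific commit, bumped by an automated roll.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

OPENXLA_COMMIT = "0123456789abcdef0123456789abcdef01234567"  # hypothetical pin
OPENXLA_SHA256 = "<archive checksum>"  # hypothetical; a real pin records this

http_archive(
    name = "openxla",  # hypothetical repo name
    sha256 = OPENXLA_SHA256,
    strip_prefix = "xla-" + OPENXLA_COMMIT,
    urls = ["https://github.com/openxla/xla/archive/{}.tar.gz".format(OPENXLA_COMMIT)],
)
```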

**Comment (Contributor):**

> Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious to do at the C++ level.

I think this is the most important part. A monolithic approach in third_party alone is not going to solve the build invalidation (and some TF breakages) that we experience every day.

> The folders involved are:
> ...

I've recently contributed to TF2XLA, with many frictions (tensorflow/build#122) between OSS and the internal infra. As this folder is not included in your list, are these contributions still going to be made in the TF main repo?

**Comment (Contributor):**

> I think this is the most important part. A monolithic approach in third_party alone is not going to solve the build invalidation (and some TF breakages) that we experience every day.

Yes, absolutely: this is an entirely different track of work, with a different motivation than what drives OpenXLA right now.
Also, on the topic of build invalidation: LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change, so the build invalidation problem will remain an issue there. I'm not sure what we can do about it, though.

> I've recently contributed to TF2XLA, with many frictions (tensorflow/build#122) between OSS and the internal infra.

Ouch... this kind of difference between Bazel and the internal Google checks seems really annoying; we should be able to align them, though?

> As this folder is not included in your list, are these contributions still going to be made in the TF main repo?

OpenXLA won't have any dependency on TensorFlow, so the TF/XLA bridge will naturally continue to be part of TensorFlow moving forward.
(Regardless of where the code goes, the kinds of problems you refer to will exist, and we should address them!)
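
For context, the TF/XLA bridge mentioned here is what TensorFlow users exercise when opting a function into XLA compilation; a minimal sketch using the standard TensorFlow API and a toy function:

```python
import tensorflow as tf

# jit_compile=True routes this function through the TF2XLA bridge, which
# lowers the traced TF graph to XLA HLO before execution.
@tf.function(jit_compile=True)
def scale_relu(x):
    return tf.nn.relu(x) * 2.0

print(scale_relu(tf.constant([-1.0, 2.0])))  # [0. 4.]
```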

**Comment (@bhack, Jul 21, 2022):**

> Also, on the topic of build invalidation: LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change, so the build invalidation problem will remain an issue there. I'm not sure what we can do about it, though.

This really depends on your vision for the productization roadmap.
If TF master/nightly relies on OpenXLA "rolling sha" commits, which in turn rely on LLVM "rolling sha" commits, then we are not really relying on releases, API versioning, etc. I think that would be a very weak modularization and not much of an improvement over the current status quo.

Some positive side effects could be gained by disentangling the target dependency graph: #238

But I think the main impact is still related to OpenXLA's own roadmap/vision.

**Comment:**

> The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, like MLIR was before it moved to LLVM. Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it will take time and it isn't obvious to do at the C++ level.
>
> MHLO isn't the same setup: it lives primarily inside TensorFlow, and the standalone repo is a read-only mirror (basically the opposite of vendoring).
>
> The folders involved are:
>
> * tensorflow/compiler/xla -> will be the new OpenXLA repository root
> * tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
> * A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/, for example, but also some of the profiler runtime).

Hi, I have a question about the plan for the existing XLA compiler (currently based on HLO IR) and MHLO (based on MLIR). You mention that MHLO will also move into OpenXLA: what will the relationship between the XLA compiler and MHLO be going forward? Will the XLA compiler be re-implemented on top of MHLO?

**Comment (Contributor):**

> Will the XLA compiler be re-implemented on top of MHLO?

Mostly yes: HLO isn't going away anytime soon, but for the targets XLA currently supports publicly (CPU/GPU) we are pledging to use MLIR (and MHLO) end-to-end over the long term, and to make it the preferred way to add new high-level optimizations to XLA. We also plan to continue developing most of the codegen inside MLIR/LLVM itself (Linalg in particular) and to use it inside XLA. This offers opportunities to share large parts of it with other projects, for example IREE.
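
To make the two IRs concrete, here is a minimal sketch using JAX; the `dialect` argument follows the JAX lowering API as of 2022, so treat the exact spelling as an assumption:

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
    return jnp.tanh(x) + 1.0

lowered = f.lower(jnp.ones((4,)))

# MLIR-based MHLO module: the long-term, end-to-end path described above.
print(lowered.as_text(dialect="mhlo"))

# Classic HLO text: what the existing XLA pipeline consumes today.
print(lowered.as_text(dialect="hlo"))
```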


## SIG Charter

### Goals

* Accelerate industry collaboration around XLA and build a vibrant OSS community.
* Share and receive feedback on the technical direction for OpenXLA and ensure it meets the needs of major users and contributors.
* Set up a new XLA repository or organization that has an independent build/test setup, infrastructure to accept PRs more easily, and is hardware- and framework-independent.

**Comment (Contributor):** I suggest evaluating the pros and cons of using an independent GitHub org, also in light of the Keras migration experience. One of the main issues:

**Comment:** What is this all about?

* Ensure the extraction of XLA from TensorFlow is minimally disruptive to existing users and contributors.
* Create a product identity with its own brand, website, docs, and communication channels.
* Discuss establishment of governance outside TensorFlow.

### Membership

Everyone involved in developing or integrating with XLA is welcome to participate in discussions. To participate, members can request an invitation to join the GitHub organization [TBA] and SIG Discord [TBD].

Creating a successful OpenXLA project will also require a collaborative effort from key representatives of ML frameworks, hardware platforms, users, and integrators. The following organizations have agreed to participate in SIG discussions and provide resources allowing the SIG to reach its goals (in alphabetical order):

* AMD
* Apple
* ARM
* AWS
* Google (TensorFlow, JAX, PyTorch/XLA)
* Intel
* Meta (PyTorch)
* NVIDIA

Individual SIG members will be added via PR in the following directory [TBA]. Members are expected to regularly attend meetings, participate in technical discussions, and make regular technical contributions to the project.

### Communication

SIG OpenXLA will hold monthly (at minimum) virtual meetings for roadmap sharing, design discussion, and SIG coordination. Agendas will be open to contribution and shared in advance. Meeting minutes will be shared with the SIG [mailing list].
Asynchronous communication, including design proposals and roadmap update announcements, will happen in GitHub Discussions in the OpenXLA GitHub organization (until it is possible to migrate to an independent Discourse forum).

### Collaboration & Governance

**Future Governance**

An explicit workstream within the SIG in 2023 will be to establish collaboration principles, code review processes, and community infrastructure for OpenXLA when the XLA project moves out of the TensorFlow organization. Discussions to prepare for this work will begin in 2022.

The SIG aims to establish an open governance model drawing from standards such as LLVM's, with particular emphasis on open design/roadmap discussions, a public process for gaining technical steering rights, and neutral docs & repo governance (e.g. location, CLA, etc.).

**Near-term Governance**

Here we define near-term avenues for collaboration & governance given the current location of XLA in TensorFlow. SIG OpenXLA will be launched under the TensorFlow governance umbrella and will leverage existing TensorFlow community infrastructure to bootstrap collaboration more efficiently.

**Code**

Code contributions to XLA in its current location will be released under the Apache 2.0 license, governed by TensorFlow's collaboration rules, and contributed under the Google CLA; the initial maintainers will be the existing maintainers of XLA.

**Design Reviews & Roadmap**

Once launched, SIG OpenXLA will immediately begin technical design conversations with publicly available archives. All significant design proposals and reviews will use a public proposal process established by the SIG (or the TensorFlow RFC process for changes impacting tensorflow/compiler/xla).

**Technical Governance**

A priority of the SIG will be to establish a path for community members to take on technical leadership roles in the project. During the bootstrapping phase of the project in 2022, Google engineers will assume responsibility for the technical leadership of the project.

### Contacts
* For technical questions, contact Mehdi Amini - aminim at google
* For administrative questions, contact Thea Lamkin - thealamkin at google
### Resources
* GitHub ([current](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/xla))
* Discord TBA
* Community proposals TBA
* Community meetings TBA

### Code of Conduct
While under TensorFlow governance, all community spaces for SIG OpenXLA are subject to the [TensorFlow Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md).