RFC-0031: ZenDNN Integration #52

94 changes: 94 additions & 0 deletions RFC-0031-ZenDNN-Integration.md
@@ -0,0 +1,94 @@

# ZenDNN Integration


## **Author**
* @naveenthangudu


## **Summary**
This document proposes an approach for integrating the ZenDNN library into PyTorch. The integration will enable inference optimizations for deep learning workloads on AMD CPUs.


## **Highlights**
* Default build behavior remains unaltered.
* A pre-build step will generate ZenDNN integration code.
* ZenDNN integration is optimized for inference workloads.
* A new opt-in build flag (environment variable), `USE_ZENDNN`, will be added.
* No submodules are added. When `USE_ZENDNN` is set to 1, the following repositories will be downloaded into the `third_party` folder:
    * [ZenDNN](https://github.com/amd/ZenDNN)
    * [AOCL BLIS](https://github.com/amd/blis)
* ZenDNN optimizations are added as JIT passes on the frozen graph under the `optimize_for_inference` API.
* Support will be added for the fp32 and bf16 data types.


## **Motivation**
The main goal of this integration is to achieve optimal PyTorch inference performance on AMD CPUs. ZenDNN is a library that improves performance on AMD CPU architectures; the repository and further details are available [here](https://github.com/amd/ZenDNN). To highlight the potential uplift of ZenDNN over oneDNN on AMD CPUs, we benchmarked our fully integrated ZenDNN version of PyTorch v1.12 on a Genoa CPU (4th Generation AMD EPYC™ processor).


### **Benchmarking Configuration**
| Parameter                    | Value |
|------------------------------|-------|
| PyTorch version              | v1.12 |
| CPU                          | 4th Generation AMD EPYC™ (Genoa) |
| Number of physical cores     | 96 |
| NUMA nodes per socket (NPS)  | 1 |
| Model data type              | FP32 |

### **Latency Performance - batch_size=1**

<img src="./RFC-0031-assets/Latency.svg" width=40%>

### **Throughput Performance - batch_size=640**

<img src="./RFC-0031-assets/Throughput.svg" width=40%>


## **Proposed Implementation**

### **Overview**
This approach is modeled on the [hipify](https://github.com/ROCm-Developer-Tools/HIPIFY) approach that the AMD ROCm team uses to integrate ROCm libraries into PyTorch. Because oneDNN and ZenDNN share APIs, a pre-build step will generate the ZenDNN integration code wherever possible. The build flag is opt-in, so default build behavior is unaltered, and the required repositories are downloaded only when the flag is enabled.

### **Code generation script**
* Modeled on the hipify tool, which is already present in PyTorch.
* Implemented in Python and run as a pre-build step.
* Takes in folders or files and generates substituted code.
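
A minimal sketch of how such a substitution pass might look, in the spirit of hipify. Everything named here is an assumption made for illustration: the `SUBSTITUTIONS` table, the identifier pairs in it, and the `zendnnify_text` and `zendnnify_tree` helpers are hypothetical and are not taken from the actual script.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical mapping from oneDNN-style identifier fragments to ZenDNN-style
# ones. A real table would be derived from the APIs the two libraries share.
SUBSTITUTIONS = {
    "dnnl": "zendnn",
    "DNNL": "ZENDNN",
    "mkldnn": "zendnn",
    "MKLDNN": "ZENDNN",
}

# Single alternation, longest keys first, so a longer key is never shadowed
# by a shorter key that happens to be its prefix.
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, SUBSTITUTIONS), key=len, reverse=True))
)


def zendnnify_text(source: str) -> str:
    """Return `source` with every mapped fragment substituted."""
    return _PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(0)], source)


def zendnnify_tree(src_dir: Path, dst_dir: Path,
                   suffixes=(".h", ".cpp")) -> List[Path]:
    """Write substituted copies of matching files from `src_dir` to `dst_dir`.

    Relative paths are substituted too, so a file under `mkldnn/` would land
    under `zendnn/`. The originals are left untouched.
    """
    written = []
    for path in sorted(src_dir.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = Path(zendnnify_text(str(path.relative_to(src_dir))))
            out = dst_dir / rel
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(zendnnify_text(path.read_text()))
            written.append(out)
    return written
```

Writing the generated files to a separate output directory, rather than editing in place, keeps the default (non-ZenDNN) sources unchanged, which matches the goal of leaving default build behavior unaltered.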

### **Build Infrastructure**
* Support for the opt-in flag (environment variable) `USE_ZENDNN` in the Linux build path.
* The PyTorch build reverts to default behavior if the `USE_ZENDNN` environment variable is set to zero or unset.
* When `USE_ZENDNN` is set to 1:
    * The ZenDNN and AOCL BLIS repositories will be downloaded into the `third_party` folder.
    * AOCL BLIS and ZenDNN will be built into PyTorch.
    * The oneDNN path will be disabled.
* The BLIS library is built purely as a dependency for ZenDNN acceleration. MKL will continue to be the BLAS library used by default.
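
The flag semantics above can be sketched as a small resolver. This is an illustration only, not the proposed build code: `resolve_zendnn_config`, the result keys, and the directory names used as `ZENDNN_REPOS` keys are hypothetical, and treating any value other than `1` as disabled is an assumption, since the RFC only specifies the behavior for `1`, `0`, and unset.

```python
import os
from typing import Dict, Mapping, Optional

# Repositories named in the RFC. The dictionary keys are assumed directory
# names under third_party/, chosen here for illustration.
ZENDNN_REPOS = {
    "ZenDNN": "https://github.com/amd/ZenDNN",
    "blis": "https://github.com/amd/blis",
}


def resolve_zendnn_config(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, object]:
    """Translate the USE_ZENDNN environment variable into build decisions."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" opts in; unset, "0", and any
    # other value leave the default build path untouched.
    use_zendnn = env.get("USE_ZENDNN", "0").strip() == "1"
    return {
        "use_zendnn": use_zendnn,
        # The oneDNN path is disabled only when ZenDNN is enabled.
        "disable_onednn": use_zendnn,
        # Nothing is downloaded unless the flag is set.
        "third_party_downloads": dict(ZENDNN_REPOS) if use_zendnn else {},
    }
```

For example, `resolve_zendnn_config({"USE_ZENDNN": "1"})` reports both repositories for download and the oneDNN path as disabled, while an empty environment yields no downloads and no change to oneDNN.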


### **Graph Optimizations**
As part of this integration, the following optimizations will be added to PyTorch. Optimization is an ongoing process, so more will be added in the future.
* Fusion of ops. Examples include:
    * Conv, Bias, and ReLU (CBR) fusions
    * Conv and Add fusions (fusing the residual addition into the conv)
* In-place concatenation (Inception)
* Reorder optimizations
* In-place unary elementwise operations
* Memory Pool optimizations
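
The idea behind a Conv and ReLU fusion can be illustrated on a toy graph. The sketch below is not a PyTorch JIT pass and not ZenDNN code: the `Node` class, the `fuse_conv_relu` function, and the op names `conv`, `relu`, and `conv_relu` are hypothetical. It also assumes that intermediate values are not graph outputs, so a conv result consumed only by a relu can safely disappear.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    op: str            # operator name, e.g. "conv" or "relu"
    inputs: List[str]  # names of the values this node consumes
    output: str        # name of the value this node produces


def fuse_conv_relu(nodes: List[Node]) -> List[Node]:
    """Fold each relu into the conv feeding it when the relu is its only user.

    `nodes` must be in topological order; the result is too, because the
    fused node takes the relu's position and only needs the conv's inputs.
    """
    uses = Counter(name for n in nodes for name in n.inputs)
    producer = {n.output: n for n in nodes}

    absorbed = set()   # outputs of conv nodes folded into a fused node
    replacement = {}   # relu output name -> fused node

    for n in nodes:
        if n.op == "relu" and len(n.inputs) == 1:
            src = producer.get(n.inputs[0])
            # Fuse only if no other node reads the conv result.
            if src is not None and src.op == "conv" and uses[src.output] == 1:
                absorbed.add(src.output)
                replacement[n.output] = Node(
                    "conv_relu", list(src.inputs), n.output
                )

    return [replacement.get(n.output, n)
            for n in nodes if n.output not in absorbed]
```

The single-consumer check matters for residual blocks: when a conv result feeds both an add and a relu, folding it into the relu would remove a value the add still needs, so that conv is left alone.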

Because some of these optimizations apply only to inference workloads, our graph optimizations are applied to frozen graphs, under the `optimize_for_inference` API. The figure below illustrates where the ZenDNN graph optimizations sit in the PyTorch workflow.

![Graph_Optimizations](./RFC-0031-assets/graph_optimizations.svg)

## **Open Questions**
The following questions will impact the design and implementation of ZenDNN integration.

* Can ZenDNN have a new dispatch key added specifically for it?
* Can ZenDNN have its own backend?
* Can a ZenDNN tensor layout be added to the existing tensor layouts?


## **Next Steps**
* First PR - code generation tool and build infrastructure changes, with support for eager mode.
* Second PR - featuring a few graph optimizations.
* Third PR - including unit tests and CI deployment.
* Further PRs with more optimizations and PyTorch 2.0 feature integration.
1 change: 1 addition & 0 deletions RFC-0031-assets/Latency.svg
1 change: 1 addition & 0 deletions RFC-0031-assets/Throughput.svg