Skip to content

Latest commit

 

History

History
48 lines (30 loc) · 3.23 KB

README.md

File metadata and controls

48 lines (30 loc) · 3.23 KB

Custom Operator Registration Examples

This folder contains examples to register custom operators into PyTorch as well as register its kernels into ExecuTorch runtime.

How to run

Prerequisite: finish the setting up wiki.

Run:

cd executorch
bash examples/portable/custom_ops/test_custom_ops.sh

AOT registration

In order to use custom ops in ExecuTorch AOT flow (EXIR), the first option is to register the custom ops into PyTorch JIT runtime using torch.library APIs.

We can see the example in custom_ops_1.py where we try to register my_ops::mul3 and my_ops::mul3_out. my_ops is the namespace and it will show up in the way we use the operator like torch.ops.my_ops.mul3.default. For more information about PyTorch operator, checkout pytorch/torch/_ops.py.

Notice that we need both functional variant and out variant for custom ops, because EXIR will need to perform memory planning on the out variant my_ops::mul3_out.

The second option is to register the custom ops into PyTorch JIT runtime using C++ APIs (TORCH_LIBRARY/TORCH_LIBRARY_IMPL). This also means we need to write C++ code and it needs to depend on libtorch.

We added an example in custom_ops_2.cpp where we implement and register my_ops::mul4, also custom_ops_2_out.cpp with an implementation for my_ops::mul4_out.

By linking them both with libtorch and executorch library, we can build a shared library libcustom_ops_aot_lib_2 that can be dynamically loaded by Python environment and then register these ops into PyTorch. This is done by torch.ops.load_library(<path_to_libcustom_ops_aot_lib_2>) in custom_ops_2.py.

C++ kernel registration

After the model is exported by EXIR, we need C++ implementations of these custom ops in order to run it. For example, custom_ops_1_out.cpp is a C++ kernel that can be plugged into the ExecuTorch runtime. Other than that, we also need a way to bind the PyTorch op to this kernel. This binding is specified in custom_ops.yaml:

- func: my_ops::mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: custom::mul3_out_impl # sub-namespace native:: is auto-added

For how to write these YAML entries, please refer to kernels/portable/README.md.

Currently we use Cmake as the build system to link the my_ops::mul3.out kernel (written in custom_ops_1.cpp) to the ExecuTorch runtime. See instructions in: examples/portable/custom_ops/test_custom_ops.sh (test_cmake_custom_op_1).

Selective build

Note that we have defined a custom op for both my_ops::mul3.out and my_ops::mul4.out in custom_ops.yaml. To reduce binary size, we can choose to only register the operators used in the model. This is done by passing in a list of operators to the gen_oplist custom rule, for example: --root_ops="my_ops::mul4.out".

We then let the custom ops library depend on this target, to only register the ops we want.

For more information about selective build, please refer to selective_build.md.