[Specification] oneMKL lapack to allow asynchronous functions #589

JackAKirk · 2024-10-11T10:58:34Z

Summary

Linear algebra operators in oneMKL lapack that can hit a computation error (e.g. matrix operations such as inversion via getri, which may not have a solution) report this error via an exception ([oneapi::mkl::lapack::computation_error](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/architecture/architecture#onemkl-lapack-exception-computation-error)). To achieve this there is an implementation constraint that functions such as getri are synchronous, since they generally don't know the error code until completion. This means that even if a programmer inputs a matrix that does have a valid solution for the given operation (e.g. a non-singular matrix for an inverse operation), all work is forced to wait on the return of this synchronous operation just to check an error code that is irrelevant. This affects a large proportion (maybe most?) of oneMKL lapack's most computationally intensive functions. Any workload using these functions will be severely bottlenecked with respect to asynchronous performance.

However, native libraries such as cusolver (which oneMKL uses) can return this "computation error" information via a value that is returned asynchronously. Therefore a change to the oneMKL specification would fix this issue.

Problem statement

Provide asynchronous oneMKL interfaces for Linear algebra operators that currently return "computation error" exceptions.

Details

oneMKL will need to remove the [oneapi::mkl::lapack::computation_error](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/architecture/architecture#onemkl-lapack-exception-computation-error) exception, and replace it with either:

1. Probably the only sensible solution: an extra parameter for each function that currently throws this exception, which instead returns a "SomethingInfo" value asynchronously and carries the computation error info, mapping one to one with e.g. cusolver.
2. Some kind of solution with SYCL asynchronous exceptions: I'm not sure if this is possible but could be looked into. AFAIK currently sycl asynchronous exceptions are completely unused.
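To make the first option concrete, here is a minimal sketch in standard C++ only, with no SYCL or oneMKL calls. `ToyQueue` and `invert_2x2` are hypothetical names invented for illustration and are not part of any existing or proposed oneMKL interface; the toy queue just records work on submission and runs it on `wait()`. The info convention assumed here (0 for success, a positive value for a singular matrix) follows the usual LAPACK convention.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy in-order queue: submit() only records work, wait() runs it.
// It stands in for a sycl::queue purely to show when info becomes valid.
class ToyQueue {
public:
    void submit(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void wait() {
        for (auto& t : tasks_) t();
        tasks_.clear();
    }

private:
    std::vector<std::function<void()>> tasks_;
};

// Hypothetical info-returning inversion routine (2x2, row-major, in place).
// Nothing is thrown for a singular matrix; *info is written when the task
// runs: 0 on success, 1 if the matrix is singular.
void invert_2x2(ToyQueue& q, std::array<double, 4>* a, std::int64_t* info) {
    assert(a != nullptr && info != nullptr);
    q.submit([a, info] {
        const double det = (*a)[0] * (*a)[3] - (*a)[1] * (*a)[2];
        if (det == 0.0) {
            *info = 1;  // computation error reported through info
            return;
        }
        *a = {(*a)[3] / det, -(*a)[1] / det, -(*a)[2] / det, (*a)[0] / det};
        *info = 0;
    });
}
```

With an interface of this shape the submission returns immediately, and a caller who already knows the input is non-singular never has to synchronize just to learn that nothing went wrong; `info` is only meaningful after the queue has been waited on.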


ericlars · 2024-10-11T18:03:04Z

Hi @JackAKirk, thanks for the RFC.

The oneMKL LAPACK team has had an ongoing discussion on the issue you raise which I'll summarize here. We agree with your assessment of the blocking nature of using exceptions for computation errors and find it entirely reasonable to replace them with info variables (or arrays in the batch case).
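For the batch case, a caller would presumably inspect one info value per matrix once the batch has completed. The helper below is a small sketch in standard C++; `failed_entries` is a hypothetical name, and the assumption that a non-zero info marks a failed entry follows the usual LAPACK convention rather than anything the specification currently defines.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of batch entries whose info value is non-zero,
// i.e. the matrices for which the computation reported an error.
std::vector<std::size_t> failed_entries(const std::vector<std::int64_t>& info) {
    std::vector<std::size_t> failed;
    for (std::size_t i = 0; i < info.size(); ++i) {
        if (info[i] != 0) {
            failed.push_back(i);
        }
    }
    return failed;
}
```

A scan like this only needs to run at the point where the results are actually consumed, so the rest of the batch pipeline can stay asynchronous.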

> Some kind of solution with SYCL asynchronous exceptions: I'm not sure if this is possible but could be looked into. AFAIK currently sycl asynchronous exceptions are completely unused.

SYCL does not allow exceptions to be thrown in kernel scope; we're only aware of the possibility of throwing asynchronous exceptions from host_tasks, which limits their usefulness.

> Provide asynchronous oneMKL interfaces for Linear algebra operators that currently return "computation error" exceptions

Exception handling of computation errors is not the only blocker for asynchronous behavior. As we understand it, SYCL provides host_task for scheduling CPU tasks with device tasks. A limitation of host_task is that it is undefined behavior to capture queues or events, so even if a kernel updates an info variable it is not possible to asynchronously schedule a task conditioned on the outcome of a prior kernel within the SYCL framework.

Furthermore, several oneMKL LAPACK functions do not lend themselves to performant GPU-only implementations and so perform some critical sections on the CPU. While the GPU portions are bound to the context provided by the SYCL queue, the CPU portions generally assume they have unfettered access to CPU resources. For these routines the benefit of asynchronicity is unclear to us.

JackAKirk · 2024-10-11T18:09:18Z

Thanks for the quick reply!

> Exception handling of computation errors is not the only blocker for asynchronous behavior. As we understand it, SYCL provides host_task for scheduling CPU tasks with device tasks. A limitation of host_task is that it is undefined behavior to capture queues or events, so even if a kernel updates an info variable it is not possible to asynchronously schedule a task conditioned on the outcome of a prior kernel within the SYCL framework.

oneMKL is a library and does not have to restrict itself to the existing SYCL 2020 specification. In fact we have already solved this issue for the two backends that it affects via the enqueue_native_command DPC++ extension: please see #572. As I understand it this completely resolves the issue you raise here.

> Furthermore, several oneMKL LAPACK functions do not lend themselves to performant GPU-only implementations and so perform some critical sections on the CPU. While the GPU portions are bound to the context provided by the SYCL queue, the CPU portions generally assume they have unfettered access to CPU resources. For these routines the benefit of asynchronicity is unclear to us.

Sure, I understand that certain functions (and/or certain backends) may not be able to take advantage of this. However, the cusolver and rocsolver backends have a large number of functions to which such limitations do not currently apply; it also sounds like the Intel backends have at least a few cases that could take advantage of such an improved interface? And I expect that future generations of Intel implementations will improve on the current situation.

ericlars · 2024-10-11T22:11:46Z

Glad to hear the host_task issues have been worked around, if at least for some backends. We support this change; do you plan on driving the spec update over on https://github.com/uxlfoundation/oneAPI-spec?

JackAKirk · 2024-10-14T09:55:48Z

> Glad to hear the host_task issues have been worked around, if at least for some backends. We support this change; do you plan on driving the spec update over on https://github.com/uxlfoundation/oneAPI-spec?

@Ruyk could I work on this? These linear algebra operators are used in PyTorch, and they are already hooked up to Intel Python's NumPy implementation: https://github.com/IntelPython/dpnp

Rbiessy · 2024-10-16T13:49:28Z

Thanks for the issue Jack. We won't have time to work on this at Codeplay but external contributions are welcomed to improve this!

JackAKirk added the RFC A proposal to add new API label Oct 11, 2024

Rbiessy added help wanted Tasks, issues or features that could be implemented and contributed to the project and removed RFC A proposal to add new API labels Oct 16, 2024
