Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Add spec to get device image backend content #14811

Open
wants to merge 10 commits into
base: sycl
Choose a base branch
from

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Jul 26, 2024

Add a proposed specification for an extension that returns the backend content of a device image.

Add a proposed specification for an extension that returns the content
of a device image.
@gmlueck gmlueck requested a review from a team as a code owner July 26, 2024 20:20

:ref2: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

A kernel bundle can contain multiple device images with different

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to get information about device for a given device image?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly. You can indirectly ask if a device image is compatible with a device via device_image::has_kernel(kernel_id, device). There are a couple reasons why it's hard to provide a direct query:

  • For device images that are not in executable state, the device image could be compatible with many different devices. For example, a SPIR-V module can be compiled for many different devices, including future devices that don't even exist yet.

  • Even for executable device images, the image could be compatible with many different devices. For example, I think this is the case for Nvidia CUBIN modules.

I think we could add a query like device_image::is_compatible(device) if that would help. This would be similar to device_image::has_kernel(kernel_id, device), except it asks about the device image a whole, rather than a specific kernel.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is device_image::has_kernel(kernel_id, device), then I think it is fine.

sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {iota});

std::vector<std::byte> bytes;
for (auto& img: bundle) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my education, are there any cases where there will be more than 1 image returned for this case? Except the multi-device case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question! For DPC++, this will be the case (at least for now).

@AerialMantis proposed the initial draft for the kernel bundle APIs, and he originally thought there could be multiple device images for a single device. I think the idea was that the compiler could create multiple versions that are optimized for different scenarios. I think this was just a hypothetical use case, though. DPC++ does not currently do this. If we did add this in the future, we would need some sort of query to distinguish between them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. Are you asking if a kernel_bundle could contain more than one device image when the system has just one device? Yes, that can definitely happen. Consider, for example, a case where you ask for a bundle that contains two kernels:

sycl::kernel_id foo = syclexp::get_kernel_id<foo>();
sycl::kernel_id bar = syclexp::get_kernel_id<bar>();
auto exe_bndl =
    sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {foo, bar});

There is no guarantee that foo and bar are in the same device image, so the kernel bundle might contain two device images in this case. The compiler decides how to pack kernels together into device images depending on the compilation flags (e.g. -fsycl-device-code-split) and also some internal algorithms.

Copy link

@bsochack bsochack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {iota});

std::vector<std::byte> bytes;
for (auto& img: bundle) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. Are you asking if a kernel_bundle could contain more than one device image when the system has just one device? Yes, that can definitely happen. Consider, for example, a case where you ask for a bundle that contains two kernels:

sycl::kernel_id foo = syclexp::get_kernel_id<foo>();
sycl::kernel_id bar = syclexp::get_kernel_id<bar>();
auto exe_bndl =
    sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {foo, bar});

There is no guarantee that foo and bar are in the same device image, so the kernel bundle might contain two device images in this case. The compiler decides how to pack kernels together into device images depending on the compilation flags (e.g. -fsycl-device-code-split) and also some internal algorithms.

* `bundle_state::input`: The device image content is a PTX module representing
one or more kernels.

* `bundle_state::object`: ???
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AerialMantis: Can you help with this part? What is the device image content for an object state bundle on CUDA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no meaningful state for the cuda and HIP backend, they each ignore transition to this state and jump to the last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds the same as as with Level Zero. Can we say:

The same format as bundle_state::input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to restrict this extension to only bundle_state::executable, so I think the issue here is now moot.

A couple changes to simplify and clarify this extension:

* Rename the functions to make it clear they get the "backend" content
  of the device image.  This will avoid confusion with a new member
  function `kernel_bundle::ext_oneapi_get_content` that we plan to add
  in a separate extension.

* Constrain the functions, so they are only available when the kernel
  bundle is `bundle_state::executable`.  This is the only case we need
  now, and it avoids the need to define what is returned in the other
  states.
@gmlueck gmlueck changed the title [SYCL][Doc] Add spec to get device image content [SYCL][Doc] Add spec to get device image backend content Nov 25, 2024
@gmlueck
Copy link
Contributor Author

gmlueck commented Nov 27, 2024

@Pennycook I added you as a reviewer since you volunteered in chat to look at this. Thanks!

----
!====

Available only when the compiler is {cpp}20 or higher.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this, and rely on the ifdef in the synopsis. This is really saying "Available only when the library provides an implementation of std::span", and that's implied by usage of the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to agree on standard wording for cases like this because I think it will come up more and more.

I feel like we should say somewhere that this API is tied to C++20. How do you feel about the wording I added in d288119?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have said this is good enough, but if you'd like to settle on standard wording I'd like to talk about it a bit more.

I think it would be more accurate to say that the implementation defines the macro. ISO C++ says the implementation only defines the macro if you include <version> or <span>.

Given the above, do you want to say anything about who is responsible for including <span> somewhere? It probably shouldn't be part of the function synopsis, but we should make it clear that if this extension is implemented then sycl.hpp is expected to include <span> if it's available (see __has_include), and that this isn't a user's responsibility. For the actual SYCL specification, I imagine we'd have a single dedicated section to explain how compilation with different versions of C++ works, and we could cover this there instead of in each extension.

Finally, I think we could improve the formatting. We should add a title to the paragraph (like we do with Constraints, etc) and consider rephrasing the "Available only when" (to further differentiate from constraints). How about something like:

Required C++ Features: The implementation must define the __cpp_lib_span feature-test macro (which is defined in C++20 and higher).

I deliberately put "C++" in the title in an attempt to be future-proof. Even if ISO C++ decided to introduce a similar concept with the same name, it would probably just be called "Required Features" (since the C++ would be implied).

gmlueck and others added 5 commits November 27, 2024 12:31
Two changes that make it easier to use `device_image` by code that
does not also have the `kernel_bundle` object:

* Add an API to `device_image` that returns the backend.
* Change the lifetime of the content view to be the lifetime of the
  `device_image` object instead of the containing `kernel_bundle` object.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants