Update Various C++ Documentation Examples to Current Interface #398

Merged
merged 6 commits into KomputeProject:master from keithjh/FixExamples
Oct 18, 2024

@KeithJH KeithJH commented Oct 18, 2024

Updated various C++ documentation examples and TestBenchmark so that they compile against the current interface. Most of the changes rename kp::OpTensorSync{Device,Local} to kp::OpSync{Device,Local} and update algorithms to use kp::Memory instead of kp::Tensor, following the changes introduced in #388.

Examples from the documentation were tested by copying them into a local C++ project set up to use Kompute and confirming that they compile without any modification to the example code.

TestBenchmark was tested by compiling Kompute with KOMPUTE_OPT_ENABLE_BENCHMARK="ON" and ensuring the produced binary runs.
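
To illustrate the kp::Tensor to kp::Memory part of this change, here is a minimal, self-contained sketch. It assumes the tensor type derives from a common memory base class, as the description above suggests. `toy::Memory`, `toy::Tensor`, `toy::paramCount`, and `toy::toMemory` are hypothetical stand-ins, not Kompute's real classes or functions. The sketch shows why a call site that holds a vector of tensor pointers may need an explicit conversion once an API accepts a vector of the base type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace toy {

// Hypothetical stand-ins for the library types; these are NOT Kompute headers.
struct Memory {
    virtual ~Memory() = default;
};

struct Tensor : Memory {
    explicit Tensor(std::vector<float> d) : data(std::move(d)) {}
    std::vector<float> data;
};

// Stand-in for an API that now accepts the base type instead of Tensor.
inline std::size_t paramCount(const std::vector<std::shared_ptr<Memory>>& params) {
    for (const auto& p : params) {
        assert(p != nullptr); // every parameter must point at real memory
    }
    return params.size();
}

// A vector of shared_ptr<Tensor> does not convert implicitly to a vector of
// shared_ptr<Memory>, so copy element by element; each pointer upcasts.
inline std::vector<std::shared_ptr<Memory>> toMemory(
    const std::vector<std::shared_ptr<Tensor>>& tensors) {
    return std::vector<std::shared_ptr<Memory>>(tensors.begin(), tensors.end());
}

} // namespace toy
```

A braced list such as `toy::paramCount({ tensor })` converts each element on its own, so in this model only call sites that pass a pre-built vector of tensors would need the helper.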

Updated so the example compiles in a test project.

With the addition of an Image class various things were reorganized,
including algorithm using kp::Memory instead of just kp::Tensor and
kp::OpTensorSync* operations renamed to kp::OpSync*.

Signed-off-by: Keith Horrocks <[email protected]>
With the addition of an Image class various things were reorganized,
including algorithm using kp::Memory instead of just kp::Tensor and
kp::OpTensorSync* operations renamed to kp::OpSync*.

Tested building with KOMPUTE_OPT_ENABLE_BENCHMARK="ON" and verifying
resulting binary runs.

Signed-off-by: Keith Horrocks <[email protected]>
More renames from kp::OpTensorSync* to kp::OpSync*

Example now compiles in a test project

Signed-off-by: Keith Horrocks <[email protected]>
Various corrections, including renaming kp::OpTensorSync*
operations to kp::OpSync*.

Example now compiles in a test project.

Signed-off-by: Keith Horrocks <[email protected]>
Various corrections, including renaming kp::OpTensorSync*
operations to kp::OpSync*.

Example now compiles when pieced together in a test project.

Signed-off-by: Keith Horrocks <[email protected]>
Various corrections, including renaming kp::OpTensorSync*
operations to kp::OpSync*.

Example now compiles when pieced together in a test project.

Signed-off-by: Keith Horrocks <[email protected]>

  // Run the second parallel operation in the `queueTwo` sequence
- sqTwo->evalAsync<kp::OpAlgoDispatch>(algo);
+ sqTwo->evalAsync<kp::OpAlgoDispatch>(algoTwo);
Contributor Author

Is there a better way to show the same shader running with a different tensor bound in this example, other than creating another Algorithm? Before my change, both sequences ran the same algorithm, so only tensorA was updated and tensorB never was. The print below appears to expect both to be updated, which makes for a more useful example.

Member

I'm not sure I follow, could you provide an example of what you mean?

Contributor Author

@axsaucedo Anything that more closely resembles what the code was previously trying to do, reusing algo instead of creating two separate algorithms. I haven't looked into the backend, but this could avoid overhead from duplicating the SPIR-V (such as making sure it's loaded and ready on the device).

Since there was no immediate suggestion to change this, I'll assume there is no concern with having separate algorithms and that we should continue as-is.

Contributor Author

Perhaps a poor example, as it doesn't fit how everything is currently laid out, but I could imagine a world where you would see something like:

sqOne->evalAsync<kp::OpAlgoDispatch>(algo, tensorA);
sqTwo->evalAsync<kp::OpAlgoDispatch>(algo, tensorB);

Member

Ah, I see what you mean now. Yes, this would actually be very desirable. When designing Kompute this was the initial aim as well. Unfortunately, the design of the underlying Vulkan architecture makes it impossible: the descriptor sets, the algorithm, and the tensors depend on one another, so their initialisation is coupled. I hope this is addressed in the design of Vulkan at some point, but at least in the medium term it doesn't seem to be planned. Hope this provides further context.
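
The coupling described above can be pictured with a toy model. This is a sketch under stated assumptions, not Kompute or Vulkan code: `Buffer`, `Shader`, `ToyAlgorithm`, `doubleAll`, and `runOnBoth` are all hypothetical names. The buffer is fixed when the algorithm object is constructed, loosely mirroring how a descriptor set ties an algorithm to specific tensors, so running the same shader over two buffers takes two algorithm objects.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins only; none of these names come from Kompute or Vulkan.
using Buffer = std::vector<float>;
using Shader = std::function<void(Buffer&)>;

// Models an algorithm whose binding is fixed at construction time.
class ToyAlgorithm {
public:
    ToyAlgorithm(Shader shader, std::shared_ptr<Buffer> bound)
        : shader_(std::move(shader)), bound_(std::move(bound)) {
        assert(bound_ != nullptr); // a binding is required up front
    }

    // No buffer parameter: dispatch can only touch what was bound.
    void dispatch() const { shader_(*bound_); }

private:
    Shader shader_;
    std::shared_ptr<Buffer> bound_;
};

// Plays the role of the shared shader: doubles every element in place.
inline void doubleAll(Buffer& data) {
    for (float& x : data) {
        x *= 2.0f;
    }
}

// Same shader, two buffers: one algorithm object per binding.
inline void runOnBoth(const std::shared_ptr<Buffer>& a,
                      const std::shared_ptr<Buffer>& b) {
    ToyAlgorithm algo(doubleAll, a);
    ToyAlgorithm algoTwo(doubleAll, b);
    algo.dispatch();
    algoTwo.dispatch();
}
```

In this model the shader logic is shared between `algo` and `algoTwo`, and only the binding differs. Whether the real backend shares or duplicates the compiled shader is not something this sketch can show.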

KeithJH commented Oct 18, 2024

There are more potential changes, but this PR covers the ones I felt comfortable validating with my current development environment. Presumably changes are also necessary for the below. I can create issues for these to be handled separately.

Python Examples:

PS kompute-fork> ls -r docs\overview\python* | select-string OpTensorSync

docs\overview\python-examples.rst:17:   from kp import Manager, Tensor, OpTensorSyncDevice, OpTensorSyncLocal, OpAlgoDispatch
docs\overview\python-examples.rst:29:   sq.eval(OpTensorSyncDevice([tensor_in_a, tensor_in_b, tensor_out]))
docs\overview\python-examples.rst:44:   sq.eval(OpTensorSyncLocal([tensor_out]))
docs\overview\python-examples.rst:69:    seq.eval(kp.OpTensorSyncDevice([tensor_in_a, tensor_in_b, tensor_out]))
docs\overview\python-examples.rst:86:    seq.record(kp.OpTensorSyncLocal([tensor_in_a]))
docs\overview\python-examples.rst:87:    seq.record(kp.OpTensorSyncLocal([tensor_in_b]))
docs\overview\python-examples.rst:88:    seq.record(kp.OpTensorSyncLocal([tensor_out]))
docs\overview\python-examples.rst:114:    mgr.sequence().eval(kp.OpTensorSyncLocal([t1, t3]))
docs\overview\python-examples.rst:125:    sq_sync.record(kp.OpTensorSyncLocal([t1, t3]))
docs\overview\python-examples.rst:211:    sq.sequence().eval(kp.OpTensorSyncDevice(params))
docs\overview\python-examples.rst:216:    sq.record(kp.OpTensorSyncDevice([tensor_w_in, tensor_b_in]))
docs\overview\python-examples.rst:218:    sq.record(kp.OpTensorSyncLocal([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out]))

C++ Reference Documentation:

PS kompute-fork> ls -r docs\overview\reference.rst | select-string OpTensorSync

docs\overview\reference.rst:98:OpTensorSyncLocal
docs\overview\reference.rst:101:The :class:`kp::OpTensorSyncLocal` is a tensor only operation that maps the data from the GPU device memory into the local host vector.
docs\overview\reference.rst:103:.. doxygenclass:: kp::OpTensorSyncLocal
docs\overview\reference.rst:106:OpTensorSyncDevice
docs\overview\reference.rst:109:The :class:`kp::OpTensorSyncDevice` is a tensor only operation that maps the data from the local host vector into the GPU device memory.
docs\overview\reference.rst:111:.. doxygenclass:: kp::OpTensorSyncDevice
docs\overview\reference.rst:119:.. doxygenclass:: kp::OpTensorSyncDevice

Godot code:

PS kompute-fork> ls -r examples\godot* | select-string OpTensorSync

examples\godot_examples\custom_module\kompute_summator\KomputeSummatorNode.cpp:86:        sq->record<kp::OpTensorSyncDevice>({ this->mSecondaryTensor });
examples\godot_examples\custom_module\kompute_summator\KomputeSummatorNode.cpp:92:        sq->record<kp::OpTensorSyncLocal>({ this->mPrimaryTensor });
examples\godot_examples\gdnative_shared\src\KomputeSummator.cpp:83:        this->mSequence->record<kp::OpTensorSyncDevice>(
examples\godot_examples\gdnative_shared\src\KomputeSummator.cpp:92:        this->mSequence->record<kp::OpTensorSyncLocal>(
examples\godot_logistic_regression\custom_module\kompute_model_ml\KomputeModelMLNode.cpp:67:            mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
examples\godot_logistic_regression\custom_module\kompute_model_ml\KomputeModelMLNode.cpp:71:                ->record<kp::OpTensorSyncDevice>({ wIn, bIn })
examples\godot_logistic_regression\custom_module\kompute_model_ml\KomputeModelMLNode.cpp:73:                ->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
examples\godot_logistic_regression\gdnative_shared\src\KomputeModelML.cpp:71:            mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
examples\godot_logistic_regression\gdnative_shared\src\KomputeModelML.cpp:75:                ->record<kp::OpTensorSyncDevice>({ wIn, bIn })
examples\godot_logistic_regression\gdnative_shared\src\KomputeModelML.cpp:77:                ->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });

@axsaucedo
Member

Amazing, thank you very much for the contribution. This is indeed much needed and much appreciated.

@axsaucedo axsaucedo left a comment

LGTM

@axsaucedo axsaucedo merged commit 187b84b into KomputeProject:master Oct 18, 2024
8 checks passed
@KeithJH KeithJH deleted the keithjh/FixExamples branch November 22, 2024 01:17

2 participants