Fix local work size for conv kernel yxfb_yxio_b16 with fp16 (openvinotoolkit#11679)

The fp16 variant of convolution_gpu_yxfb_yxio_b16 hardcodes reqd_work_group_size to (16, 1, 1). On devices where CL_DEVICE_MAX_WORK_GROUP_SIZE is 512, GetOptimalLocalWorkGroupSizes picks (16, 2, 1) for the local work size (LWS). clEnqueueNDRangeKernel then fails because the LWS does not match the reqd_work_group_size declared in the kernel. This change sets lws[1] and lws[2] to 1 for fp16 inputs.
yli147 · May 23, 2022 · ff6ea62
1 parent fbc99ef
Showing 1 changed file with 5 additions and 0 deletions.
diff --git a/.../src/kernel_selector/core/actual_kernels/convolution/convolution_kernel_yxfb_yxio_b16.cpp b/.../src/kernel_selector/core/actual_kernels/convolution/convolution_kernel_yxfb_yxio_b16.cpp
@@ -78,6 +78,11 @@ ConvolutionKernelBase::DispatchData ConvolutionKernel_yxfb_yxio_b16::SetDefault(
     dispatchData.lws[0] = min_lws;
     dispatchData.gws[0] = filter_ofm_num * batch_size / (ofmPerWorkItem * batchesPerWorkItem);
 
+    if (arg.inputs[0].GetDType() == Datatype::F16) {
+        dispatchData.lws[1] = 1;
+        dispatchData.lws[2] = 1;
+    }
+
     return dispatchData;
 }
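
The mismatch described in the commit message, and the effect of the patch, can be sketched in isolation. This is a minimal model, not the kernel selector's code: `WorkSize`, `kRequiredLws`, `MatchesRequiredWorkGroupSize`, and `PinLwsForFp16` are hypothetical names introduced for illustration, and the check only mirrors the OpenCL rule that an explicitly passed local work size must equal the kernel's `reqd_work_group_size` in every dimension.

```cpp
#include <array>
#include <cstddef>

// Work size in the three NDRange dimensions (hypothetical helper type).
using WorkSize = std::array<std::size_t, 3>;

// Assumption: models the (16, 1, 1) reqd_work_group_size that the commit
// message says is hardcoded in the fp16 kernel.
constexpr WorkSize kRequiredLws = {16, 1, 1};

// Mirrors the OpenCL rule: a local work size passed to
// clEnqueueNDRangeKernel must match reqd_work_group_size exactly.
bool MatchesRequiredWorkGroupSize(const WorkSize& lws, const WorkSize& required) {
    return lws == required;
}

// Hypothetical stand-in for the patched tail of SetDefault: for fp16 inputs,
// dimensions 1 and 2 are forced to 1; dimension 0 is left as chosen earlier.
WorkSize PinLwsForFp16(WorkSize lws, bool is_fp16) {
    if (is_fp16) {
        lws[1] = 1;
        lws[2] = 1;
    }
    return lws;
}
```

With these definitions, a picked size of (16, 2, 1) fails the match against (16, 1, 1), while the pinned result passes. The sketch assumes dimension 0 already equals 16, which in the real code depends on `min_lws`. Non-fp16 inputs are returned unchanged, as in the patch.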