Fix local work size for conv kernel yxfb_yxio_b16 with fp16 (openvinotoolkit#11679)

convolution_gpu_yxfb_yxio_b16 for fp16 has reqd_work_group_size hardcoded
to (16, 1, 1). On devices where CL_DEVICE_MAX_WORK_GROUP_SIZE is 512,
GetOptimalLocalWorkGroupSizes picks (16, 2, 1) as the LWS.
Because that LWS does not match the kernel's reqd_work_group_size,
clEnqueueNDRangeKernel rejects the launch with CL_INVALID_WORK_GROUP_SIZE.
This change pins lws[1] and lws[2] to 1 for fp16 inputs.
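A minimal C++ sketch of the rule involved may help. It does not call the OpenCL API; `NDRange`, `lws_matches_reqd`, and `work_group_volume` are hypothetical helpers written for illustration, assuming the OpenCL behavior that an explicitly passed local work size must equal a kernel's `reqd_work_group_size` in every dimension.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical alias: a 3-D NDRange size in OpenCL's (x, y, z) order.
using NDRange = std::array<std::size_t, 3>;

// Simplified model of the check clEnqueueNDRangeKernel applies when a kernel
// is compiled with __attribute__((reqd_work_group_size(X, Y, Z))): the local
// work size passed at enqueue time must equal (X, Y, Z) exactly, otherwise
// the call fails with CL_INVALID_WORK_GROUP_SIZE.
bool lws_matches_reqd(const NDRange& lws, const NDRange& reqd) {
    return lws == reqd;
}

// Number of work-items in one work group. A size limit such as
// CL_DEVICE_MAX_WORK_GROUP_SIZE constrains this product, but it says
// nothing about the per-dimension shape a kernel requires.
std::size_t work_group_volume(const NDRange& lws) {
    return lws[0] * lws[1] * lws[2];
}
```

For example, `work_group_volume(NDRange{16, 2, 1})` is 32, well under a limit of 512, yet `lws_matches_reqd(NDRange{16, 2, 1}, NDRange{16, 1, 1})` is false. A heuristic that only weighs device limits can therefore choose a size the kernel itself forbids.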
mateusztabaka authored May 23, 2022
1 parent fbc99ef commit ff6ea62
Showing 1 changed file with 5 additions and 0 deletions.
@@ -78,6 +78,11 @@ ConvolutionKernelBase::DispatchData ConvolutionKernel_yxfb_yxio_b16::SetDefault(
     dispatchData.lws[0] = min_lws;
     dispatchData.gws[0] = filter_ofm_num * batch_size / (ofmPerWorkItem * batchesPerWorkItem);
 
+    if (arg.inputs[0].GetDType() == Datatype::F16) {
+        dispatchData.lws[1] = 1;
+        dispatchData.lws[2] = 1;
+    }
+
     return dispatchData;
 }
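The committed override can be modeled in isolation. This is a simplified sketch under stated assumptions: `Datatype`, `DispatchDataSketch`, and `apply_fp16_lws_fix` are hypothetical stand-ins rather than the kernel selector's real types, and the candidate LWS is passed in directly instead of being computed by GetOptimalLocalWorkGroupSizes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using NDRange = std::array<std::size_t, 3>;

// Hypothetical stand-ins; the real Datatype and DispatchData carry more members.
enum class Datatype { F16, F32 };

struct DispatchDataSketch {
    NDRange gws{};
    NDRange lws{};
};

// Mirrors the diff: for fp16 inputs the y and z local sizes are pinned to 1,
// so the LWS becomes (lws[0], 1, 1). Other data types are left untouched.
DispatchDataSketch apply_fp16_lws_fix(DispatchDataSketch d, Datatype input_type) {
    if (input_type == Datatype::F16) {
        d.lws[1] = 1;
        d.lws[2] = 1;
    }
    return d;
}
```

Only y and z are pinned; lws[0] keeps its computed value (`min_lws` in the diff). The result therefore matches `reqd_work_group_size(16, 1, 1)` when that value is 16, which the sketch takes from the (16, 2, 1) case described in the commit message.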
