GH-43017: [C++] Make the set of casts and hash kernels involving float16 consistent with other floating types #43018

felipecrv · 2024-06-24T03:19:15Z

Rationale for this change

To simplify loops and unit tests involving numeric types. Special-casing float16 adds complexity.

What changes are included in this PR?

Complete the set of casts involving half-floats (with the exception of half-float -> decimal128/256 casts)
Make half-floats supported in hash kernels (kernels defined in terms of the equality of physical representation of values)
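To illustrate what "equality of physical representation" can mean for a hash kernel, here is a minimal sketch that deduplicates float16 values by their raw 16-bit patterns. It assumes the values are stored as IEEE 754 binary16 bits in a `uint16_t`; `HalfBits` and `UniqueByBits` are hypothetical names, not Arrow APIs, and the sketch does not claim to reproduce Arrow's behavior in every edge case.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Assumption: a float16 value is represented by its raw binary16 bit pattern.
using HalfBits = uint16_t;

// Return the distinct values in first-seen order, comparing bit patterns only.
// No floating-point comparison is performed anywhere.
std::vector<HalfBits> UniqueByBits(const std::vector<HalfBits>& values) {
  std::unordered_set<HalfBits> seen;
  std::vector<HalfBits> out;
  for (HalfBits v : values) {
    // insert() reports whether the bit pattern was new.
    if (seen.insert(v).second) out.push_back(v);
  }
  return out;
}
```

Under bitwise equality, +0.0 (0x0000) and -0.0 (0x8000) count as two distinct values, while two NaNs with the same bit pattern collapse into one. Floating-point `==` would give the opposite answer in both cases.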

Are these changes tested?

By existing tests that are now extended to run with float16 as input.

GitHub Issue: [C++] Make the set of casts and hash kernels involving float16 consistent with other floating types #43017

github-actions · 2024-06-24T03:19:40Z

⚠️ GitHub issue #43017 has been automatically assigned in GitHub to PR creator.

felipecrv · 2024-06-24T03:19:47Z

@ianmcook @ClifHouck

felipecrv · 2024-06-26T20:38:22Z

@pitrou

pitrou · 2024-06-27T14:42:13Z

cpp/src/arrow/acero/hash_aggregate_test.cc

@@ -913,6 +913,16 @@ TEST(RowSegmenter, RowConstantBatch) {
  }
 }

+// XXX: float16 is not part of NumericTypes() yet
+auto& AllNumericTypes() {


Can we make this const auto&, to avoid potentially mutating it from a callee?
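A minimal sketch of the suggestion, using strings in place of Arrow data types. `AllNumericTypeNames` and its contents are hypothetical; only the `const auto&` return type mirrors the review comment.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the test helper: the real one returns Arrow types.
// Declaring the return type as `const auto&` deduces a reference to const,
// so callers cannot mutate the shared static through it.
const auto& AllNumericTypeNames() {
  static std::vector<std::string> names = [] {
    std::vector<std::string> v = {"int32", "float32", "float64"};
    v.push_back("float16");  // appended separately, as in the PR's helper
    return v;
  }();
  return names;
}
```

With a plain `auto&` return type, a call such as `AllNumericTypeNames().clear()` would compile and silently affect every later caller. With `const auto&` the same call is rejected at compile time.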

pitrou · 2024-06-27T14:43:02Z

cpp/src/arrow/compute/kernels/codegen_internal.h

@@ -959,6 +959,10 @@ KernelType GenerateNumeric(detail::GetTypeId get_id) {
      return Generator<Type0, FloatType, Args...>::Exec;
    case Type::DOUBLE:
      return Generator<Type0, DoubleType, Args...>::Exec;
+    case Type::HALF_FLOAT:
+      // NOTE: Type::HALF_FLOAT used to not be part of the list of numeric types,
+      // so users of this template might start failing to compile after Arrow 17.x.


These are internal helpers, so if Arrow still compiles, I don't think the comment is warranted.

felipecrv · 2024-06-27

Cool. It took me a while to realize these were internal. I will remove the comments.
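For context, the hunk under discussion adds one more case to a switch that maps a runtime type id to a template instantiation. The sketch below shows that pattern in isolation. `TypeId`, the three type structs, `NameOf`, and `GenerateFloating` are all hypothetical stand-ins and do not match Arrow's actual definitions.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for runtime type ids and compile-time type classes.
enum class TypeId { kHalfFloat, kFloat, kDouble };
struct HalfFloatType { static constexpr const char* name = "halffloat"; };
struct FloatType { static constexpr const char* name = "float"; };
struct DoubleType { static constexpr const char* name = "double"; };

using KernelExec = std::string (*)();

// Hypothetical generator: one Exec function is instantiated per type.
template <typename T>
struct NameOf {
  static std::string Exec() { return T::name; }
};

// Pick the instantiation that matches the runtime type id.
template <template <typename> class Generator>
KernelExec GenerateFloating(TypeId id) {
  switch (id) {
    case TypeId::kHalfFloat:
      return Generator<HalfFloatType>::Exec;
    case TypeId::kFloat:
      return Generator<FloatType>::Exec;
    case TypeId::kDouble:
      return Generator<DoubleType>::Exec;
  }
  throw std::invalid_argument("unsupported type id");
}
```

Adding a case forces the generator to be instantiated for the new type, so any generator that cannot handle it stops compiling. That is the concern the removed note described, and it only matters to code that uses the helper.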

pitrou · 2024-06-27T14:48:52Z

cpp/src/arrow/compute/kernels/vector_array_sort.cc

@@ -590,6 +590,13 @@ void AddArraySortingKernels(VectorKernel base, VectorFunction* func) {
    base.exec = GenerateNumeric<ExecTemplate, UInt64Type>(*physical_type);
    DCHECK_OK(func->AddKernel(base));
  }
+  {
+    // XXX: float16() is not in NumericTypes() yet
+    auto physical_type = GetPhysicalType(float16());


What is the underlying physical type btw? We should be careful that float16 NaNs sort as described at https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions, and we should test for that.

felipecrv · 2024-06-27

It's float16 itself. I'm changing the code to make that more obvious.
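As I read the linked documentation, NaN values are meant to sort after every other non-null value. The sketch below shows one way a test could express that ordering for float16, working directly on raw bit patterns. It assumes IEEE 754 binary16 storage in a `uint16_t`; `IsNaN`, `OrderKey`, and `SortAscendingNaNsLast` are hypothetical helpers, not Arrow functions, and nulls are out of scope.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumption: values are raw IEEE 754 binary16 bit patterns.
// NaN has all five exponent bits set and a non-zero mantissa.
inline bool IsNaN(uint16_t bits) {
  return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}

// Map a non-NaN bit pattern to an unsigned key whose integer order matches
// numeric order: negative values are bit-flipped, non-negative values get
// the top bit set. Note that -0.0 orders just before +0.0 under this key.
inline uint16_t OrderKey(uint16_t bits) {
  return (bits & 0x8000) ? static_cast<uint16_t>(~bits)
                         : static_cast<uint16_t>(bits | 0x8000);
}

// Sort ascending, placing every NaN after all non-NaN values.
void SortAscendingNaNsLast(std::vector<uint16_t>& values) {
  std::stable_sort(values.begin(), values.end(), [](uint16_t a, uint16_t b) {
    const bool a_nan = IsNaN(a);
    const bool b_nan = IsNaN(b);
    // A non-NaN value precedes a NaN; two NaNs are equivalent.
    if (a_nan || b_nan) return !a_nan && b_nan;
    return OrderKey(a) < OrderKey(b);
  });
}
```

Sorting by a naive `<` on decoded floats would not give a valid ordering here, because every comparison involving NaN returns false. Handling NaN explicitly keeps the comparator a strict weak ordering.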

pitrou · 2024-06-27T14:49:25Z

cpp/src/arrow/compute/kernels/vector_hash.cc

@@ -697,6 +698,12 @@ void AddHashKernels(VectorFunction* func, VectorKernel base, OutputType out_ty)
    base.signature = KernelSignature::Make({ty}, out_ty);
    DCHECK_OK(func->AddKernel(base));
  }
+  {
+    // XXX: float16() is not in PrimitiveTypes()


That's a bummer.

felipecrv · 2024-06-27

We should eventually deal with this somehow. I think we could add a less stable AllPrimitiveTypes() that includes FLOAT16 and eventually mark PrimitiveTypes() as deprecated.

pitrou · 2024-06-27T17:29:49Z

@felipecrv Do we have tests for sorting float16 values and NaNs already? Otherwise, we should add some.

Also, is only array_sort handled, or does the more general sort also accept float16?

pitrou · 2024-07-22T11:28:05Z

@felipecrv Is this ready for review again?

felipecrv · 2024-07-23T02:40:44Z

> @felipecrv Is this ready for review again?

No. Adding these tests you requested creates a cascade of requirements regarding half-float support that I haven't been able to fix even after putting a lot of effort into it.

felipecrv requested a review from westonpace as a code owner June 24, 2024 03:19

github-actions bot added the Component: C++ label Jun 24, 2024

github-actions bot added the awaiting committer review label Jun 24, 2024

felipecrv requested a review from pitrou June 24, 2024 16:12

felipecrv added 8 commits June 26, 2024 17:38

scalar_cast_numeric.cc: Remove redundant template specialization

9fc1d68

scalar_cast_numeric.cc: Unify the handling of float16 and float32/float64 casts

fc372ed

scalar_cast_boolean.cc: Cast float16 to boolean

fbb466d

Use SFINAE to avoid compilation errors due to addition of HalfFloatType to GenerateNumeric<>

5d81b31

Revert "Use SFINAE to avoid compilation errors due to addition of HalfFloatType to GenerateNumeric<>"

This reverts commit 436071f.

7b7cae1

scalar_cast_numeric.cc: Support cast from boolean to float16

0b545df

vector_hash.cc: Support float16 in hash kernels

52d2b23

hash_aggregate.cc: Support HalfFloatType in the kernel factories

7fe3942

felipecrv force-pushed the half_float_casts branch from 19c70b1 to 7fe3942 Compare June 26, 2024 20:38

This was referenced Jun 27, 2024

[C++] Cast to/from halffloat not implemented #20213

Closed

[C++] unsupported cast from halffloat to utf8 #32802

Open

pitrou reviewed Jun 27, 2024

github-actions bot added the awaiting changes label and removed the awaiting committer review label Jun 27, 2024

felipecrv added 3 commits June 27, 2024 14:15

fixup! hash_aggregate.cc: Support HalfFloatType in the kernel factories

9f38734

codegen_internal.h: Remove notes about the addition of HALF_FLOAT template instantiation

6c3fda7

fixup! hash_aggregate.cc: Support HalfFloatType in the kernel factories

0054d80

felipecrv requested a review from pitrou June 27, 2024 17:26

github-actions bot added the awaiting change review label and removed the awaiting changes label Jun 27, 2024

felipecrv added 2 commits June 27, 2024 21:29

json_simple.cpp: Use & returned by emplace_back()

a10873b

json_simple.cc: Parse float16 from JSON

ddd841b
