Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43017: [C++] Make the set of casts and hash kernels involving float16 consistent with other floating types #43018

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Jun 24, 2024

Rationale for this change

To simplify loops and unit tests involving numeric types. Special-casing float16 adds complexity.

What changes are included in this PR?

  • Complete the set of casts involving half-float casts (with the exception of half-float -> decimal128/256 casts)
  • Make half-floats supported in hash kernels (kernels defined in terms of the equality of physical representation of values)

Are these changes tested?

By existing tests that are now extended to run with float16 as input.

@felipecrv felipecrv requested a review from westonpace as a code owner June 24, 2024 03:19
Copy link

⚠️ GitHub issue #43017 has been automatically assigned in GitHub to PR creator.

@felipecrv
Copy link
Contributor Author

@ianmcook @ClifHouck

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Jun 24, 2024
@felipecrv felipecrv requested a review from pitrou June 24, 2024 16:12
@felipecrv
Copy link
Contributor Author

@pitrou

@@ -913,6 +913,16 @@ TEST(RowSegmenter, RowConstantBatch) {
}
}

// XXX: float16 is not part of NumericTypes() yet
auto& AllNumericTypes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this const auto&, to avoid potentially mutating it from a callee?

@@ -959,6 +959,10 @@ KernelType GenerateNumeric(detail::GetTypeId get_id) {
return Generator<Type0, FloatType, Args...>::Exec;
case Type::DOUBLE:
return Generator<Type0, DoubleType, Args...>::Exec;
case Type::HALF_FLOAT:
// NOTE: Type::HALF_FLOAT used to not be part of the list of numeric types,
// so users of this template might start failing to compiler after Arrow 17.x.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are internal helpers, so if Arrow still compiles, I don't think the comment is warranted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. It took me a while to realize these were internal. I will remove the comments.

cpp/src/arrow/compute/kernels/codegen_internal.h Outdated Show resolved Hide resolved
@@ -590,6 +590,13 @@ void AddArraySortingKernels(VectorKernel base, VectorFunction* func) {
base.exec = GenerateNumeric<ExecTemplate, UInt64Type>(*physical_type);
DCHECK_OK(func->AddKernel(base));
}
{
// XXX: float16() is not in NumericTypes() yet
auto physical_type = GetPhysicalType(float16());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the underlying physical type btw? We should be careful that float16 NaNs sort as described as https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions, and we should test for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's float16 itself. I'm changing the code to make that more obvious.

@@ -697,6 +698,12 @@ void AddHashKernels(VectorFunction* func, VectorKernel base, OutputType out_ty)
base.signature = KernelSignature::Make({ty}, out_ty);
DCHECK_OK(func->AddKernel(base));
}
{
// XXX: float16() is not in PrimitiveTypes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bummer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eventually deal with this somehow. I think we could add a less stable AllPrimitiveTypes() that includes FLOAT16 and eventually mark PrimitiveTypes() as deprecated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 27, 2024
@felipecrv felipecrv requested a review from pitrou June 27, 2024 17:26
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 27, 2024
@pitrou
Copy link
Member

pitrou commented Jun 27, 2024

@felipecrv Do we have tests for sorting float16 values and NaNs already? Otherwise, we should add some.

Also, is only array_sort handled, or does the more general sort also accept float16?

@pitrou
Copy link
Member

pitrou commented Jul 22, 2024

@felipecrv Is this ready for review again?

@felipecrv
Copy link
Contributor Author

@felipecrv Is this ready for review again?

No. Adding these tests you requested creates a cascade of requirements regarding half-float support that I haven't been able to fix even after putting a lot of effort into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants