GH-45126: [C++][Parquet] Fix undefined behavior in the FormatStatValue (#45127)

### Rationale for this change

The FormatStatValue function in parquet/types.cc uses reinterpret_cast to read the raw statistics bytes as typed values. Dereferencing the resulting pointer is undefined behavior when the bytes are not suitably aligned for the target type.

### What changes are included in this PR?

Replace each reinterpret_cast with a std::memcpy into a local variable of the target type.

### Are these changes tested?

Pass CI.

### Are there any user-facing changes?

No.
* GitHub Issue: #45126

Authored-by: Gang Wu <[email protected]>
Signed-off-by: Gang Wu <[email protected]>
wgtmac authored Dec 31, 2024
1 parent fd1bf8e commit cc56f12
Showing 1 changed file with 32 additions and 16 deletions.
48 changes: 32 additions & 16 deletions cpp/src/parquet/types.cc
@@ -15,6 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#include <array>
 #include <cmath>
 #include <cstdint>
 #include <memory>
@@ -95,31 +96,46 @@ std::string FormatStatValue(Type::type parquet_type, ::std::string_view val) {
 
   const char* bytes = val.data();
   switch (parquet_type) {
-    case Type::BOOLEAN:
-      result << reinterpret_cast<const bool*>(bytes)[0];
+    case Type::BOOLEAN: {
+      bool value{};
+      std::memcpy(&value, bytes, sizeof(bool));
+      result << value;
       break;
-    case Type::INT32:
-      result << reinterpret_cast<const int32_t*>(bytes)[0];
+    }
+    case Type::INT32: {
+      int32_t value{};
+      std::memcpy(&value, bytes, sizeof(int32_t));
+      result << value;
       break;
-    case Type::INT64:
-      result << reinterpret_cast<const int64_t*>(bytes)[0];
+    }
+    case Type::INT64: {
+      int64_t value{};
+      std::memcpy(&value, bytes, sizeof(int64_t));
+      result << value;
       break;
-    case Type::DOUBLE:
-      result << reinterpret_cast<const double*>(bytes)[0];
+    }
+    case Type::DOUBLE: {
+      double value{};
+      std::memcpy(&value, bytes, sizeof(double));
+      result << value;
       break;
-    case Type::FLOAT:
-      result << reinterpret_cast<const float*>(bytes)[0];
+    }
+    case Type::FLOAT: {
+      float value{};
+      std::memcpy(&value, bytes, sizeof(float));
+      result << value;
       break;
+    }
     case Type::INT96: {
-      auto const i32_val = reinterpret_cast<const int32_t*>(bytes);
-      result << i32_val[0] << " " << i32_val[1] << " " << i32_val[2];
+      std::array<int32_t, 3> values{};
+      std::memcpy(values.data(), bytes, 3 * sizeof(int32_t));
+      result << values[0] << " " << values[1] << " " << values[2];
       break;
     }
-    case Type::BYTE_ARRAY: {
-      return std::string(val);
-    }
+    case Type::BYTE_ARRAY:
     case Type::FIXED_LEN_BYTE_ARRAY: {
-      return std::string(val);
+      result << val;
+      break;
     }
     case Type::UNDEFINED:
     default: