Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possible unsafe use of get_unchecked in spark_compatible_murmur3_hash #633

Closed
andygrove opened this issue Jul 5, 2024 · 2 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@andygrove
Copy link
Member

Describe the bug

I ran the miri checker and it warned about a possible unsafe call to get_unchecked that we should investigate.

I will post full details shortly.

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

@andygrove andygrove added the bug Something isn't working label Jul 5, 2024
@andygrove andygrove self-assigned this Jul 5, 2024
@andygrove andygrove added this to the 0.1.0 milestone Jul 5, 2024
@andygrove
Copy link
Member Author

test execution::datafusion::shuffle_writer::test::test_insert_larger_batch ... error: Undefined Behavior: trying to retag from <789872535> for SharedReadOnly permission at alloc278024494[0xb4b], but that tag does not exist in the borrow stack for this location
--> /home/andy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:140:9
|
140 | &*ptr::slice_from_raw_parts(data, len)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| trying to retag from <789872535> for SharedReadOnly permission at alloc278024494[0xb4b], but that tag does not exist in the borrow stack for this location
| this error occurs as part of retag at alloc278024494[0xb4a..0xb4e]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <789872535> was created by a SharedReadOnly retag at offsets [0xb4a..0xb4b]
--> src/execution/datafusion/spark_hash.rs:93:44
|
93 | std::slice::from_raw_parts(data.get_unchecked(0), len_aligned),
| ^^^^^^^^^^^^^^^^^^^^^
= note: BACKTRACE (of the first span) on thread execution::data:
= note: inside std::slice::from_raw_parts::<'_, u8> at /home/andy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:140:9: 140:47
note: inside execution::datafusion::spark_hash::spark_compatible_murmur3_hash::<&&str>
--> src/execution/datafusion/spark_hash.rs:93:17
|
93 | std::slice::from_raw_parts(data.get_unchecked(0), len_aligned),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside execution::datafusion::spark_hash::create_murmur3_hashes
--> src/execution/datafusion/spark_hash.rs:113:25
|
113 | *hash = $hash_method(&array.value(i), *hash);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
468 | / create_hashes_internal!(
469 | | arrays,
470 | | hashes_buffer,
471 | | spark_compatible_murmur3_hash,
472 | | create_hashes_dictionary
473 | | );
| |_____- in this macro invocation
note: inside closure
--> src/execution/datafusion/shuffle_writer.rs:714:17
|
714 | create_murmur3_hashes(&arrays, hashes_buf)?
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
--> src/execution/datafusion/shuffle_writer.rs:671:44
|
671 | self.partitioning_batch(batch).await?;
| ^^^^^
note: inside closure
--> src/execution/datafusion/shuffle_writer.rs:994:44
|
994 | repartitioner.insert_batch(batch?).await?;
| ^^^^^
= note: inside <{async fn body of execution::datafusion::shuffle_writer::external_shuffle()} as futures::TryFuture>::try_poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.30/src/future.rs:82:9: 82:22
= note: inside <futures::future::IntoFuture<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}> as futures::Future>::poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/try_future/into_future.rs:34:9: 34:43
= note: inside <futures::future::future::map::Map<futures::future::IntoFuture<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}>, futures_util::fns::MapErrFn<{closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>> as futures::Future>::poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/future/map.rs:55:37: 55:52
= note: inside <futures::future::Map<futures::future::IntoFuture<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}>, futures_util::fns::MapErrFn<{closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>> as futures::Future>::poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/lib.rs:91:13: 91:43
= note: inside <futures::future::MapErr<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}, {closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}> as futures::Future>::poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/lib.rs:91:13: 91:43
= note: inside <futures::stream::Once<futures::future::MapErr<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}, {closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>> as futures::Stream>::poll_next at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/stream/once.rs:46:33: 46:45
= note: inside <futures::stream::Once<futures::future::MapErr<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}, {closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>> as futures::TryStream>::try_poll_next at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.30/src/stream.rs:196:9: 196:27
= note: inside <futures::stream::TryFlatten<futures::stream::Once<futures::future::MapErr<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}, {closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>>> as futures::Stream>::poll_next at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/stream/try_stream/try_flatten.rs:66:44: 66:82
= note: inside <datafusion::physical_plan::stream::RecordBatchStreamAdapter<futures::stream::TryFlatten<futures::stream::Once<futures::future::MapErr<{async fn body of execution::datafusion::shuffle_writer::external_shuffle()}, {closure@src/execution/datafusion/shuffle_writer.rs:149:26: 149:29}>>>> as futures::Stream>::poll_next at /home/andy/.cargo/git/checkouts/arrow-datafusion-6b111e5d2e4cbc88/17446b1/datafusion/physical-plan/src/stream.rs:372:9: 372:44
= note: inside <std::pin::Pin<std::boxed::Box<dyn datafusion::execution::RecordBatchStream + std::marker::Send>> as futures::Stream>::poll_next at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.30/src/stream.rs:120:9: 120:46
= note: inside <std::pin::Pin<std::boxed::Box<dyn datafusion::execution::RecordBatchStream + std::marker::Send>> as futures::TryStream>::try_poll_next at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.30/src/stream.rs:196:9: 196:27
= note: inside <futures::stream::TryCollect<std::pin::Pin<std::boxed::Box<dyn datafusion::execution::RecordBatchStream + std::marker::Send>>, std::vec::Vec<arrow_array::RecordBatch>> as futures::Future>::poll at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/stream/try_stream/try_collect.rs:46:26: 46:64
= note: inside closure at /home/andy/.cargo/git/checkouts/arrow-datafusion-6b111e5d2e4cbc88/17446b1/datafusion/physical-plan/src/common.rs:47:36: 47:41
= note: inside closure at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/park.rs:281:63: 281:87
= note: inside tokio::runtime::coop::with_budget::<std::task::Poll<std::result::Result<std::vec::Vec<arrow_array::RecordBatch>, datafusion_common::DataFusionError>>, {closure@tokio::runtime::park::CachedParkThread::block_on<{async fn body of datafusion::physical_plan::common::collect()}>::{closure#0}}> at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/coop.rs:107:5: 107:8
= note: inside tokio::runtime::coop::budget::<std::task::Poll<std::result::Result<std::vec::Vec<arrow_array::RecordBatch>, datafusion_common::DataFusionError>>, {closure@tokio::runtime::park::CachedParkThread::block_on<{async fn body of datafusion::physical_plan::common::collect()}>::{closure#0}}> at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/coop.rs:73:5: 73:38
= note: inside tokio::runtime::park::CachedParkThread::block_on::<{async fn body of datafusion::physical_plan::common::collect()}> at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/park.rs:281:31: 281:88
= note: inside tokio::runtime::context::blocking::BlockingRegionGuard::block_on::<{async fn body of datafusion::physical_plan::common::collect()}> at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/context/blocking.rs:66:9: 66:25
= note: inside closure at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/scheduler/multi_thread/mod.rs:87:13: 87:38
= note: inside tokio::runtime::scheduler::multi_thread::MultiThread::block_on::<{async fn body of datafusion::physical_plan::common::collect()}> at /home/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/scheduler/multi_thread/mod.rs:86:9: 88:11
note: inside execution::datafusion::shuffle_writer::test::test_insert_larger_batch
--> src/execution/datafusion/shuffle_writer.rs:1492:9
|
1492 | rt.block_on(collect(stream)).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
--> src/execution/datafusion/shuffle_writer.rs:1467:34
|
1466 | #[test]
| ------- in this procedural macro expansion
1467 | fn test_insert_larger_batch() {
| ^
= note: this error originates in the macro hash_array which comes from the expansion of the attribute macro test (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with MIRIFLAGS=-Zmiri-backtrace=full for a verbose backtrace

error: aborting due to 1 previous error; 3 warnings emitted

error: test failed, to rerun pass --lib

Caused by:
process didn't exit successfully: /home/andy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/andy/git/apache/datafusion-comet/core/target/miri/x86_64-unknown-linux-gnu/debug/deps/comet-c6f4c2f5607ce9b4 (exit status: 1)
note: test exited abnormally; to see the full output pass --nocapture to the harness.

@andygrove
Copy link
Member Author

Fixed in #636

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant