Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: allow external_access_plan run on windows #13531

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zhuliquan
Copy link
Contributor

Which issue does this PR close?

Closes #13530.

Rationale for this change

I always receive this warning when running cargo clippy, and I would like to get rid of it.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 22, 2024
alamb
alamb previously approved these changes Nov 22, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it is not used when not all features are activated 🤔

@alamb
Copy link
Contributor

alamb commented Nov 22, 2024

Thanks @zhuliquan - it would also be really nice to figure out how you are running the code/tests to see this error. Maybe there is some better way we can come up with to fix it

@findepi
Copy link
Member

findepi commented Nov 22, 2024

we should really avoid #[allow(dead_code)] for stuff used only sometimes, because we won't know when this can be removed/

This is used in external_access_plan which is disabled on windows.
Maybe we should try enabling it?

@zhuliquan zhuliquan changed the title minor: allow dead code for Scenario::UTF8 test: allow external_access_plan run on windows Nov 23, 2024
@zhuliquan
Copy link
Contributor Author

zhuliquan commented Nov 23, 2024

we should really avoid #[allow(dead_code)] for stuff used only sometimes, because we won't know when this can be removed/

This is used in external_access_plan which is disabled on windows. Maybe we should try enabling it?

Yeah, I fix code and enable it

let new_file_name = if cfg!(target_os = "windows") {
    // Windows path separator is different from Unix
    file_name.replace("\\", "/")
} else {
    file_name.clone()
};

@zhuliquan zhuliquan closed this Nov 23, 2024
@alamb alamb reopened this Nov 23, 2024
@alamb
Copy link
Contributor

alamb commented Nov 23, 2024

🤔 the test now seems to be failing

I wonder if it has to do with how the temp files are created 🤔

@alamb alamb dismissed their stale review November 23, 2024 12:07

Test has been changed

@zhuliquan
Copy link
Contributor Author

🤔 the test now seems to be failing

I wonder if it has to do with how the temp files are created 🤔

I noticed that the char '%' appears in the directory, and the Windows system seems to be unable to find the file.

thread 'parquet::external_access_plan::two_selections' panicked at datafusion\core\tests\parquet\external_access_plan.rs:289:10:
called `Result::unwrap()` on an `Err` value: ParquetError(External(NotFound { path: "C:\\Users\\RUNNER%7E1\\AppData\\Local\\Temp\\user_access_planObCLIR.parquet", source: Os { code: 3, kind: NotFound, message: "The system cannot find the path specified." } }))

So I fix below code to avoid char '%'.

  1. replacing '\' to '/' when target_os is windows.

    let new_file_name = if cfg!(target_os = "windows") {
    // Windows path separator is different from Unix
    file_name.replace("\\", "/")
    } else {
    file_name.clone()
    };

    Because, Path::new will replace '\' to '\%5C' in PartitionedFile::new.
    let mut partitioned_file = PartitionedFile::new(new_file_name, *file_size);

    and
    pub fn new(path: impl Into<String>, size: u64) -> Self {
    Self {
    object_meta: ObjectMeta {
    location: Path::from(path.into()),

  2. using current dir as temp file dir (Because temp dir in CI is located at C:\\Users\\RUNNER%7E1 and other loading parquet cases which are run ok on windows also use data in ../../parquet-testing/data), and remove temp file at the end of the case run.


    and
    std::fs::remove_file(file_name).unwrap();

  3. For sake of removing temp file at the end of each case run, I remove singleton mode of TestData in get_test_data

    static _TEST_DATA: OnceLock<TestData> = OnceLock::new();
    /// Return a parquet file with 2 row groups each with 5 rows
    fn get_test_data() -> TestData {
    let scenario = Scenario::UTF8;
    let row_per_group = 5;
    let mut temp_file = tempfile::Builder::new()
    .prefix("user_access_plan")
    .suffix(".parquet")
    .tempfile_in(Path::new(""))
    .expect("tempfile creation");
    let props = WriterProperties::builder()
    .set_max_row_group_size(row_per_group)
    .build();
    let batches = create_data_batch(scenario);
    let schema = batches[0].schema();
    let mut writer =
    ArrowWriter::try_new(&mut temp_file, schema.clone(), Some(props)).unwrap();
    for batch in batches {
    writer.write(&batch).expect("writing batch");
    }
    writer.close().unwrap();
    let file_name = temp_file.path().to_string_lossy().to_string();
    let file_size = temp_file.path().metadata().unwrap().len();
    TestData {
    _temp_file: temp_file,
    schema,
    file_name,
    file_size,
    }
    }

@zhuliquan zhuliquan requested a review from alamb November 23, 2024 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

warning: variant UTF8 is never constructed
3 participants