Get chunk size fix #382

Merged: 5 commits, Apr 22, 2024
Changes from 2 commits
src/lib.rs: 6 changes (1 addition & 5 deletions)

@@ -579,11 +579,7 @@ fn get_chunk_size(file_size: usize, chunk_index: usize) -> usize {
         return 0;
     }
     if file_size < 3 * MAX_CHUNK_SIZE {
-        if chunk_index < 2 {
-            return file_size / 3;
-        } else {
-            return file_size - (2 * (file_size / 3));
-        }
+        return file_size / 3;
Member:

I think the aim of the original code block is: always have chunks[2] be the one holding the extra bytes when file_size is small AND not divisible by 3. Say, if file_size is 1024, then chunk[0] and chunk[1] would each be 341, and chunk[2] would be 342.

With the change, all three chunks will be 341, which will cause data loss?
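
A minimal sketch of that intended behaviour as a standalone function (hypothetical, lifted out of the original block purely for illustration):

fn small_file_chunk_size(file_size: usize, chunk_index: usize) -> usize {
    // Chunks 0 and 1 take file_size / 3 bytes each; chunk 2 absorbs the
    // remainder, so no bytes are lost when file_size % 3 != 0.
    if chunk_index < 2 {
        file_size / 3
    } else {
        file_size - 2 * (file_size / 3)
    }
}

fn main() {
    // 1024 bytes => 341 + 341 + 342, which sums back to 1024.
    assert_eq!(small_file_chunk_size(1024, 0), 341);
    assert_eq!(small_file_chunk_size(1024, 1), 341);
    assert_eq!(small_file_chunk_size(1024, 2), 342);
}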

Contributor Author:

I can create a unit test with the above scenario and test it out. It's good to understand the context behind this.

From testing, it definitely wasn't working right, and this change seemed to fix it. However, it sounds like there could be the edge cases you describe.

Contributor Author (@traktion), Apr 18, 2024:

Considering the above and looking at the code again, I think the bug is actually in the seek_info routine. I see there are a bunch of other routines that use get_chunk_size, and they likely need the logic you describe.

Given that seeking is always from the start of a chunk and only the last chunk is oversized, we could use get_chunk_size(file_size, 0) in all cases. EDIT: Actually, that doesn't quite work either (at the upper boundary).
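
A quick arithmetic illustration of why the uniform-size shortcut breaks at the upper boundary (hypothetical snippet, reusing the 1024-byte example from above):

fn main() {
    // 1024 bytes splits into chunks of 341, 341 and 342 bytes.
    let file_size: usize = 1024;
    let small = file_size / 3; // 341, the size of chunks 0 and 1

    // Dividing by the small chunk size mislocates the final byte:
    // byte 1023 lives in chunk 2, but 1023 / 341 == 3.
    assert_eq!(1023 / small, 3);
}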

Contributor Author (@traktion), Apr 18, 2024:

I think there is also an issue with overlapped_chunks due to get_chunk_index, as when you have a large 3rd chunk, it calls get_chunk_size(file_size, 0) (like I thought might work above). However, if you land in an oversized 3rd chunk, it returns 0 (instead of 2).

In turn, that means seek_info reverts to start_index == 0, which corrupts the relative_pos value.

I'm adding more tests to check these boundaries.
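
A sketch of the kind of boundary test this suggests (assuming get_chunk_index takes (file_size, position) and returns a chunk index, which is how it is used above):

#[test]
fn index_of_byte_in_oversized_last_chunk() {
    // 1024 bytes => chunks of 341, 341 and 342; the final byte sits in
    // chunk 2. The bug described above would report chunk 0 here instead.
    assert_eq!(get_chunk_index(1024, 1023), 2);
}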

Member:

Hi @traktion,

I really appreciate your investigations, and more tests are always appreciated.

Meanwhile, I had a quick scan of the self_encryption code, and I do see issues at
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L492
and
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L640

L492 uses the remainder when dividing by the chunk size of the starting chunk, which will be incorrect when:
a. the starting chunk is the oversized chunk
b. the position falls at the edge of the oversized chunk

L640 has a similar issue, in that it uses the chunk size of the first chunk.
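
A hedged sketch of a lookup that avoids both pitfalls by walking the actual chunk sizes rather than dividing by one fixed size (locate is a hypothetical helper; get_num_chunks and get_chunk_size are the crate's existing functions):

fn locate(file_size: usize, pos: usize) -> (usize, usize) {
    // Walk the real chunk sizes so an oversized last chunk is handled
    // correctly; returns (chunk index, position relative to that chunk).
    let num_chunks = get_num_chunks(file_size);
    if num_chunks == 0 {
        return (0, 0);
    }
    let mut chunk_start = 0;
    for index in 0..num_chunks {
        let size = get_chunk_size(file_size, index);
        if pos < chunk_start + size {
            return (index, pos - chunk_start);
        }
        chunk_start += size;
    }
    // Positions at or past the end of the file fall through to the last chunk.
    let last = num_chunks - 1;
    (last, pos - (chunk_start - get_chunk_size(file_size, last)))
}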

I will see if I can get some fixes and tests for it on Monday when I'm back. I would really appreciate it, and will merge your tests in, if you come up with more, or even with a solution.

Thank you very much.

Contributor Author:

I've reverted my prior change and applied a new fix for L492.

I took a deeper look at L640, and it appears to be OK. While the remainder is incorrect, the final calculation is still valid and returns the correct index. I confirmed this with tests too. The logic is definitely a bit brain-twisting, though.

Member:

Great work! Really appreciate it.

     }
     let total_chunks = get_num_chunks(file_size);
     if chunk_index < total_chunks - 2 {
src/tests.rs: 33 changes (33 additions & 0 deletions)

@@ -136,6 +136,39 @@ fn seek_indices() -> Result<(), Error> {
    Ok(())
}

#[test]
fn seek_indices_on_medium_size_file() -> Result<(), Error> {
    // 969_265 bytes is below 3 * MAX_CHUNK_SIZE, so the file splits into
    // three chunks: 323_088, 323_088 and 323_089 bytes (chunk 2 holds the
    // remainder byte).
    let file_size = 969_265;
    let pos = 0;
    let len = 131072;

    // Position 0 is the very start of chunk 0.
    let info = seek_info(file_size, pos, len);

    assert_eq!(0, info.relative_pos);
    assert_eq!(0, info.index_range.start);
    assert_eq!(0, info.index_range.end);

    // 131_072 is still inside chunk 0 (131_072 < 323_088).
    let info = seek_info(file_size, 131072, len);

    assert_eq!(131072, info.relative_pos);
    assert_eq!(0, info.index_range.start);
    assert_eq!(0, info.index_range.end);

    // 393_216 falls in chunk 1: 393_216 - 323_088 = 70_128.
    let info = seek_info(file_size, 393216, len);

    assert_eq!(70128, info.relative_pos);
    assert_eq!(1, info.index_range.start);
    assert_eq!(1, info.index_range.end);

    // 655_360 falls in chunk 2: 655_360 - 2 * 323_088 = 9_184.
    let info = seek_info(file_size, 655360, len);

    assert_eq!(9184, info.relative_pos);
    assert_eq!(2, info.index_range.start);
    assert_eq!(2, info.index_range.end);

    Ok(())
}

#[test]
fn seek_and_join() -> Result<(), Error> {
    for i in 1..15 {