Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get chunk size fix #382

Merged
merged 5 commits into from
Apr 22, 2024
Merged

Get chunk size fix #382

merged 5 commits into from
Apr 22, 2024

Conversation

traktion
Copy link
Contributor

@traktion traktion commented Apr 17, 2024

  • Updated the logic for calculating the chunk size when the chunk index is > 1 and the file size is less than 3 * MAX_CHUNK_SIZE (1.5 MB currently).
  • The prior logic was incorrectly rounding, causing the size to be out by a small margin. This caused seek_info to return an incorrect relative_pos.
  • Tested with 1 KB, 8 KB, 256 KB, 123456 bytes, 512 KB and 2048 KB over sn_httpd and confirmed all payloads were correct with Wireshark.

Link to issue:
#381

Link to issue:
maidsafe/autonomi#1621

…ex is > 1 and the file size is less than 3 * MAX_CHUNK_SIZE (1.5 MB currently).

- The prior logic was incorrectly rounding, causing the size to be out by a small margin. This caused seek_info to return an incorrect relative_pos.
- Tested with 1 KB, 8 KB, 256 KB, 123456 bytes, 512 KB and 2048 KB over sn_httpd and confirmed all payloads were correct with Wireshark.
@joshuef
Copy link
Contributor

joshuef commented Apr 18, 2024

nice one @traktion thank you!

can you update for the clippy errors in ci there that appears to b tripping up the other CI runs, I think 🙇

src/lib.rs Outdated
} else {
return file_size - (2 * (file_size / 3));
}
return file_size / 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the aim of the original code block is :
always have chunks[2] to be the one holding extra bytes, when file_size is small AND non-dividable by 3.
say, if file_size is 1024, then chunk[0] & chunk[1] to be 341, chunk[2] to be 342

with the change, all three chunks will then be 341, which will cause a data loss ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can create a unit test with the above scenario and test it out. It's good to understand the context behind this.

From testing, it definitely wasn't working right and this seemed to fix it. However, it sounds like there could be the edge cases described.

Copy link
Contributor Author

@traktion traktion Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the above and looking at the code again, I think the bug is actually in the seek_info routine. I see there are a bunch of others that use get_chunk_size and they likely need the logic as you describe.

Given seeking is always from the start of the chunk and only the last chunk is oversized, we can use get_chunk_size(file_size, 0) in all cases. EDIT: Actually, that doesn't quite work either (at the upper boundary).

Copy link
Contributor Author

@traktion traktion Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is also an issue with overlapped_chunks due to get_chunk_index, as when you have a large 3rd chunk, it calls get_chunk_size(filesize, 0) (like I thought may work above). However, if you have an oversized 3rd chunk, it returns 0 (instead of 2).

In turn, that means seek_info reverts to start_index == 0, which corrupts the relative_pos value.

I'm adding more tests to check these boundaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @traktion ,

really appreciate your investigations and more tests is always appreciated.

meanwhile, I had a quick scan on self_encryption code, and doess see issues at
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L492
and
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L640

L492 uses remainder when divided by chunk_size of starting chunk, which will be incorrect when:
a, the starting chunk is oversized chunk
b, the position falls into the edge of the oversized chunk

L640 bearing the similar issue that using chunk_size of the first chunk.

I will see if can get some fixes and tests to it on Monday when back.
And will be really appreciated and merge your tests in, if you come up more, or even with a solution to it.

Thank you very much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted my prior change and applied a new fix for L492.

I took a deeper look at L640 and it appears to be ok. While the remainder is incorrect, the final calculation is still valid and returns the correct index. I confirmed this with tests too. The logic is definitely a bit brain twisting though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work !
really appreciate it.

…dex is > 1 and the file size is less than 3 * MAX_CHUNK_SIZE (1.5 MB currently).

- The prior logic was correctly rounding, with the last chunk intentionally larger than the others (in 3 chunk file scenario).
- Updated the logic in seek_info() to be aware of oversized last chunks (in 3 chunk file scenario).
- Improved test coverage to validate these boundaries conditions are behaving correctly.
maqi
maqi previously approved these changes Apr 19, 2024
@traktion
Copy link
Contributor Author

I'm not sure why the build is snagging on the above. It doesn't look like there should be any merge conflicts (no changes in months) and I've only added code coverage. Please lmk if there is something I need to fix! Thanks!

@joshuef joshuef merged commit 19867d6 into maidsafe:master Apr 22, 2024
8 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants