Get chunk size fix #382
Conversation
…ex is > 1 and the file size is less than 3 * MAX_CHUNK_SIZE (1.5 MB currently).
- The prior logic was rounding incorrectly, causing the size to be out by a small margin. This caused seek_info to return an incorrect relative_pos.
- Tested with 1 KB, 8 KB, 256 KB, 123456 bytes, 512 KB and 2048 KB over sn_httpd and confirmed all payloads were correct with Wireshark.
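For context, a quick sketch of the size threshold involved, assuming MAX_CHUNK_SIZE is 512 KiB (an assumption chosen only so that 3 * MAX_CHUNK_SIZE matches the 1.5 MB figure above; not taken from the crate):

// Illustrative only: MAX_CHUNK_SIZE = 512 KiB is an assumption, picked so
// that 3 * MAX_CHUNK_SIZE lines up with the 1.5 MB figure in the commit message.
const MAX_CHUNK_SIZE: usize = 512 * 1024;

fn main() {
    // File sizes exercised in the sn_httpd / Wireshark test run above.
    for &file_size in &[1_024, 8_192, 262_144, 123_456, 524_288, 2_097_152] {
        let below_threshold = file_size < 3 * MAX_CHUNK_SIZE;
        println!("{file_size:>9} bytes -> below 3 * MAX_CHUNK_SIZE: {below_threshold}");
    }
}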
src/lib.rs (Outdated)
} else {
    return file_size - (2 * (file_size / 3));
}
return file_size / 3;
I think the aim of the original code block is:
always have chunks[2] be the one holding the extra bytes when file_size is small AND not divisible by 3.
Say file_size is 1024: chunk[0] and chunk[1] would be 341 bytes, and chunk[2] would be 342 bytes.
With the change, all three chunks will be 341 bytes, which would cause data loss?
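For illustration, a minimal sketch of that intended behaviour (not the crate's actual code), where the first two chunks get the rounded-down share and the last chunk absorbs the remainder:

// Sketch of the intended small-file split: chunks 0 and 1 get file_size / 3
// bytes each, chunk 2 absorbs whatever is left over.
fn small_file_chunk_size(file_size: usize, chunk_index: usize) -> usize {
    if chunk_index < 2 {
        file_size / 3
    } else {
        file_size - 2 * (file_size / 3)
    }
}

fn main() {
    // 1024 bytes -> 341, 341, 342; the sizes still sum to the full file.
    assert_eq!(small_file_chunk_size(1024, 0), 341);
    assert_eq!(small_file_chunk_size(1024, 1), 341);
    assert_eq!(small_file_chunk_size(1024, 2), 342);
    // Returning file_size / 3 for every index would give 341 * 3 = 1023,
    // dropping one byte, which is the data-loss concern raised above.
}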
I can create a unit test with the above scenario and test it out. It's good to understand the context behind this.
From testing, it definitely wasn't working right and this change seemed to fix it. However, it sounds like it could introduce the edge cases described.
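A sketch of what such a unit test could look like (the get_chunk_size(file_size, chunk_index) signature is assumed here and may need adjusting):

#[cfg(test)]
mod chunk_size_tests {
    use super::*;

    // Hypothetical test for the 1024-byte scenario above.
    #[test]
    fn small_file_chunk_sizes_sum_to_file_size() {
        let file_size = 1024;
        let sizes: Vec<usize> = (0..3).map(|i| get_chunk_size(file_size, i)).collect();
        assert_eq!(sizes, vec![341, 341, 342]);
        assert_eq!(sizes.iter().sum::<usize>(), file_size);
    }
}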
Considering the above and looking at the code again, I think the bug is actually in the seek_info routine. I see there are a bunch of others that use get_chunk_size, and they likely need the logic as you describe.
Given seeking is always from the start of the chunk and only the last chunk is oversized, we can use get_chunk_size(file_size, 0) in all cases. EDIT: Actually, that doesn't quite work either (at the upper boundary).
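To make the upper-boundary problem concrete, a rough sketch (illustrative names and logic, not the crate's API) of locating a position in a 3-chunk file where only the last chunk is oversized:

// Illustrative only: maps an absolute position to (chunk index, offset within
// that chunk) for a 3-chunk file whose first two chunks are file_size / 3
// bytes and whose last chunk is larger. Dividing every position by the small
// chunk size alone breaks once the position reaches the oversized last chunk.
fn locate(file_size: usize, pos: usize) -> (usize, usize) {
    let small = file_size / 3; // size of chunks 0 and 1
    if pos < 2 * small {
        (pos / small, pos % small)
    } else {
        (2, pos - 2 * small) // everything past the first two chunks is in chunk 2
    }
}

fn main() {
    // 1024-byte file: chunks are 341, 341 and 342 bytes.
    assert_eq!(locate(1024, 0), (0, 0));
    assert_eq!(locate(1024, 682), (2, 0));    // first byte of the oversized chunk
    assert_eq!(locate(1024, 1023), (2, 341)); // last byte of the file
}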
I think there is also an issue with overlapped_chunks due to get_chunk_index: when you have a large 3rd chunk, it calls get_chunk_size(filesize, 0) (like I thought might work above). However, if you have an oversized 3rd chunk, it returns 0 (instead of 2).
In turn, that means seek_info reverts to start_index == 0, which corrupts the relative_pos value.
I'm adding more tests to check these boundaries.
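Something along these lines, for example (the get_chunk_index(file_size, pos) signature is assumed here and may not match the actual one):

#[cfg(test)]
mod chunk_index_boundary_tests {
    use super::*;

    // Positions at and around the start of the oversized third chunk must map
    // to index 2, otherwise seek_info computes a wrong relative_pos.
    // Values assume a 1024-byte file split 341 / 341 / 342.
    #[test]
    fn positions_in_oversized_last_chunk_map_to_index_2() {
        let file_size = 1024;
        assert_eq!(get_chunk_index(file_size, 681), 1);  // last byte of chunk 1
        assert_eq!(get_chunk_index(file_size, 682), 2);  // first byte of chunk 2
        assert_eq!(get_chunk_index(file_size, 1023), 2); // last byte of the file
    }
}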
Hi @traktion,
Really appreciate your investigations; more tests are always appreciated.
Meanwhile, I had a quick scan of the self_encryption code and do see issues at
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L492
and
https://github.com/maidsafe/self_encryption/blob/master/src/lib.rs#L640
L492 uses the remainder of the position divided by the starting chunk's size, which will be incorrect when:
a. the starting chunk is the oversized chunk, and
b. the position falls near the edge of that oversized chunk.
L640 has a similar issue in that it uses the first chunk's size.
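For illustration, the L492 failure mode looks roughly like this (a sketch with hard-coded values, not the actual code):

// Sketch only: with a 1024-byte file (chunks of 341, 341 and 342 bytes),
// taking the position's remainder against the starting chunk's own size goes
// wrong once that starting chunk is the oversized one.
fn main() {
    let (small, large) = (341usize, 342usize);
    let pos = 1023; // last byte of the file, inside the oversized chunk 2

    // Remainder against the oversized chunk's size: not the offset we want.
    let wrong_relative_pos = pos % large;
    // Offset from the start of chunk 2: the value seek logic actually needs.
    let right_relative_pos = pos - 2 * small;

    assert_eq!(wrong_relative_pos, 339);
    assert_eq!(right_relative_pos, 341);
}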
I will see if I can get some fixes and tests for it on Monday when I'm back.
And it would be really appreciated, and I will merge your tests in, if you come up with more, or even with a solution to it.
Thank you very much.
I've reverted my prior change and applied a new fix for L492.
I took a deeper look at L640 and it appears to be ok. While the remainder is incorrect, the final calculation is still valid and returns the correct index. I confirmed this with tests too. The logic is definitely a bit brain-twisting though.
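The shape of the L492 fix, roughly, is to measure relative_pos from the start of the chunk the position falls in rather than via a remainder (illustrative helper, not the actual patch):

// Hedged sketch: relative_pos as the distance from the start of the chunk
// that the position falls in. Names and the offset table are illustrative.
fn relative_pos(pos: usize, chunk_start_offsets: &[usize], chunk_index: usize) -> usize {
    pos - chunk_start_offsets[chunk_index]
}

fn main() {
    // 1024-byte file split 341 / 341 / 342: chunk start offsets are 0, 341, 682.
    let starts = [0, 341, 682];
    assert_eq!(relative_pos(682, &starts, 2), 0);    // first byte of chunk 2
    assert_eq!(relative_pos(1023, &starts, 2), 341); // last byte of the file
}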
Great work!
Really appreciate it.
…dex is > 1 and the file size is less than 3 * MAX_CHUNK_SIZE (1.5 MB currently).
- The prior logic was rounding correctly, with the last chunk intentionally larger than the others (in the 3-chunk file scenario).
- Updated the logic in seek_info() to be aware of oversized last chunks (in the 3-chunk file scenario).
- Improved test coverage to validate that these boundary conditions behave correctly.
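A sketch of the kind of boundary test this adds (hypothetical; the get_chunk_index and get_chunk_size names and signatures are assumed to match the crate):

#[cfg(test)]
mod seek_info_boundary_tests {
    use super::*;

    // Hypothetical round-trip check: every position in a 1024-byte file
    // should land inside the chunk reported for it, including positions in
    // the oversized last chunk.
    #[test]
    fn every_position_maps_inside_its_chunk() {
        let file_size = 1024;
        for pos in 0..file_size {
            let index = get_chunk_index(file_size, pos);
            let start: usize = (0..index).map(|i| get_chunk_size(file_size, i)).sum();
            assert!(pos >= start, "pos {pos} precedes chunk {index}");
            assert!(pos - start < get_chunk_size(file_size, index));
        }
    }
}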
I'm not sure why the build is snagging on the above. It doesn't look like there should be any merge conflicts (no changes in months) and I've only added code coverage. Please lmk if there is something I need to fix! Thanks!
Link to issue: #381
Link to issue: maidsafe/autonomi#1621