Optimize checksum calculation on verify #861
@aryanA101a Thank you for your promising PR. I would really appreciate it if it is confirmed to be done the right way, but it is probably too short to close the ticket.
Thanks for the PR.
This is a tricky part here.
I wonder whether we should change the logic to use a single loop that reads as long as there is data to read (instead of a while loop inside a for loop). Something like:
auto remaining = checkSumPos;
auto part_iter = zimFile->begin();
std::ifstream current_stream(part_iter->second->filename(), ...);
while (remaining) {
  current_stream.read(reinterpret_cast<char*>(ch), std::min(piece_size, remaining));
  zim_MD5Update(&md5ctx, ch, current_stream.gcount());
  remaining -= current_stream.gcount();
  if (remaining == 0) {
    // we have read everything
    break;
  }
  if (!current_stream.good()) {
    // the current part is exhausted: open the next one
    part_iter++;
    current_stream = std::ifstream(part_iter->second->filename(), ...);
  }
}
(This is the general idea; error checking and the like still have to be added.)
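For readers following along, the single-loop idea above can be sketched in a self-contained form. This is only an illustration under assumptions, not the libzim implementation: `feed_parts` and `collect_first_bytes` are hypothetical names, the parts are plain `std::istream` objects rather than ZIM file parts, and a callback stands in for `zim_MD5Update`. A read failure is treated the same as the end of a part here, which real code would want to distinguish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not libzim code: feeds the first `total` bytes spread
// over `parts` to `update`, using one loop instead of a loop per part.
// Returns the number of bytes actually consumed.
std::size_t feed_parts(const std::vector<std::istream*>& parts,
                       std::size_t total,
                       std::size_t chunk_size,
                       const std::function<void(const char*, std::size_t)>& update)
{
  assert(chunk_size > 0);
  std::vector<char> buffer(chunk_size);
  std::size_t remaining = total;
  std::size_t part_index = 0;

  while (remaining > 0 && part_index < parts.size()) {
    std::istream& stream = *parts[part_index];
    stream.read(buffer.data(),
                static_cast<std::streamsize>(std::min(chunk_size, remaining)));
    // gcount() is the number of bytes really read; it can be smaller than
    // the requested size at the end of a part.
    const std::size_t got = static_cast<std::size_t>(stream.gcount());
    if (got > 0) {
      update(buffer.data(), got);
      remaining -= got;
    }
    if (!stream.good()) {
      // End of this part (or a read error, not told apart in this sketch):
      // move on to the next part.
      ++part_index;
    }
  }
  return total - remaining;
}

// Usage example: three in-memory streams stand in for the split parts.
std::string collect_first_bytes(std::size_t total, std::size_t chunk_size)
{
  std::istringstream a("hello "), b("world"), c("!!!");
  std::vector<std::istream*> parts{&a, &b, &c};
  std::string seen;
  feed_parts(parts, total, chunk_size,
             [&seen](const char* data, std::size_t len) { seen.append(data, len); });
  return seen;
}
```

Calling `collect_first_bytes(12, 4)` walks across all three streams and stops in the middle of the last one, which is the situation the `remaining` counter is meant to handle.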
Have you tried a chunk size bigger than 1024? How does it behave?
From the beginning, there were two loops (nested for loops).
Yes, I have tried other chunk sizes.
I think 1024 is the sweet spot here.
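A chunk-size comparison like the one discussed here could be measured with a small harness along the following lines. This is a sketch under assumptions: `timed_read`, `ReadResult` and `bytes_seen` are hypothetical names, the input is an in-memory stream rather than a ZIM file, and no timing results are implied. On a real file the numbers would also depend on the OS cache and the stream's own buffering.

```cpp
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: bytes consumed and wall-clock time spent reading.
struct ReadResult {
  std::size_t bytes;
  std::chrono::steady_clock::duration elapsed;
};

// Reads `in` to the end in pieces of `chunk_size` bytes and times the loop.
ReadResult timed_read(std::istream& in, std::size_t chunk_size)
{
  if (chunk_size == 0) {
    // A zero-sized read never makes progress, so refuse it up front.
    return {0, std::chrono::steady_clock::duration::zero()};
  }
  std::vector<char> buffer(chunk_size);
  std::size_t bytes = 0;
  const auto start = std::chrono::steady_clock::now();
  // A short final read sets failbit but still reports its bytes via gcount().
  while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))
         || in.gcount() > 0) {
    bytes += static_cast<std::size_t>(in.gcount());
  }
  return {bytes, std::chrono::steady_clock::now() - start};
}

// Usage example: the byte count must not depend on the chunk size.
std::size_t bytes_seen(std::size_t total, std::size_t chunk_size)
{
  std::istringstream in(std::string(total, 'z'));
  return timed_read(in, chunk_size).bytes;
}
```

Whatever chunk size turns out fastest, the number of bytes consumed has to stay the same, which makes a cheap sanity check when comparing sizes.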
We still have a few small changes to make, but we are mostly good.
Please squash your commits before the next review round.
Yes, but as we refactor, we may remove the nested loops. My solution also loops over the parts; it "simply" does not use a while or for. Anyway, your last proposal is mostly good, so let's continue with it.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
- Coverage   58.04%   58.00%   -0.05%
==========================================
  Files         101      101
  Lines        4617     4622       +5
  Branches     1921     1925       +4
==========================================
+ Hits         2680     2681       +1
  Misses        667      667
- Partials     1270     1274       +4
View full report in Codecov by Sentry.
ea2cdbe to 02f7c68
Two last tiny changes before merging:
- Wrong indentation
- Previous comment about CHUNCK_SIZE vs stream.gcount()
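The earlier comment being referred to is not shown here, so the following is only one plausible reading of the chunk size versus `stream.gcount()` point: the last read of a file is usually shorter than the chunk size, and only `gcount()` bytes of the buffer are valid after it. In this sketch `byte_sum` and `demo_sum` are hypothetical names, and a plain sum of byte values stands in for a real digest such as MD5.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a real digest: sums the byte values fed to it.
// With trust_gcount == false the full chunk is always used, which mimics the
// mistake of passing the chunk size instead of gcount() to the update call.
std::uint64_t byte_sum(std::istream& in, std::size_t chunk_size, bool trust_gcount)
{
  std::vector<char> buffer(chunk_size);
  std::uint64_t sum = 0;
  while (true) {
    in.read(buffer.data(), static_cast<std::streamsize>(chunk_size));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    if (got == 0) {
      break;  // nothing more to read
    }
    // After a short read, buffer[got..] still holds bytes from the previous
    // chunk; using chunk_size here would count them a second time.
    const std::size_t len = trust_gcount ? got : chunk_size;
    for (std::size_t i = 0; i < len; ++i) {
      sum += static_cast<unsigned char>(buffer[i]);
    }
  }
  return sum;
}

// Usage example on a 5-byte input read in chunks of 4.
std::uint64_t demo_sum(bool trust_gcount)
{
  std::istringstream in("abcde");
  return byte_sum(in, 4, trust_gcount);
}
```

With `trust_gcount` set, the sum covers exactly the five input bytes; without it, the stale tail of the buffer leaks into the result, which for a checksum would mean a wrong digest.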
LGTM. Thanks for the PR!
Related to #614