Provide try-with-resources-like function to SHA256 #831
base: next
```diff
@@ -41,6 +41,7 @@ LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 import java.security.Provider;
 import java.util.Queue;
 import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.function.Consumer;
 
 import org.tanukisoftware.wrapper.WrapperManager;
```
```diff
@@ -119,10 +120,26 @@ public static void returnMessageDigest(MessageDigest md256) {
     }
 
+    public static byte[] digest(byte[] data) {
+        return digest(md -> md.update(data));
+    }
+
+    public static byte[] digest(Consumer<MessageDigest> updater) {
```
Comment: Even though this is definitely an improvement over the existing code, none of the places where this method is used actually seem to be interested in "consuming a `MessageDigest`". Instead, they just want to digest multiple sources, together. Can we just go for the vararg variant, so that

```java
byte[] outerKey = SHA256.digest(md -> {
    md.update(pwd);
    md.update(salt);
});
```

could simply become:

```java
byte[] outerKey = SHA256.digest(pwd, salt);
```

(and yes, there are a few places that would like to …)

Reply: True, the `Consumer` variant is more general than what the call sites actually need. But yeah, by adding the vararg overload we could cover those.
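A vararg overload along the lines suggested here could presumably be layered on top of the `Consumer` variant from this PR; a minimal sketch (the `byte[]...` signature and its implementation are illustrative, not part of the actual diff):

```java
// Hypothetical vararg overload, delegating to the Consumer-based
// digest(...) introduced in this PR; a sketch, not code from the diff.
public static byte[] digest(byte[]... inputs) {
    return digest(md -> {
        for (byte[] input : inputs) {
            md.update(input); // feed each source into the same digest
        }
    });
}
```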
```diff
+        MessageDigest md = null;
+        try {
+            md = getMessageDigest();
+            updater.accept(md);
+            return md.digest();
+        } finally {
+            returnMessageDigest(md);
+        }
+    }
+
     public static byte[] digest(InputStream is) throws IOException {
         MessageDigest md = null;
         try {
             md = getMessageDigest();
-            return md.digest(data);
+            hash(is, md);
+            return md.digest();
         } finally {
             returnMessageDigest(md);
         }
```
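For context, a sketch of what a call site looks like before and after this change (method and variable names here are illustrative, not taken from the codebase):

```java
// Before: the caller performs the get/return dance by hand and must not
// forget the finally block, or the digest never returns to the pool.
static byte[] hashBefore(byte[] header, byte[] body) {
    MessageDigest md = null;
    try {
        md = SHA256.getMessageDigest();
        md.update(header);
        md.update(body);
        return md.digest();
    } finally {
        SHA256.returnMessageDigest(md);
    }
}

// After: the dance is encapsulated behind a callback, analogous in
// spirit to try-with-resources.
static byte[] hashAfter(byte[] header, byte[] body) {
    return SHA256.digest(md -> {
        md.update(header);
        md.update(body);
    });
}
```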
Comment: If we’re doing this (and we should!) I would also love to deprecate the `getMessageDigest()` and `returnMessageDigest()` methods (`hash()`, too, probably), with a note on how to digest stuff in 2024! 😁

Comment: Funny enough, on modern Java this get/return dance is actually slower* compared to just creating a fresh SHA-256 digest and using it just once 🙃

Maybe we should get rid of `getMessageDigest`/`returnMessageDigest` altogether (or make them return a fresh instance / no-op, respectively), but that's probably for later.

* I measured a 30% performance decrease for get/return over fresh digests when used by 4 threads simultaneously in a microbenchmark, when digesting a moderate 272 bytes. Most of the time is spent updating the queue and resetting the digest rather than doing the actual computations.
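For reference, the "fresh instance" alternative floated here would presumably look roughly like this (a sketch, not code from this PR):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Create a new digest per use instead of borrowing from a shared pool.
static byte[] sha256(byte[] data) {
    try {
        return MessageDigest.getInstance("SHA-256").digest(data);
    } catch (NoSuchAlgorithmException e) {
        // Every conforming JRE is required to provide SHA-256.
        throw new AssertionError(e);
    }
}
```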
Comment: Yes, it’s quite possible that this, too, is a remnant of times when certain things (like `MessageDigest` creation) were incredibly expensive, in which case we should get rid of them. Can you set up a small benchmark (maybe using JMH or something similar) so we can run a couple of comparisons over various platforms to see if we can get rid of this? 🙂
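A pooled-versus-fresh comparison in JMH might have roughly this shape (a sketch only; the benchmark actually used is in the gist shared in the next comment, and the payload sizes and thread count here simply mirror the numbers mentioned in this thread):

```java
import java.security.MessageDigest;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Rough shape of a pooled-vs-fresh digest benchmark.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Threads(4)
public class DigestBenchmark {

    @Param({"16", "272", "2048"})
    int payloadSize;

    byte[] payload;

    @Setup
    public void setUp() {
        payload = new byte[payloadSize]; // contents are irrelevant here
    }

    @Benchmark
    public byte[] pooled() {
        // Borrow from and return to the pool in the SHA256 class at hand.
        MessageDigest md = SHA256.getMessageDigest();
        try {
            return md.digest(payload);
        } finally {
            SHA256.returnMessageDigest(md);
        }
    }

    @Benchmark
    public byte[] fresh() throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(payload);
    }
}
```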
Comment: I already had this set up in JMH, so sharing it was just a cleanup away. Here you go!
https://gist.github.com/bertm/94f3c97fd138fe075a1f54076c6bd3cf
My output (on a laptop with a 4c/8t Intel i7-10510U and OpenJDK 21):
[benchmark output not preserved]
The same laptop with OpenJDK 8 is substantially slower, but using fresh digests still performs faster than pooling them:
[benchmark output not preserved]
(Of course, for large payloads the difference is negligible, as can be seen in the 2048-byte payload test case.)
Comment: My results from a 2020 MacBook Pro with Intel i7, running macOS 11.7.9.
Java 8.0.402-tem:
[benchmark output not preserved]
Java 17.0.8-tem:
[benchmark output not preserved]
Java 21.0.1-tem:
[benchmark output not preserved]
I have to admit that I am slightly confused by these results. When using larger payloads the pooled digests are faster, indicating that creating a small number of digests is somehow slower than creating a large number of digests? I have the feeling we’re missing something here…
Comment: Your measurement error is sky-high (for 16 bytes, approx. 50% of your measurement). The confidence intervals of your pooled and fresh measurements overlap substantially for 2048 bytes, rendering those results inconclusive. Have you tried your MacBook in performance mode? If that does not help to reduce the jitter, try increasing the number of iterations or the duration of the iterations.
Here's a re-run on OpenJDK 21 with 20 seconds per iteration on my machine; note the errors are now in the realm of 1-2% of the score, where they were previously >10%:
[benchmark output not preserved]
In any case, the 16-byte measurement is the most relevant: it best highlights the overhead of borrow/return versus object allocation. It would be reasonable to assume that once obtained, both digests perform equally well; at that point, both variants are just instances of the exact same class with the exact same state.
Comment: I’ll probably repeat some of the measurements tomorrow; that’s my work machine, and sometimes I actually do have to work, even while benchmarking. 😀
Anyway, most relevant would be the most common size we’re hashing, and I guess that would be the size of our routing keys. Are those 16 bytes? (I have the feeling that I really should know these things… 😁) Either way, the most common size will probably indeed be a lot closer to 16 than to 2048 bytes, so getting rid of the pool will probably get a 👍 from me.