New API Proposal: Get the hex string of the Insecure.MD5.hash digest without impacting the performance
Motivation:
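I'm wondering how to get the hex string of the Insecure.MD5.hash digest when computing the MD5 of files. However, the only thing I could get was the digest's "description". What I really need is just the hex part (e.g. "dbe2a50e3babf7c9ac1c1ac25551f5a2"), without the "MD5 digest: " prefix.

After some googling, I found that one way to do this is to build the hex string manually by mapping over the digest bytes, e.g. `digest.compactMap { String(format: "%02x", $0) }.joined()`.

However, after adopting this approach, our project's performance degraded significantly compared with the old "CryptoSwift" library (MD5 is heavily involved in our workflow). After some investigation, I suspected that the compactMap above was the root cause, so I ran the following performance test: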
```swift
import Foundation
import Crypto        // swift-crypto
import CryptoSwift

// `Path` and `read()` come from a path library not shown in the original.
let testFilePath = Path("/Users/bytedance/testFile")
let data = try testFilePath.read()
let times = 1_000_000

print(data.md5().toHexString())
print(Insecure.MD5.hash(data: data).description)

// Test CryptoSwift md5
let cryptoSwiftStartTime = Date.now
for _ in 0..<times {
    let _ = data.md5().toHexString()
}
let cryptoSwiftEndTime = Date.now
let cryptoSwiftInterval = cryptoSwiftEndTime.timeIntervalSince(cryptoSwiftStartTime)
print("CryptoSwift: \(cryptoSwiftInterval)")

// Test swift-crypto md5
let swiftCryptoStartTime = Date.now
for _ in 0..<times {
    let digest = Insecure.MD5.hash(data: data)
    let _ = digest.compactMap { String(format: "%02x", $0) }.joined()
}
let swiftCryptoEndTime = Date.now
let swiftCryptoInterval = swiftCryptoEndTime.timeIntervalSince(swiftCryptoStartTime)
print("swift-crypto: \(swiftCryptoInterval)")
```
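The two loops above repeat the same start/stop/print pattern. A small helper like the hypothetical `measure(_:iterations:_:)` below (not part of either library, and not the harness used here) could factor that out. It is a sketch that reads `DispatchTime`'s monotonic clock instead of `Date`, since the wall clock can jump while a long benchmark runs.

```swift
import Dispatch

/// Hypothetical helper: runs `body` `iterations` times, prints the label and
/// the elapsed time, and returns the elapsed time in seconds.
@discardableResult
func measure(_ label: String, iterations: Int, _ body: () throws -> Void) rethrows -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations {
        try body()
    }
    let end = DispatchTime.now().uptimeNanoseconds
    // uptimeNanoseconds is monotonic, so end >= start.
    let seconds = Double(end - start) / 1_000_000_000
    print("\(label): \(seconds)")
    return seconds
}
```

With the setup above, a call such as `measure("swift-crypto", iterations: times) { _ = Insecure.MD5.hash(data: data) }` would replace one start/loop/stop/print group.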
It turned out that:
- For small files (< 1 KB), swift-crypto was more than 3x slower than CryptoSwift.
- For large files (> 1 MB), swift-crypto was more than 3x faster than CryptoSwift.
However, our workflow mainly involves thousands of small files, which is where the slowdown comes from. To confirm that the "map" operation is what makes it slower, I removed it and ran another test:
```swift
import Foundation
import Crypto        // swift-crypto
import CryptoSwift

// `Path` and `read()` come from a path library not shown in the original.
let testFilePath = Path("/Users/bytedance/testFile")
let data = try testFilePath.read()
let times = 1_000_000

print(data.md5().toHexString())
print(Insecure.MD5.hash(data: data).description)

// Test CryptoSwift md5 (digest only, no hex conversion)
let cryptoSwiftStartTime = Date.now
for _ in 0..<times {
    let _ = data.md5()
}
let cryptoSwiftEndTime = Date.now
let cryptoSwiftInterval = cryptoSwiftEndTime.timeIntervalSince(cryptoSwiftStartTime)
print("CryptoSwift: \(cryptoSwiftInterval)")

// Test swift-crypto md5 (digest only, no hex conversion)
let swiftCryptoStartTime = Date.now
for _ in 0..<times {
    let _ = Insecure.MD5.hash(data: data)
}
let swiftCryptoEndTime = Date.now
let swiftCryptoInterval = swiftCryptoEndTime.timeIntervalSince(swiftCryptoStartTime)
print("swift-crypto: \(swiftCryptoInterval)")
```
Now swift-crypto is faster than CryptoSwift in both cases.
This suggests that converting the digest to a hex string has a non-trivial performance cost.
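A likely reason for the gap is that the `compactMap` version calls `String(format:)` once per byte, which parses the format string and allocates a separate small `String` each time, and then joins those 16 strings for every MD5 digest. The sketch below contrasts that with a lookup-table encoder that writes ASCII bytes into one pre-sized buffer. Both function names are hypothetical; they accept any `Sequence` of `UInt8`, which is what a swift-crypto digest is.

```swift
import Foundation

// The approach from the issue: one String(format:) call and one String per byte
// (the issue used compactMap; map is equivalent for a non-optional result).
func hexViaFormat<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    bytes.map { String(format: "%02x", $0) }.joined()
}

// Lowercase hex digits as ASCII code units.
private let hexDigits = Array("0123456789abcdef".utf8)

// Hypothetical alternative: look up both nibbles in a table and build the
// result from a single [UInt8] buffer.
func hexViaTable<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    var out: [UInt8] = []
    out.reserveCapacity(bytes.underestimatedCount * 2)
    for byte in bytes {
        out.append(hexDigits[Int(byte >> 4)])
        out.append(hexDigits[Int(byte & 0x0F)])
    }
    // Every byte in `out` is ASCII, so decoding it as UTF-8 cannot fail.
    return String(decoding: out, as: UTF8.self)
}
```

Because swift-crypto's `Digest` types are sequences of `UInt8`, `hexViaTable(Insecure.MD5.hash(data: data))` should work without converting the digest first.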
Then I searched swift-crypto to see how it generates hex strings itself, and found an internal computed property `hexString` in an extension on `DataProtocol`. Since it's internal, I had to copy the PrettyBytes.swift file into our project. Another test showed that the `hexString` extension has no noticeable impact on performance.
Therefore, our current solution is to copy PrettyBytes.swift into our project so we can use the `hexString` extension, which solves the performance issue.
All in all, my questions are:
1. Am I missing an existing API that converts the digest into a hex string without hurting performance?
2. If not, could the `hexString` extension be made a public API to support this use case?
3. Is there any other recommended way to solve this issue?
Thanks so much!
Importance:
This may not be a serious problem in small projects. However, for huge projects like what I'm currently working on, we have really strict performance requirements. The performance impact is unacceptable after migrating from "CryptoSwift" to "swift-crypto". Hopefully, this issue can be solved soon :)
In Crypto itself, we're not likely to offer the hex string extension itself as it's not part of the CryptoKit API surface. However, if you're willing to use _CryptoExtras, we could add the API as an extension in that module.
If that's not acceptable, then I would recommend copying out the PrettyBytes.swift file as you have, or just the relevant pieces. In fact, even that file is a diagnostic tool that hasn't had any performance work done on it, so you can speed it up further:
```swift
import Foundation

let charA = UInt8(ascii: "a")
let char0 = UInt8(ascii: "0")

// Maps a nibble (0...15) to its lowercase ASCII hex digit.
private func itoh(_ value: UInt8) -> UInt8 {
    assert(value < 16)
    return (value > 9) ? (charA &+ value &- 10) : (char0 &+ value)
}

extension DataProtocol {
    var hexString: String {
        let hexLen = self.count * 2
        // Write the ASCII digits directly into the new string's storage.
        return String(unsafeUninitializedCapacity: hexLen) { hexChars in
            var offset = 0
            // Visit each contiguous region's bytes exactly once.
            self.regions.forEach { region in
                for byte in region {
                    hexChars[offset &* 2] = itoh((byte >> 4) & 0xF)
                    hexChars[offset &* 2 &+ 1] = itoh(byte & 0xF)
                    offset &+= 1
                }
            }
            return offset &* 2
        }
    }
}
```
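One thing to keep in mind: as far as I know, swift-crypto's digest types conform to `Sequence` and `ContiguousBytes` rather than `DataProtocol`, so a `DataProtocol` extension needs an intermediate copy such as `Array(digest)` or `Data(digest)`. The sketch below avoids that copy by extending `ContiguousBytes` and reading the bytes in place. The property name `hexEncoded` is hypothetical, and `String(unsafeUninitializedCapacity:initializingUTF8With:)` requires macOS 11 / iOS 14 or later on Apple platforms.

```swift
import Foundation

// Lowercase hex digits as ASCII code units.
private let lowercaseHexDigits: [UInt8] = Array("0123456789abcdef".utf8)

extension ContiguousBytes {
    /// Hypothetical helper: lowercase hex encoding of the underlying bytes.
    var hexEncoded: String {
        withUnsafeBytes { (bytes: UnsafeRawBufferPointer) -> String in
            // Two ASCII characters per input byte, written straight into
            // the new string's storage.
            String(unsafeUninitializedCapacity: bytes.count * 2) { out in
                var i = 0
                for byte in bytes {
                    out[i] = lowercaseHexDigits[Int(byte >> 4)]
                    out[i + 1] = lowercaseHexDigits[Int(byte & 0x0F)]
                    i += 2
                }
                return i  // number of UTF-8 code units initialized
            }
        }
    }
}
```

With swift-crypto imported, `Insecure.MD5.hash(data: data).hexEncoded` should then yield only the hex digits, with no prefix and no intermediate array.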