The blkhash
C library implements the block based hash construction, using
zero detection and multiple threads to speed up the computation.
The library provides 2 APIs: the simple API and the async API.
The simple API is similar to other cryptographic libraries, providing an easy to use way to create a hash, update it with data and get the digest.
Create a hash using the default options:
struct blkhash *h = blkhash_new();
Add data to the hash:
unsigned char *buf = malloc(BUF_SIZE);
memset(buf, 'A', BUF_SIZE);
blkhash_update(h, buf, BUF_SIZE);
The blkhash_update()
function uses multiple threads and zero detection
to speed up hashing.
When you know that some areas of the image are unallocated (read as zeros), you can add the unallocated range to the hash without reading anything from storage:
blkhash_zero(h, 1024 * 1024 * 1024);
This can be 3 orders of magnitude faster compared with
blkhash_update()
.
When done, finalize the hash to get the digest:
unsigned char md[BLKHASH_MAX_MD_SIZE];
unsigned int len;
blkhash_final(h, md, &len);
To free the hash use:
blkhash_free(h);
See the example program for example of using the simple API.
The async API provides the best possible performance by keeping multiple requests in flight and getting events when requests are complete, without copying the image data.
To use the async API, create a blkhash_opts
and set the queue_depth
to number of in-flight requests, and optionally the number of threads.
struct blkhash_opts *opts = blkhash_opts_new(“sha256”);
blkhash_opts_set_queue_depth(opts, 16);
blkhash_opts_set_threads(opts, 16);
Create a hash with the blkhash_opts
and free it:
struct blkhash *hash = blkhash_new_opts(opts);
blkhash_opts_free(opts);
Finally get the completion file descriptor and start watching for
events. In this example we use poll()
and we watch POLLIN
events.
poll_fds[0].fd = blkhash_completion_fd(hash);
poll_fds[0].events = POLLIN;
When a reading a buffer from storage completes, you submit the buffer to blkhash for async update:
struct request *req;
while ((req = next_completed_read())) {
req->completed = false;
blkhash_aio_update(hash, req->buffer, req->length, req);
}
blkhash_aio_update()
adds the buffer to workers queues and returns
quickly before the data is processed. You must not modify or free the
buffered passed to blkhash_aio_update()
until the async update
completes.
To find the completed request when processing completions, we pass a pointer to the request struct.
When all update request were submitted, you need to wait for update
completion. In this example we use poll()
to wait until the completion
fd becomes readable:
poll(poll_fds, ARRAY_SIZE(poll_fds), -1);
if (poll_fds[0].fd & POLLIN) {
/* Read at least 8 bytes from the completion fd. */
char sink[queue_depth < 8 ? 8 : queue_depth];
read(poll_fd[0].fd, &sink, sizeof(sink));
/* One or more requests have completed! */
process_completions();
}
Note that you must read at least 8 bytes from the completion fd. On
Linux
blkhash uses eventfd
, so reading will always return 8 bytes.
When eventfd
is not available it uses pipe
and read may return up to
queue depth bytes (set by blkhash_opts_set_queue_depth()
).
After we read the pending data from the completion fd, we need to process the completions.
We get the completions using blkhash_aio_completions()
, and then we
iterate on the completions, fetch the completed request and complete it.
struct blkhash_completion completions[queue_depth];
int count;
count = blkhash_aio_completions(hash, completions, queue_depth);
for (int i = 0; i < count; i++) {
struct request *req = &completions[i].user_data;
complete_request(req);
}
The request is the user_data
pointer we passed to
blkhash_aio_update()
before.
At this point blkhash
finished to read the data and we can free the
request or use it to read more data from storage.
blkhash_zero
is fast and does not access image data, so there is no
blkhash_aio_zero()
function. You can safely call blkhash_zero()
in
your IO event loop.
See the async example program for example of using the async API.
See blkhash(3).