batch-cluster

Efficient, concurrent work via batch-mode command-line tools from within Node.js.

Many command-line tools, like ExifTool, PowerShell, and GraphicsMagick, support a "batch mode" that accepts a series of discrete commands through stdin and returns results through stdout. As these tools can be fairly large, spinning them up can be expensive (especially on Windows).
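To make the batch-mode idea concrete, here is a minimal sketch of how a caller might split a long-lived tool's stdout into per-command responses. It assumes the tool prints a known token on its own line after each command; the `ResponseFramer` class and the `PASS`/`FAIL` tokens are hypothetical names chosen for illustration, not part of this module's API.

```typescript
// Hypothetical helper for illustration only; not part of batch-cluster.
// Assumption: the tool ends each response with a line that is exactly
// the pass token or the fail token.

interface CompletedResponse {
  passed: boolean;
  output: string;
}

class ResponseFramer {
  private buffer = "";
  private pending: string[] = [];
  private readonly passToken: string;
  private readonly failToken: string;

  constructor(passToken = "PASS", failToken = "FAIL") {
    this.passToken = passToken;
    this.failToken = failToken;
  }

  /** Feed one stdout chunk; returns the responses that this chunk completed. */
  push(chunk: string): CompletedResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is a partial line (or "" if the chunk ended with "\n"),
    // so keep it buffered until the rest of the line arrives.
    this.buffer = lines.pop() ?? "";

    const done: CompletedResponse[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === this.passToken || trimmed === this.failToken) {
        done.push({
          passed: trimmed === this.passToken,
          output: this.pending.join("\n"),
        });
        this.pending = [];
      } else {
        this.pending.push(line);
      }
    }
    return done;
  }
}
```

Because chunks from a pipe can end mid-line, the sketch only inspects complete lines and carries the remainder forward to the next `push` call.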

This module allows you to run a series of commands, or Tasks, processed by a cluster of these processes.

This module manages a queue of pending tasks, feeds those tasks to child processes as they become idle, and monitors the child processes for errors and crashes. Batch processes are also recycled after processing N tasks or running for N seconds, to minimize the impact of any potential memory leaks.
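The scheduling and recycling behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the module's actual implementation: `ProcSlot`, `RecyclePolicy`, `dispatch`, and `shouldRecycle` are hypothetical names, and real process spawning and error monitoring are left out.

```typescript
// Illustrative sketch only; all names here are hypothetical.

interface ProcSlot {
  id: number;
  busy: boolean;
  taskCount: number; // tasks handed to this process so far
  startedAtMs: number; // when the process was spawned
}

interface RecyclePolicy {
  maxTasks: number; // recycle after this many tasks
  maxAgeMs: number; // recycle after running this long
}

/** Hand each idle process at most one pending task, in queue order. */
function dispatch<T>(
  pending: T[],
  slots: ProcSlot[],
): Array<{ slotId: number; task: T }> {
  const assigned: Array<{ slotId: number; task: T }> = [];
  for (const slot of slots) {
    if (pending.length === 0) break;
    if (slot.busy) continue; // a process only ever holds one task at a time
    const task = pending.shift() as T;
    slot.busy = true;
    slot.taskCount += 1;
    assigned.push({ slotId: slot.id, task });
  }
  return assigned;
}

/** True once a process has hit either the task-count or the age limit. */
function shouldRecycle(
  slot: ProcSlot,
  policy: RecyclePolicy,
  nowMs: number,
): boolean {
  return (
    slot.taskCount >= policy.maxTasks ||
    nowMs - slot.startedAtMs >= policy.maxAgeMs
  );
}
```

A caller would typically check `shouldRecycle` when a process finishes a task, and replace the process instead of marking it idle if the check returns true.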

As of version 4, retry logic for tasks is a separate concern from this module.

This package powers exiftool-vendored, whose source you can examine as an example consumer.

Installation

Depending on your yarn/npm preference:

$ yarn add batch-cluster
# or
$ npm install --save batch-cluster

Changelog

See CHANGELOG.md.

Usage

The child process must use stdin and stdout for control/response. BatchCluster will ensure a given process is only given one task at a time.

  1. Create a singleton instance of BatchCluster.

    Note the constructor options takes a union type of

  2. The default logger writes warning and error messages to console.warn and console.error. You can change this to your logger by using setLogger or by providing a logger to the BatchCluster constructor.

  3. Implement the Parser class to parse results from your child process.

  4. Construct or extend the Task class with the desired command and the parser you built in the previous step, and submit it to your BatchCluster's enqueueTask method.

See src/test.ts for an example child process. Note that the script is designed to be flaky in order to test BatchCluster's retry and error handling code.
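As a rough illustration of step 3, a parser might look like the sketch below. It assumes the parser is called with the command's stdout, its stderr, and a flag indicating whether the pass token was seen; that shape is an assumption and has varied between releases, so check the API documentation for the version you install. `ParserFn` and `parseKeyValues` are hypothetical names, and the `Key: Value` output format is invented for the example.

```typescript
// Assumed parser shape for illustration; verify against your installed version.
type ParserFn<T> = (
  stdout: string,
  stderr: string | undefined,
  passed: boolean,
) => T;

/**
 * Parses "Key: Value" lines from a (hypothetical) child process into an object.
 * Throws if the command did not pass or wrote anything to stderr.
 */
const parseKeyValues: ParserFn<Record<string, string>> = (
  stdout,
  stderr,
  passed,
) => {
  const errText = stderr?.trim() ?? "";
  if (!passed || errText.length > 0) {
    throw new Error(errText || stdout.trim() || "command failed");
  }
  const result: Record<string, string> = {};
  for (const line of stdout.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx < 0) continue; // skip blank lines and lines without a key
    result[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return result;
};
```

A parser like this would then be paired with a command string in a Task and submitted through enqueueTask, as described in step 4. Throwing from the parser is one way to surface a failed command to the caller as a rejected task.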

Caution

The default BatchClusterOptions.cleanupChildProcs value of true means that BatchCluster will try to use ps to ensure that Node's view of process state is correct, and that errant processes are cleaned up.

If you run this in a Docker image based on Alpine or Debian Slim, this won't work properly unless you install the procps package.

See issue #13 for details.