This is meant to be a replacement for GNU parallel, written in Go. It started as a learning exercise in dealing with parallelism in Go, but has since become a tool that I use regularly.
The tool starts one worker per CPU and works through the list of jobs that you give it. The number of workers is configurable.
This tool aims to use only standard library packages.
Install using `go get github.com/mylanconnolly/parallel` or some other means.
The most straightforward usage would be:
# Want to calculate the MD5 sum of every file in /etc?
$ find /etc -type f | parallel md5sum
# Want to only use two workers for the same thing?
$ find /etc -type f | parallel -j 2 md5sum
You can use Go templates to build the command with the `-t` flag. When using the `-t` flag, you do not need to specify the command (it will be ignored if you do).
The following fields are available when using templates:
Field | Definition |
---|---|
`{{.Cmd}}` | The path of the command specified, for example `echo` or `md5sum` |
`{{.Input}}` | The current input that we received via stdin or input file |
`{{.Start}}` | The time that parallel was started |
`{{.Time}}` | The time that the current operation began |
In addition, the following functions are available in templates:
Function | Help |
---|---|
`toUpper` | Transform the string to uppercase |
`toLower` | Transform the string to lowercase |
`absolutePath` | Get the absolute path of a filename |
`basename` | Get the basename of a file path |
`dirname` | Get the directory of a file path |
`ext` | Get the extension of a file |
`noExt` | Get the file path without an extension |
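To show how the fields and functions listed above could fit together, here is a minimal sketch built on the standard library's `text/template` package. It is an approximation, not the tool's actual implementation: the `job` struct, the `funcs` map, and the `render` helper are hypothetical, and mapping each function to a `path/filepath` or `strings` call is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"text/template"
	"time"
)

// job mirrors the template fields from the table; the struct is hypothetical.
type job struct {
	Cmd   string
	Input string
	Start time.Time
	Time  time.Time
}

// funcs approximates the documented template functions with stdlib calls.
// These mappings are assumptions, not taken from the tool's source.
var funcs = template.FuncMap{
	"toUpper":      strings.ToUpper,
	"toLower":      strings.ToLower,
	"absolutePath": filepath.Abs, // returns (string, error), which templates accept
	"basename":     filepath.Base,
	"dirname":      filepath.Dir,
	"ext":          filepath.Ext,
	"noExt": func(p string) string {
		return strings.TrimSuffix(p, filepath.Ext(p))
	},
}

// render expands a command template for a single input line.
func render(tmpl, input string) (string, error) {
	t, err := template.New("cmd").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	now := time.Now()
	var sb strings.Builder
	err = t.Execute(&sb, job{Input: input, Start: now, Time: now})
	return sb.String(), err
}

func main() {
	out, err := render("echo {{.Input | basename | noExt}}", "/tmp/data/report.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints: echo report
}
```

In a pipeline such as `{{.Input | basename | noExt}}`, each stage receives the previous result as its last argument, which is why single-argument functions chain cleanly.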
Some examples below:
# Copy some files up a level (utilizing template pipelines).
parallel -a ./files.txt -t 'cp {{.Input}} {{.Input | dirname | dirname}}'
# Create a directory named after the file (without extension).
parallel -a ./files.txt -t 'mkdir -p {{noExt .Input}}'
# Echo the base name of the file without the extension (utilizing template
# pipelines).
parallel -a ./files.txt -t 'echo {{.Input | basename | noExt}}'
For more general information about Go templates, see the documentation for Go's `text/template` package.
Here are some benchmarks using the `time` command. The benchmark I put together runs `md5sum` on every file in the Go source repository as of commit 14bec27743.
Below is the timing for the GNU version:
$ time find ~/src/go -type f | parallel md5sum > /dev/null
noglob find ~/src/go -type f 0.01s user 0.07s system 0% cpu 22.580 total
parallel md5sum > /dev/null 22.65s user 42.48s system 246% cpu 26.432 total
Below is the timing for this version:
$ time find ~/src/go -type f | ./parallel md5sum > /dev/null
noglob find ~/src/go -type f 0.02s user 0.05s system 3% cpu 1.845 total
./parallel md5sum > /dev/null 7.46s user 2.72s system 396% cpu 2.569 total
In this example, GNU parallel took around 10 times longer (26.4 s versus 2.6 s total) to complete the same amount of work.
A few notes on my test environment:
- Thinkpad A485
- AMD Ryzen Pro 2700U
- 16GB of RAM
- 256GB NVMe SSD (though I believe it might be a pretty low-quality one)
- Ubuntu 20.04 LTS (kernel version 5.4.0-21-generic)