From 9e4b419c4ff9de306e64fa214897b9835c9c9282 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jakub=20Ber=C3=A1nek?=
Date: Mon, 4 Dec 2023 13:51:59 +0100
Subject: [PATCH] Document instance IDs and placeholder usage better

---
 docs/jobs/failure.md | 12 ++++++++----
 docs/jobs/jobs.md    | 37 ++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/docs/jobs/failure.md b/docs/jobs/failure.md
index 42eeaafaf..26ad1a686 100644
--- a/docs/jobs/failure.md
+++ b/docs/jobs/failure.md
@@ -26,11 +26,15 @@ Sometimes a worker might crash while it is executing some task. In that case the
 different worker and the task will begin executing from the beginning.
 
 In order to let the executed application know that the same task is being executed repeatedly, HyperQueue assigns each
-execution a separate **Instance id**. It is a 32b non-negative number that identifies each (re-)execution of a task.
+execution a separate **Instance ID**. It is a 32b non-negative number that identifies each (re-)execution of a task.
 
-It is guaranteed that a newer execution of a task will have a larger instance id, however HyperQueue explicitly
-**does not** guarantee any specific values or differences between two ids. Each instance id is valid only for a particular
-task. Two different tasks may have the same instance id.
+It is guaranteed that a newer execution of a task will have a larger instance ID, however HyperQueue explicitly
+**does not** guarantee any specific values or differences between two IDs. Each instance ID is valid only for a particular
+task. Two different tasks may have the same instance ID.
+
+Instance IDs can be useful e.g. when a task is restarted, and you want to distinguish the output of the first execution
+and the restarted execution (by default, HQ will overwrite the standard output/error file of the first execution). You
+can instead create a separate stdout/stderr file for each task execution using the [instance ID placeholder](jobs.md#placeholders).
 
 ## Task array failures
 By default, when a single task of a [task array](arrays.md) fails, the computation of the job will continue.
diff --git a/docs/jobs/jobs.md b/docs/jobs/jobs.md
index 81aa1fcb2..786cd4440 100644
--- a/docs/jobs/jobs.md
+++ b/docs/jobs/jobs.md
@@ -110,12 +110,12 @@ $ hq submit --env KEY1=VAL1 --env KEY2=VAL2 ...
 
 Each executed task will also automatically receive the following environment variables:
 
-| Variable name    | Explanation                                                       |
-|------------------|-------------------------------------------------------------------|
-| `HQ_JOB_ID`      | Job id                                                            |
-| `HQ_TASK_ID`     | Task id                                                           |
-| `HQ_INSTANCE_ID` | [Instance id](failure.md#task-restart)                            |
-| `HQ_RESOURCE_...`| A set of variables related to allocated [resources](resources.md) |
+| Variable name     | Explanation                                                       |
+|-------------------|-------------------------------------------------------------------|
+| `HQ_JOB_ID`       | Job id                                                            |
+| `HQ_TASK_ID`      | Task id                                                           |
+| `HQ_INSTANCE_ID`  | [Instance id](failure.md#task-restart)                            |
+| `HQ_RESOURCE_...` | A set of variables related to allocated [resources](resources.md) |
 
 ### Time management
 You can specify two time-related parameters when submitting a job. They will be applied to each task of the submitted job.
@@ -186,16 +186,23 @@ Placeholders are enclosed in curly braces (`{}`) and prefixed with a percent (`%
 
 You can use the following placeholders:
 
-| Placeholder      | Will be replaced by                           | Available for                    |
-|------------------|-----------------------------------------------|----------------------------------|
-| `%{JOB_ID}`      | Job ID                                        | `stdout`, `stderr`, `cwd`, `log` |
-| `%{TASK_ID}`     | Task ID                                       | `stdout`, `stderr`, `cwd`        |
-| `%{INSTANCE_ID}` | [Instance ID](failure.md#task-restart)        | `stdout`, `stderr`, `cwd`        |
-| `%{SUBMIT_DIR}`  | Directory from which the job was submitted.   | `stdout`, `stderr`, `cwd`, `log` |
-| `%{CWD}`         | Working directory of the task.                | `stdout`, `stderr`               |
-| `%{SERVER_UID}`  | Server unique ID (a string of length 6)[^uid] | `stdout`, `stderr`, `cwd`, `log` |
+| Placeholder      | Will be replaced by                         | Available for                    |
+|------------------|---------------------------------------------|----------------------------------|
+| `%{JOB_ID}`      | Job ID                                      | `stdout`, `stderr`, `cwd`, `log` |
+| `%{TASK_ID}`     | Task ID                                     | `stdout`, `stderr`, `cwd`        |
+| `%{INSTANCE_ID}` | [Instance ID](failure.md#task-restart)      | `stdout`, `stderr`, `cwd`        |
+| `%{SUBMIT_DIR}`  | Directory from which the job was submitted. | `stdout`, `stderr`, `cwd`, `log` |
+| `%{CWD}`         | Working directory of the task.              | `stdout`, `stderr`               |
+| `%{SERVER_UID}`  | Unique server ID.                           | `stdout`, `stderr`, `cwd`, `log` |
 
-[^uid] Server generates a random `SERVER_UID` string every time a new server is started (`hq server start`).
+`SERVER_UID` is a random string that is unique for each new server execution (each `hq server start` gets a separate value).
+
+As an example, if you wanted to include the [Instance ID](failure.md#task-restart) in the `stdout` path (to
+distinguish the individual outputs of restarted tasks), you can use placeholders like this:
+
+```bash
+$ hq submit --stdout '%{CWD}/job-%{JOB_ID}/%{TASK_ID}-%{INSTANCE_ID}.stdout' ...
+```
 
 ## State
 At any moment in time, each task and job has a specific *state* that represents what is currently happening to it. You