From a59ce77c66f7e0a861a36aef63bdbc93a8784027 Mon Sep 17 00:00:00 2001
From: Neil Douglas
Date: Tue, 2 Apr 2024 13:37:26 +0100
Subject: [PATCH] add FAQ about jobs not starting

---
 docs/source/faq/faq.rst                      | 14 ++++++++++++++
 docs/source/using_viking/submitting_jobs.rst |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/docs/source/faq/faq.rst b/docs/source/faq/faq.rst
index cfb06f3..d435505 100644
--- a/docs/source/faq/faq.rst
+++ b/docs/source/faq/faq.rst
@@ -57,3 +57,17 @@ Or if you're using the `srun `_ or `salloc --exclude=node123 -x node123
+
+
+Why hasn't my job started?
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Jobs submitted to the Slurm job scheduler can take some time to start running. There are a number of possible reasons, for example how busy the partition the job was submitted to is, or how many resources the job is requesting. It's always a good idea to request only the :ref:`resources ` your job requires. You can check the jobs you have in the queue with the following command:
+
+.. code-block:: console
+
+    $ squeue -u $USER
+
+If the reason shown for a held job is ``QOSGrpGRES``, a resource has reached its limit for that partition. For example, on the ``gpu_week`` partition a total of only three GPUs can be in use by all users at the same time. When this limit is reached, all new jobs submitted to that partition are held with that reason code.
+
+For more information, see the full list of `reason codes `_.
diff --git a/docs/source/using_viking/submitting_jobs.rst b/docs/source/using_viking/submitting_jobs.rst
index b64e59f..ef3c699 100644
--- a/docs/source/using_viking/submitting_jobs.rst
+++ b/docs/source/using_viking/submitting_jobs.rst
@@ -73,6 +73,8 @@ If you filled out the ``--mail-user`` option you will get an email when the job
 Tips and best practices
 -----------------------
+.. _job_resources:
+
 Resource requests
 ^^^^^^^^^^^^^^^^^