BJ manual: wall_time_limit, number_of_processes #166

mturilli · 2013-12-05T13:30:03Z

Dear BJ team,

I have some issues with the current BJ documentation. At http://saga-project.github.io/BigJob/sphinxdoc/usage/appwriting.html I read:

"wall_time_limit - Specifies the number of minutes the resources are requested for."

I tried to use 'wall_time_limit', but BJ did not honour it; I had to use 'walltime' instead. Is 'wall_time_limit' correct? If so, how does it differ from 'walltime'?

On the same page, and at http://saga-project.github.io/BigJob/sphinxdoc/library/index.html, I also read:

"number_of_processes - This refers to the number of cores that need to be allocated to run the jobs"

Does a pilot span multiple nodes when the number of processes requested is greater than the number of cores on a single node? If so, is there a way to inspect a pilot to find out how many nodes it has been scheduled on and, where applicable, is running on?

Many thanks,
Matteo

melrom · 2013-12-05T20:06:04Z

Hey Matteo, thanks. Ole mentioned a few months ago that the appwriting page was pointless. I forgot to remove it. Thanks!

As for the second question: you request cores in multiples of whole nodes. So if you want to run 16 1-core jobs on Lonestar at the same time, you have to marshal 24 cores (2 nodes) for the Pilot (http://saga-project.github.io/BigJob/sphinxdoc/tutorial/table.html). Alternatively, you can marshal just one node, in which case they won't all run at the same time: you obtain the node, start running 12 1-core jobs, and when one finishes you add the next 1-core job, and so on until all 16 jobs have run. Which of the two you want probably depends on your TTC, budget, etc.
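The arithmetic behind those two options can be sketched in a few lines of Python. This is only an illustration, not part of BJ: the function names are made up for this example, and the 12 cores per node is the Lonestar figure used in this thread.

```python
CORES_PER_NODE = 12  # Lonestar figure from this thread; other machines differ


def cores_to_request(n_jobs, cores_per_job=1, cores_per_node=CORES_PER_NODE):
    """Round the total core demand up to a whole number of nodes."""
    needed = n_jobs * cores_per_job
    nodes = -(-needed // cores_per_node)  # integer ceiling division
    return nodes * cores_per_node


def concurrent_and_queued(n_jobs, cores_per_job=1, cores_available=CORES_PER_NODE):
    """Return (jobs that can start at once, jobs left waiting for a free core)."""
    slots = cores_available // cores_per_job
    running = min(n_jobs, slots)
    return running, n_jobs - running


# Option 1: run all 16 1-core jobs at once, so request whole nodes.
print(cores_to_request(16))

# Option 2: take a single node; some jobs start now, the rest wait.
print(concurrent_and_queued(16))
```

With 16 1-core jobs, the first call gives the 24 cores (2 nodes) mentioned above, and the second shows 12 jobs starting immediately with 4 waiting for a core to free up.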

As for knowing how many nodes it has been scheduled on, I usually use qstat, which tells you the number of slots, where 1 slot = 1 core. Note that if you tried to ask for 16 cores on Lonestar (from either BJ or saga-python), you would get an error:

------------------> Rejecting job <------------------
Your slot (or core) request is not a multiple of 12.
Syntax: -pe <pe_name>
where is a multiple of 12.
