BJ manual: wall_time_limit, number_of_processes #166

Open
mturilli opened this issue Dec 5, 2013 · 1 comment

Comments

@mturilli

mturilli commented Dec 5, 2013

Dear BJ team,

I have some issues with the current BJ documentation. At http://saga-project.github.io/BigJob/sphinxdoc/usage/appwriting.html I read:

"wall_time_limit - Specifies the number of minutes the resources are requested for."

I tried to use 'wall_time_limit' and BJ did not honour it. Instead, I had to use 'walltime'. Is 'wall_time_limit' correct? If so, how does it differ from 'walltime'?

On the same page, and also here: http://saga-project.github.io/BigJob/sphinxdoc/library/index.html I read:

"number_of_processes - This refers to the number of cores that need to be allocated to run the jobs"

Does a pilot span multiple nodes when the number of processes requested is greater than the number of cores on a single node? If so, is there a way to inspect a pilot to find out how many nodes it has been scheduled on and, once it is running, how many it is executing on?

Many thanks,
Matteo

@melrom
Contributor

melrom commented Dec 5, 2013

Hey Matteo, thanks. Ole mentioned a few months ago that the appwriting page was pointless, and I forgot to remove it. Thanks!

As for the second question: you request cores in multiples of nodes. So if you want to run 16 1-core jobs on Lonestar at the same time, you have to marshal 24 cores (2 nodes) for the Pilot (http://saga-project.github.io/BigJob/sphinxdoc/tutorial/table.html). Alternatively, you can marshal just one node, in which case the jobs won't all run at the same time: you obtain the node, start running 12 1-core jobs, and when one finishes you add the next 1-core job, and so on until all 16 jobs have run. Which of the two you want probably depends on your TTC, budget, etc.
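To make the arithmetic concrete, here is a minimal sketch in plain Python. It does not use the BigJob API; the helper names `pilot_size` and `rounds_needed` are made up for illustration, and the 12 cores per node figure is the Lonestar value mentioned in this thread.

```python
CORES_PER_NODE = 12  # Lonestar value from this thread; adjust for other machines


def pilot_size(requested_cores, cores_per_node=CORES_PER_NODE):
    """Round a core request up to whole nodes and return (nodes, cores)."""
    if requested_cores < 1:
        raise ValueError("requested_cores must be at least 1")
    # Integer ceiling division, so the result is exact for any input size.
    nodes = (requested_cores + cores_per_node - 1) // cores_per_node
    return nodes, nodes * cores_per_node


def rounds_needed(n_jobs, pilot_cores):
    """Minimum number of rounds of 1-core jobs needed to run n_jobs on pilot_cores."""
    return (n_jobs + pilot_cores - 1) // pilot_cores


print(pilot_size(16))         # (2, 24): 16 concurrent 1-core jobs need 2 nodes
print(rounds_needed(16, 24))  # 1: everything runs at once on 2 nodes
print(rounds_needed(16, 12))  # 2: on one node, 4 jobs have to wait for a free core
```

The value you would pass as number_of_processes is the second element of the tuple returned by `pilot_size`, i.e. a multiple of the cores per node.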

As for knowing how many nodes it has been scheduled on, I usually use qstat, which tells you the number of slots, where 1 slot = 1 core; dividing that by the cores per node (12 on Lonestar) gives the number of nodes. Note that if you tried to ask for 16 cores on Lonestar (from either BJ or saga-python), you would get an error:

------------------> Rejecting job <------------------
Your slot (or core) request is not a multiple of 12.
Syntax: -pe <pe_name>
where is a multiple of 12.
