Add simple example for dask-jobqueue #19
So #30 was merged, but maybe we could add a super-simple example to complement that.
Any guidance or example on running an xarray (Dask-parallelized) job? I am trying to do a large bootstrap computation on netCDF data. My cluster-spawning code is the following:

```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(project='myproject', cores=40, processes=40,
                       memory='160GB', walltime='10:00:00',
                       interface='ib0', death_timeout=300)
cluster.scale(40 * 20)  # total 800 processes

from dask.distributed import Client
client = Client(cluster)

# xarray data load with chunks
# xarray computation
# ...
# data save
```

But when I try this, SLURM opens 20 separate jobs (instead of 1 SLURM job with 20 nodes). Is that normal, or am I missing something in the cluster creation? To put it differently, I am trying to achieve the following job spec to run with Dask:

```bash
#!/bin/bash
#SBATCH --partition=cpu_p1
#SBATCH --account=uhy@cpu
#SBATCH --job-name=ci
#SBATCH --ntasks=800
#SBATCH --ntasks-per-node=40
#SBATCH --hint=nomultithread
#SBATCH --time=10:00:00
```

Additionally, the Jean-Zay manual indicates that `--mem` does not have any effect, but for `SLURMCluster` it seems to be a required parameter. How can I book the full memory of a node? Any guidance? Thanks.
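For context, a quick way to see why this happens: dask-jobqueue generates one SBATCH script per job it submits, and `cluster.job_script()` returns that template, so you can inspect exactly what each of the 20 jobs will request. A minimal, hedged sketch (parameter values simply mirror the ones quoted above):

```python
# Hedged sketch: inspect the SBATCH script dask-jobqueue generates for *each*
# of its jobs. With cores=40 and processes=40, one job corresponds to one
# full node, so scale(800) submits 20 such jobs.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(project='myproject', cores=40, processes=40,
                       memory='160GB', walltime='10:00:00', interface='ib0')

print(cluster.job_script())  # shows the per-job #SBATCH directives
```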
Great to see that there is some interest in Dask workflows on Jean-Zay!

I think this is normal: dask-jobqueue submits one SLURM job per batch of workers (here, one job per node) rather than a single multi-node job.

OK, I did not know this indeed: according to http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-exec_alloc-mem-eng.html the memory is computed automatically from the number of CPUs you request. I guess if the parameter does not have any effect you can leave it like this; if it gives you an error, you can use the …

Out of interest, where do you run your cluster-spawning script? On the login node, inside a SLURM job, somewhere else?

A limitation of using Dask on Jean-Zay is the following (there may be more details in dask/dask-jobqueue#471) …
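If the generated `--mem` directive ever causes an error, one possible workaround — offered as an assumption to verify against your dask-jobqueue version, since the keyword is `header_skip` in older releases and `job_directives_skip` in newer ones — is to skip that directive when the job script is generated:

```python
# Hedged sketch: drop the generated "#SBATCH --mem=..." line, since Jean-Zay
# derives the memory allocation from the number of CPUs requested. The
# `header_skip` keyword is an assumption here; newer dask-jobqueue releases
# call it `job_directives_skip`, so check the version you have installed.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(project='myproject', cores=40, processes=40,
                       memory='160GB',            # still used for Dask worker memory limits
                       walltime='10:00:00', interface='ib0',
                       header_skip=['--mem'])     # omit the --mem directive

print(cluster.job_script())  # verify that --mem no longer appears
```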
Hi!
Thanks for your enthusiasm. Your initial script helped me a lot to set up the process. I was getting by without a distributed cluster so far on Jean-Zay, but now it has gotten out of hand with TBs of data to process.

Thanks for the tip. The official Dask documentation is hard to follow sometimes.

It seems to me that the Dask scheduler needs this parameter to estimate whether it can complete the job or will run out of memory. Hence I chose the highest amount each node at Jean-Zay offers. This is a guess; I will test in the coming days and will report here if confirmed.

I read the discussion last night after facing similar issues. I ended up following someone's comment about spawning from another node. In the end, what I did was submit a job to a prepost node on Jean-Zay to run the Python script that spawns the Dask cluster nodes. This is nice because the prepost nodes are shared, hours spent on them do not get charged to the allocation, and they give access to 2.8 TB of total RAM (48 CPUs in total). The time limit there is 20 hours though, which does not help if you need a pre/post-processing job that takes days.
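To make the setup above concrete, here is a hedged sketch of such a submission; the partition name, module, and script path are assumptions to adapt (the Jean-Zay docs describe the shared `prepost` partition):

```bash
#!/bin/bash
# Hedged sketch: run the cluster-spawning Python script as a job on the shared
# prepost partition, as described above. Partition name, modules and the
# script path are placeholders to adapt to your own setup.
#SBATCH --job-name=dask-spawner
#SBATCH --partition=prepost        # shared pre/post-processing nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --hint=nomultithread
#SBATCH --time=20:00:00            # prepost time limit mentioned above

module load python                 # or activate your own environment
python spawn_dask_cluster.py       # creates the SLURMCluster and Client
```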
Actually, you are right, the …

Also, I forgot to say (and you probably know this already), but you may be interested to take a look at Pangeo, which is a good place for Dask and geoscience applications. For example, they have a Discourse forum.
Thanks for digging that out. I'll keep it in my notes for reference.

Thanks, I had heard about it but did not know about the forum. I will surely check it out.
One other thing that you might have info on... while testing Dask I have noticed some jobs with a weird job ID pattern. For example:

Notice the same job ID, but with a suffix. Any idea how they are submitted? They seem to be some kind of distributed CPU load, submitted as part of a single job, getting a single job ID suffixed with another index for each machine. They look an awful lot like the Dask jobs, but with a special job-ID pattern. With Dask, I am getting a different job ID for each node.
Weird, those look like SLURM job arrays (see https://crc.ku.edu/hpc/slurm/how-to/arrays for example), but I am pretty sure dask-jobqueue does not use job arrays. I tried with a script similar to the one in the top post and I get "standard" job IDs, i.e. something like … Maybe you have a script around dask-jobqueue that uses job arrays (sbatch …).

Small remark: the Jean-Zay people are quite security-oriented, so we try our best to avoid showing info that they deem sensitive in this GitHub repo. Such sensitive info includes your Jean-Zay logins, compute node names, and even project names. Their main argument is that this kind of info could be used for phishing. I edited your previous posts where appropriate (e.g. the squeue output).
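For illustration only (a hedged sketch, not something dask-jobqueue produces): a SLURM job array is submitted once with `sbatch` but shows up in the queue as `<jobid>_<index>` entries, which matches the suffixed pattern described above:

```bash
#!/bin/bash
# Hedged illustration of a SLURM job array: one sbatch submission yields queue
# entries like 123456_1, 123456_2, ... (same job ID, different suffix), which
# is the pattern discussed above. dask-jobqueue itself does not use arrays.
#SBATCH --job-name=array-demo
#SBATCH --array=1-20               # 20 array tasks under a single job ID
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

echo "Array task ${SLURM_ARRAY_TASK_ID} of job ${SLURM_ARRAY_JOB_ID}"
```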
I do not know actually. I just found them in the global job queue (not mine).
Thanks for editing it out. I do agree with this security policy. My bad, I was not thinking it through. I will keep it in mind.
Ah OK, that makes sense.

No worries, these kinds of things are hard to guess before you know they exist 😉
I think I have reached some conclusions about what works for me and what does not. I managed to do a distributed bootstrap computation on 600k × 4k data points with 10k iterations. The most nodes I could use was 4 (× 40 processes); going above that always quit at some point due to a timeout on one of the nodes. In any case, your scripts and the discussion here were a big help in finishing the work. Something I noticed in the log, which does not always appear:

Any idea where it comes from? Some discussion in the Dask repo traced it back to a dependency: dask/distributed#3389. Their suggestion is to wrap the script in …
Glad you got it working for your use case and it was useful!
Just to make sure I am following you: this was happening because the jobs don't start at the same time, and the first job reaches its maximum time (20 h on most queues) before the "total" computation (i.e. the one that is distributed) has finished. I would have thought that the Dask scheduler would get timed out first, because it is the job that starts first (inside a job on a prepost node in your case).

I am actually wondering whether dask-mpi would be an alternative. I am not very familiar with MPI, but my understanding is that it waits until all the nodes are ready before launching the computation.

I wish using dask-jobqueue (and Dask more generally) on Jean-Zay were less cumbersome, but right now this is not the case unfortunately ... suggestions on how to improve the situation are more than welcome!

About the message you noticed in the log: I would say that, as long as your computation works, you may ignore it, but it may point to a problem in distributed.

By the way, if you want to contribute a small example with xarray on Jean-Zay to this doc, that would be appreciated 😉!
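For what it's worth, a minimal, hedged sketch of the kind of xarray example meant here — file paths, chunk sizes, and variable names are placeholders, and the cluster parameters just mirror the ones at the top of the thread:

```python
# Hedged sketch of a small xarray + dask-jobqueue workflow on a SLURM machine.
# Paths, variable names and chunk sizes are placeholders, not real data.
import xarray as xr
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(project='myproject', cores=40, processes=40,
                       memory='160GB', walltime='02:00:00', interface='ib0')
cluster.scale(80)                # 80 worker processes -> 2 SLURM jobs here
client = Client(cluster)

# Lazily open many netCDF files as one chunked dataset
ds = xr.open_mfdataset('data/*.nc', chunks={'time': 100})

# A simple distributed reduction; compute() runs it on the workers
monthly_mean = ds['temperature'].groupby('time.month').mean('time').compute()
monthly_mean.to_netcdf('monthly_mean.nc')
```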
Hello,
Here is a sample SLURM script I used on JZ:
And the …

This is a scheduler-agnostic approach and it allows us to have more fine-grained control over how we reserve and use the resources. I have tested this on the JZ platform and it worked pretty well. Even if you have workflows based on …
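As a generic, hedged sketch of a dask-mpi submission (partition choice, module names, task counts and file names are assumptions, not the original files): with dask-mpi, MPI rank 0 runs the Dask scheduler, rank 1 continues running the client script, and the remaining ranks become workers.

```bash
#!/bin/bash
# Hedged sketch of a dask-mpi submission script (not the original one):
# all resources are reserved up front in a single SLURM job, and dask-mpi
# turns the MPI ranks into scheduler, client and workers.
#SBATCH --job-name=dask-mpi-demo
#SBATCH --ntasks=80
#SBATCH --ntasks-per-node=40
#SBATCH --hint=nomultithread
#SBATCH --time=10:00:00

module load python                 # environment providing dask-mpi and mpi4py
srun -n ${SLURM_NTASKS} python my_dask_mpi_script.py
```

And the matching Python script could look roughly like this:

```python
# my_dask_mpi_script.py -- hedged sketch, not the original script
from dask_mpi import initialize
from dask.distributed import Client

# Rank 0 becomes the scheduler and ranks 2+ become workers; this process
# (rank 1) carries on as the client.
initialize()

client = Client()                  # connect to the scheduler started above

# ... build and run the distributed computation here ...
result = client.submit(sum, range(10)).result()
print(result)
```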
@mahendrapaipuri looks good! If you have some spare time and want to do a pull request to show how to use dask-mpi on Jean-Zay, please do!

Sure, can do that. Where does it go?

Yep, that sounds good!