Add guide for vscode with slurm #67

Merged
merged 31 commits into from
Nov 21, 2024

Conversation

lauraporta
Member

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
We've been exploring optimal ways to use VSCode within a SLURM job to ensure appropriate use of assigned resources. After testing a solution suggested in our internal Slack, code tunnel, I found it to be effective. This approach is straightforward, user-friendly, and fully SLURM-managed, allowing resources to be properly allocated and controlled within the job environment.

Happy to hear your thoughts about it!

What does this PR do?
Adds a markdown file with the guide.

References

#62 issue discussion

How has this PR been tested?

  • Docs test: docs built locally
  • Resource management tests:
    • I can successfully kill the job and be disconnected from vscode

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@lauraporta
Member Author

Working on failing actions

Member

@adamltyson adamltyson left a comment

Looks good. TBH I haven't tested it because I don't use VSCode. I thought @niksirbi might be kind enough to give it a spin 🚗.

docs/source/programming/vscode-with-slurm-job.md (6 outdated review threads, resolved)
@niksirbi
Member

Looks good. TBH I haven't tested it because I don't use VSCode. I thought @niksirbi might be kind enough to give it a spin 🚗.

I'm looking forward to reviewing it as soon as I find the time.

Member

@niksirbi niksirbi left a comment

Nice find @lauraporta!

As far as I can tell this works as intended. It is the most hassle-free method I've tried so far, and the only prerequisite is a GitHub account.

I could also log in with my VSCode account and sync all my extensions, settings etc., all working seamlessly!

I also ran some Jupyter notebooks within VSCode, and everything worked perfectly well, including plots.

SLURM killed my jobs at the specified time limit, so we shouldn't be eating more resources than allocated.

All in all, 10/10!

It may be worth announcing this on the institute's slack as the "recommended" way to use VSCode on the cluster (after consulting with the scientific computing team perhaps).

docs/source/programming/vscode-with-slurm-job.md (5 outdated review threads, resolved)
@sfmig
Collaborator

sfmig commented Nov 21, 2024

Heya,
Nice find! ✨

I was just having a go now after the positive reviews but unfortunately couldn't make it work for me - it stays forever in the VSCode webapp saying that it is connecting to host.

I followed the instructions in the guide, except that I requested a node with a gpu in the gpu partition (rather than in fast):

srun -p gpu --gres=gpu:1 --pty bash -i

I seemed to get into the compute node just fine, then all the authentication went ok and the webapp launched, but it didn't move past that.

I tried a few times with no success, and later saw that the SLURM jobs appear as FAILED.

I wonder if other people experienced this?

@sfmig
Collaborator

sfmig commented Nov 21, 2024

And just two small comments on the guide:

  • if you run code tunnel from the VSCode terminal, you get an error. Maybe we should clarify one should run this in a separate terminal?
  • From the docs it seems that it eliminates the need to copy local code in the compute node. I haven't been able to check this, but is that correct? if so, should we highlight this as the main benefit? I didn't get that from reading the guide alone and it seems like its main perk.

@niksirbi
Member

niksirbi commented Nov 21, 2024

I was just having a go now after the positive reviews but unfortunately couldn't make it work for me - it stays forever in the VSCode webapp saying that it is connecting to host.

Strange. I wonder if the resources given to the compute node "by default" are not sufficient for whatever VSCode does when tunneling. What happens if you try:

srun -p gpu -n 4 --mem 8G --gres=gpu:1 --pty bash -i

Just a random guess, not sure if this is indeed your problem.

@sfmig
Collaborator

sfmig commented Nov 21, 2024

If you want to use your own VScode app rather than the webapp, you can do this (from this blog post):

  1. ssh to hpc-gw1
  2. request an interactive compute node
  3. run code tunnel in the compute node (& complete authentication if required)
  4. when asked to follow a link that ends with a device name, go instead to your VSCode app, open the command palette (on macOS, cmd+shift+p) and search for "Remote tunnels: connect to tunnel". A dropdown then shows the existing tunnels and you can select the desired one.
    • alternatively, you can go to the Remotes tab (the same one as for ssh connections) and the running tunnels should show up there.

Update: I initially couldn't verify this 100% because it also lagged forever (but at least it behaved the same 😅), but it works now! 🥳

Presumably this allows you to have all your vscode extensions, settings etc, even if you don't have a VSCode account.
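
For reference, steps 1 to 3 above could be sketched roughly as follows. This is a dry-run helper that only prints the commands, since each step is interactive and only works on the cluster. The function name `print_tunnel_steps` is made up for illustration, and the srun resources are placeholders copied from the command suggested earlier in this thread.

```shell
# Dry-run sketch: prints the command for each step rather than running it,
# because steps 1-3 are interactive and only work on the cluster.
# print_tunnel_steps is a hypothetical helper, not part of the guide.
print_tunnel_steps() {
  gateway="${1:-hpc-gw1}"   # gateway node named in step 1
  # Step 1: log in to the gateway node
  echo "ssh ${gateway}"
  # Step 2: request an interactive compute node (resources are placeholders,
  # copied from the srun command suggested earlier in this thread)
  echo "srun -p gpu -n 4 --mem 8G --gres=gpu:1 --pty bash -i"
  # Step 3: start the tunnel from the shell on the compute node
  echo "code tunnel"
}

print_tunnel_steps
```

Step 4 (connecting from the VSCode app) happens in the VSCode UI, so it has no command-line equivalent here.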

@niksirbi
Member

If you want to use your own VScode app rather than the webapp, it seems you can do this (from this blog post):

It works! Nice find @sfmig, this is even better than using the web app!

@sfmig
Collaborator

sfmig commented Nov 21, 2024

It works!
yay!

I can confirm the steps above now also work for me with the two srun commands we discussed

@lauraporta
Member Author

if you run code tunnel from the VSCode terminal, you get an error. Maybe we should clarify one should run this in a separate terminal?

Definitely yes.

From the docs it seems that it eliminates the need to copy local code in the compute node. I haven't been able to check this, but is that correct? if so, should we highlight this as the main benefit? I didn't get that from reading the guide alone and it seems like its main perk.

Interesting. Reading the highlighted section of the guide you point to, I think it means that we are not running source code on our local client (in this case the browser) but instead on the compute node in which code tunnel is run.

If you want to use your own VScode app rather than the webapp...

Going to test this as well!

@lauraporta
Member Author

✨ A M A Z I N G ✨

I just didn't have the Tunnel extension installed. As the tunnel is maintained by the same kind of slurm job, it should be killed in the same way. Going to include this solution.

Anyway, is anyone else getting the notification "Unable to watch for changes"?
Maybe it is just my repo.

@sfmig
Collaborator

sfmig commented Nov 21, 2024

I think it means that we are not running source code on our local client (in this case the browser) but instead on the compute node in which code tunnel is run.

Aaah I see! Sorry I wasn't sure who the client was 😅

@sfmig
Collaborator

sfmig commented Nov 21, 2024

A somewhat equivalent approach using ssh would be the well-known hack:

  1. ssh to hpc-gw1 using VScode Remote extension
    • (this is just to get the same UI experience as the tunnel)
  2. request an interactive node with srun
    • suppose we get gpu-380-11
  3. ssh to the assigned node by adding an SSH remote connection in VSCode and typing:
    ssh -J hpc-gw1 sminano@gpu-380-11
    • use -J to indicate a proxy jump via the gateway node hpc-gw1
    • the node gets added as a host to your local config 🙀

This works for me at the moment.

The problem with this approach is that the SLURM limits assigned to the user in the srun command are not maintained when you ssh to the compute node. Is that correct?

The benefit of the code tunnel approach would then be:

  • it's easier to set up, and
  • we think it respects the SLURM limits (or do we know this for sure?)

@lauraporta
Member Author

The benefit of the code tunnel approach would then be:
it's easier to set up, and

Yes

we think it respects the SLURM limits (or do we know this for sure?)

We've tested for time limit and manually killing the job. Unsure about resources in the same node while the job is running
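
A possible way to repeat the time limit and manual kill checks, assuming the standard SLURM client tools (`sacct`, `scancel`) are available on the node you run this from. The helper name `check_job` and the job ID in the usage comment are placeholders, not something from the guide.

```shell
# Hypothetical helper: show the state of a job next to its requested limits.
# Assumes the SLURM accounting tool sacct is on PATH (i.e. run it on the cluster).
check_job() {
  job_id="$1"
  if [ -z "$job_id" ]; then
    echo "usage: check_job <job_id>"
    return 2
  fi
  if ! command -v sacct >/dev/null 2>&1; then
    echo "sacct not found: run this on the cluster"
    return 1
  fi
  # Elapsed vs Timelimit shows whether the time limit was hit;
  # ReqMem vs MaxRSS gives a rough view of memory use against the request.
  sacct -j "$job_id" --format=JobID,State,Elapsed,Timelimit,ReqMem,MaxRSS
}

# Usage (placeholder job ID):
#   check_job 1234567      # inspect the job
#   scancel 1234567        # kill it manually, then run check_job again
```

A job stopped at its time limit should be reported with state TIMEOUT, and one killed with scancel as CANCELLED. This does not settle the open question about resource use on the node while the job is running.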

@niksirbi
Member

sorry, I wrote the opposite of what I was thinking, it did not get killed, sorry to add to the confusion 🫠 - edited now

Phew, a bit of confidence in my own understanding of this restored.

@lauraporta
Member Author

lauraporta commented Nov 21, 2024

Apart from the SLURM (Scheduling Like Unpredictable Resource Magic) questions, what do you think of the updated guide? @sfmig @niksirbi

@niksirbi
Member

Thanks for updating it @lauraporta. I'm happy with the content, just added some comments on wording and rendering.

Collaborator

@sfmig sfmig left a comment

this was a great find, thanks @lauraporta 🚀

Just small things such as some vscode ->VSCode and slurm -> SLURM. Nothing major, feel free to take or leave as you please!

docs/source/programming/vscode-with-slurm-job.md (8 outdated review threads, resolved)
@lauraporta lauraporta merged commit 8f0fe7b into main Nov 21, 2024
3 checks passed
@sfmig sfmig deleted the vscode-compute-nodes branch November 21, 2024 17:19
4 participants