Options for CUDA, podman and docker updated with nvidia-container sup… #896
base: main
Conversation
config = mkIf cfg.enable {
  # Enabling CUDA on any supported system requires the settings below.
  nixpkgs.config.allowUnfree = lib.mkForce true;
Did we not pass in a high-level nixpkgs instance or is this false memory?
In that case nixpkgs.config wouldn't work.
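For context, this is a sketch (not the Ghaf code) of the NixOS behavior the comment refers to: when a fully evaluated package set is passed in via `nixpkgs.pkgs`, settings under `nixpkgs.config` inside modules are ignored, so `allowUnfree` would have to be set where that instance is created. The option names are standard NixOS; the flake structure is illustrative.

```nix
# Illustrative only: if the flake instantiates nixpkgs itself and hands the
# result to the modules, nixpkgs.config.* inside a module has no effect.
{
  outputs = { nixpkgs, ... }: {
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        {
          # Passing a pre-built pkgs means config must be set here ...
          nixpkgs.pkgs = import nixpkgs {
            system = "x86_64-linux";
            config.allowUnfree = true; # ... not via nixpkgs.config in a module.
          };
        }
      ];
    };
  };
}
```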
But I think if there is no error, then the answer is no.
…port with cdi fix for docker
Signed-off-by: Emrah Billur <[email protected]>
Finally, the Docker nvidia container issue is solved by forcing CDI devices. Only a single issue is left with cross-compilation of libnvidia-container, where the compile option -m64 fails with -from-x86_64 builds.
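As a rough sketch of what "forcing CDI devices" can look like at the NixOS level (option names taken from upstream NixOS, not necessarily the exact Ghaf module): Docker's CDI support sits behind a daemon feature flag, and the NVIDIA container toolkit generates the CDI spec that provides the `nvidia.com/gpu` device names.

```nix
# Hedged sketch; option names are upstream NixOS, Ghaf's module may differ.
{
  # Generate a CDI spec for NVIDIA GPUs (nvidia.com/gpu=...).
  hardware.nvidia-container-toolkit.enable = true;

  virtualisation.docker = {
    enable = true;
    # Docker only honors --device=nvidia.com/gpu=... when the CDI
    # feature flag is enabled in daemon.json.
    daemon.settings.features.cdi = true;
  };
}
```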
@@ -0,0 +1,24 @@
# Copyright 2022-2024 TII (SSRC) and the Ghaf contributors
# SPDX-License-Identifier: Apache-2.0
{ lib, config, ... }:
nitpick: { config, lib, ... } for consistency with other modules
Description of changes
Configuration options below:
Both docker and podman can coexist together as an option.
These are planned to be removed or moved to work inside VMs later, but this option is currently required for nvidia containers and ML software.
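Taken together, the options added here (names as used in this PR's test instructions below) can be enabled side by side in a target's configuration, e.g.:

```nix
# Both engines can coexist; enable either or both in the target config.
{
  ghaf.virtualization.docker.daemon.enable = true;
  ghaf.virtualization.podman.daemon.enable = true;
}
```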
Checklist for things done
x86_64
aarch64
riscv64
make-checks
and it passes
nixos-rebuild ... switch
Instructions for Testing
For Docker:
In your ghaf configuration (preferably your flake-module.nix of your target platform) add
ghaf.virtualization.docker.daemon.enable = true;
rebuild your config
For x86_64 platforms:
sudo docker run --rm --device=nvidia.com/gpu=all ubuntu nvidia-smi
and the top part of your output will be like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10 Driver Version: 535.86.10 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
where the second line shows the CUDA version.
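If you want to check the reported CUDA version programmatically rather than by eye, a small helper (hypothetical, not part of this PR) can pull it out of that banner line:

```python
import re

# Sample banner line as shown in the nvidia-smi output above.
banner = "| NVIDIA-SMI 535.86.10    Driver Version: 535.86.10    CUDA Version: 12.2     |"

def cuda_version(line):
    """Return the CUDA version string from an nvidia-smi banner line, or None."""
    m = re.search(r"CUDA Version:\s*([\d.]+)", line)
    return m.group(1) if m else None

print(cuda_version(banner))  # → 12.2
```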
For nvidia jetson platforms we don't have nvidia-smi, so we use Python torch instead:
sudo docker run -it --rm --device=nvidia.com/gpu=all --network host nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
and from the container shell you run:
python3 -c 'import torch; print(torch.cuda.is_available())'
and the expected output is:
True
For Podman:
In your ghaf configuration (preferably your flake-module.nix of your target platform) add
ghaf.virtualization.podman.daemon.enable = true;
rebuild your config
For x86_64 platforms:
sudo podman run --rm --device=nvidia.com/gpu=all ubuntu nvidia-smi
and the top part of your output will be like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10 Driver Version: 535.86.10 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
where the second line shows the CUDA version.
For nvidia jetson platforms we don't have nvidia-smi, so we use Python torch instead:
sudo podman run -it --rm --device=nvidia.com/gpu=all nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3 /bin/bash
and from the container shell you run:
python3 -c 'import torch; print(torch.cuda.is_available())'
and the expected output is:
True
Note: You can test podman with docker commands, as podman has a docker compatibility option (this will not work when both the docker and podman daemons are enabled together).
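In upstream NixOS terms, that compatibility mode corresponds to an option like the following (assumed; the Ghaf module may expose it differently). It provides a `docker` alias backed by podman, which is why it conflicts with a real Docker daemon:

```nix
# Assumed upstream NixOS option, shown for illustration; conflicts with
# virtualisation.docker.enable because it aliases `docker` to podman.
{
  virtualisation.podman = {
    enable = true;
    dockerCompat = true; # `docker run ...` is then handled by podman
  };
}
```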
The Docker daemon was already in Ghaf, but nvidia container and CUDA support has now been added to it as well. Podman is a new feature.