NVIDIA driver #169

Open
RichardScottOZ opened this issue Aug 14, 2022 · 4 comments

RichardScottOZ (Contributor) commented Aug 14, 2022

This is an odd one, unless it is the type

detect_worker_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 384, in current_device
detect_worker_1  |     _lazy_init()
detect_worker_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 186, in _lazy_init
detect_worker_1  |     _check_driver()
detect_worker_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 65, in _check_driver
detect_worker_1  |     raise AssertionError("""
detect_worker_1  | AssertionError:
detect_worker_1  | Found no NVIDIA driver on your system. Please check that you
detect_worker_1  | have an NVIDIA GPU and installed a driver from
detect_worker_1  | http://www.nvidia.com/Download/index.aspx
detect_worker_1  | distributed.process - INFO - reaping stray process <SpawnProcess name='Dask Worker process (from Nanny)' pid=15 parent=1 started daemon>
cosmos_detect_worker_1 exited with code 1
qnvida^CGracefully stopping... (press Ctrl+C again to force)
Stopping cosmos_worker_1    ...
Stopping cosmos_runner_1    ...
Stopping cosmos_scheduler_1 ...
Killing cosmos_worker_1    ... done
Killing cosmos_runner_1    ... done
Killing cosmos_scheduler_1 ... done
(tensorflow2_p38) ubuntu@ip-172-31-15-73:~/data/Cosmos$ ^C
(tensorflow2_p38) ubuntu@ip-172-31-15-73:~/data/Cosmos$ nvidia-smi
Sun Aug 14 03:07:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   26C    P8    16W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I've used these for many things - I'll have to try an older version sometime.

iross (Contributor) commented Aug 15, 2022

Hi Richard--
Looks like nvidia-smi works fine on the host machine, but have you run it within a docker container? The issue may be in the nvidia-docker interface.
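
A quick way to test that layer (just a sketch, assuming the NVIDIA Container Toolkit is installed; the CUDA image tag below is only an example, any CUDA base image would do) is to run nvidia-smi inside a throwaway container:

    # Check GPU visibility through the container runtime, independent of the Cosmos compose setup
    docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

    # Equivalent check on hosts still using the legacy nvidia-docker2 runtime
    docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If that also reports "Found no NVIDIA driver", the problem is in the nvidia-docker/toolkit layer rather than in the compose file.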

RichardScottOZ (Contributor, Author) commented

Hi Ian,

Yeah, I was running it in docker, so that was my thought too, but thought I would ask.

If I'm going to use it on a lot of things rather than just some tests - would it be better to get it running natively?

RichardScottOZ (Contributor, Author) commented

Or will it be fine if I can work out what the compose config tweaks [and additional setup necessary] might be?

e.g. whatever additional nvidia container wizardry might need to go on a generic machine setup?
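
For a generic Ubuntu box, the extra piece would presumably be the NVIDIA Container Toolkit - a rough sketch, assuming NVIDIA's apt repository for it is already configured per their install docs:

    # Install the runtime hook that exposes the host driver to containers,
    # then restart Docker so it picks it up
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker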

RichardScottOZ (Contributor, Author) commented

There are some suggestions of things like this:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

How that would tie into your not-so-straightforward setup there, I'm not sure.
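
For what it's worth, a minimal sketch of how that block might hang off a GPU worker service in a Compose v3 file (the service and image names here are placeholders, not the actual Cosmos layout):

    services:
      detect_worker:                    # placeholder service name
        image: cosmos-detect:latest     # placeholder image
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

Older Compose file versions (2.3/2.4) put runtime: nvidia on the service instead of the deploy.resources block, so it depends on which schema the existing files use.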

RichardScottOZ changed the title from "NVIDA driver" to "NVIDIA driver" on Nov 3, 2023