
Fixed GPU detection for CUDA, ROCm, etc. using env vars #490

Open
bmahabirbu wants to merge 1 commit into main from nv-detect

Conversation

@bmahabirbu (Collaborator) commented Nov 25, 2024

I have added GPU detection functionality for CUDA and fixed the configurations for all GPUs using llama-cli. While working on this I also fixed how get_gpu was called: it is now called only once, and env vars are used to save its state.

I have added my Containerfile on Docker Hub as the default image for CUDA (for testing purposes). @rhatdan is this OK to keep for now, or should I remove it?

@FNGarvin thanks for getting things started!

Edit:
Added separate command args for GPU support: conman_args += ["--gpus", "all"] for Docker vs. conman_args += ["--device", "nvidia.com/gpu=all"] for Podman.
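For illustration, a minimal sketch of how that branching might look (the helper name add_gpu_device_args and the surrounding code are hypothetical; only the two flag pairs come from this PR):

import os

def add_gpu_device_args(conman, conman_args):
    # Hypothetical helper: append the flag that exposes NVIDIA GPUs to the
    # container, depending on which engine is in use.
    if os.path.basename(conman) == "docker":
        # Docker exposes GPUs via nvidia-container-toolkit's --gpus flag.
        conman_args += ["--gpus", "all"]
    else:
        # Podman uses CDI device names instead.
        conman_args += ["--device", "nvidia.com/gpu=all"]
    return conman_args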

Eric, I know you're on vacation, so please don't take a look at this until you're back!

Summary by Sourcery

Improve GPU detection and configuration by using environment variables for CUDA, ROCm, and Asahi Linux. Refactor the GPU detection logic to streamline the process and add a default container image for CUDA on Docker Hub for testing.

Bug Fixes:

  • Streamlined GPU detection by using environment variables for CUDA, ROCm, and Asahi Linux.

Enhancements:

  • Refactor GPU detection logic to use environment variables and improve handling of different GPU types.

Deployment:

  • Add a default container image for CUDA on Docker Hub for testing purposes.

sourcery-ai bot (Contributor) commented Nov 25, 2024

Reviewer's Guide by Sourcery

The PR enhances GPU detection and configuration by refactoring the code to use environment variables for GPU state management and adding CUDA support alongside existing ROCm and Asahi implementations. The changes improve GPU detection by performing a more comprehensive check across different GPU types and implementing a comparison mechanism to select the most suitable GPU when multiple types are available.

Sequence diagram for GPU detection and configuration

sequenceDiagram
    participant User
    participant System
    participant GPU_Detector
    participant Environment

    User->>System: Run command with --gpu flag
    System->>GPU_Detector: Call get_gpu()
    GPU_Detector->>System: Return GPU type and number
    System->>Environment: Set environment variable for GPU
    System->>User: Run container with GPU support

Updated class diagram for GPU detection

classDiagram
    class Model {
        +_image(args)
        +setup_container(args)
        +run_container(args, shortnames)
        +gpu_args()
    }
    class GPU_Detector {
        +get_gpu() : (String, Integer)
    }
    class Environment {
        +os.getenv(var)
        +os.environ[var] = value
    }
    Model --> GPU_Detector : uses
    Model --> Environment : uses
    note for GPU_Detector "Enhanced to detect CUDA, ROCm, and Asahi GPUs"
    note for Environment "Uses environment variables for GPU state management"

File-Level Changes

Refactored GPU detection logic to support multiple GPU types (ramalama/model.py)
  • Added CUDA GPU detection using nvidia-smi
  • Implemented comparison between AMD and NVIDIA GPU memory sizes
  • Modified GPU detection to return the appropriate environment variable and device number
  • Added error handling for nvidia-smi command execution

Updated container configuration to handle different GPU types (ramalama/model.py)
  • Added CUDA container image support
  • Implemented special handling for CUDA using the '--gpus all' flag
  • Modified environment variable handling for GPU configuration
  • Simplified GPU availability checks using environment variables

Modified command execution functionality (ramalama/common.py)
  • Added text=True parameter to subprocess.run for string output
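For reference, a rough sketch of what the text=True change could amount to, assuming run_cmd in ramalama/common.py wraps subprocess.run (the real signature may differ):

import subprocess

def run_cmd(args):
    # text=True makes stdout a str rather than bytes, so callers such as
    # get_gpu() can parse nvidia-smi output without decoding it first.
    return subprocess.run(args, check=True, capture_output=True, text=True)

Note that a later comment in this thread says the text=True change was reverted because it broke ramalama list.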


@sourcery-ai bot (Contributor) left a comment

Hey @bmahabirbu - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider moving the hardcoded docker.io image reference to a configuration file or using the project's official repositories instead of a personal Docker Hub account
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

        elif nvidia_gpu_mib:
            return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num

    except Exception:
sourcery-ai bot (Contributor):

issue (bug_risk): Catching all exceptions masks potential important errors

Consider catching specific exceptions (like subprocess.CalledProcessError) and logging or handling them appropriately.

Member:

I agree with sourcery-ai here. We should catch specific errors; for example, this will blow up if the command is not installed or hits permission errors.
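As a hedged sketch of what narrower handling could look like around the nvidia-smi call (using subprocess.run directly here for self-containment; the PR routes this through run_cmd):

import subprocess

try:
    output = subprocess.run(
        ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'],
        check=True, capture_output=True, text=True,
    )
except FileNotFoundError:
    # nvidia-smi is not installed at all; skip CUDA detection.
    output = None
except (subprocess.CalledProcessError, PermissionError) as e:
    # The binary exists but the query failed (no driver loaded, permissions, ...).
    print(f"nvidia-smi failed: {e}")
    output = None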

sourcery-ai bot (Contributor):

Thank you for your feedback, we will generate more comments like this in the future according to the following instructions:

- Include comments that identify potential bug risks, such as catching all exceptions, which can mask important errors.
- Suggest specific improvements, like catching specific exceptions and handling them appropriately.
- Ensure comments are clear and provide actionable advice.

@@ -384,25 +392,52 @@

def get_gpu():
sourcery-ai bot (Contributor):

issue (complexity): Consider restructuring get_gpu() into distinct phases with early returns and clearer control flow

The get_gpu() function's complexity can be reduced by restructuring it with clear phases and early returns. Here's a simplified version that maintains all functionality:

def get_gpu():
    # Check Asahi first - simplest case
    if os.path.exists('/etc/os-release'):
        with open('/etc/os-release', 'r') as file:
            if "asahi" in file.read().lower():
                return "ASAHI_VISIBLE_DEVICES", 1

    # Check AMD GPUs
    amd_gpu_num = -1
    amd_gpu_mib = 0
    for i, fp in enumerate(sorted(glob.glob('/sys/bus/pci/devices/*/mem_info_vram_total'))):
        with open(fp, 'r') as file:
            bytes = int(file.read())
            if bytes > 1073741824 and bytes > (amd_gpu_mib * 1048576):
                amd_gpu_mib = bytes / 1048576
                amd_gpu_num = i

    # Check NVIDIA GPUs
    nvidia_gpu_num = -1
    nvidia_gpu_mib = 0
    try:
        output = run_cmd(['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'])
        gpus = output.stdout.strip().split('\n')
        if gpus:
            gpu = max(gpus, key=lambda x: int(x.split(',')[1]))
            nvidia_gpu_mib = int(gpu.split(',')[1])
            nvidia_gpu_num = int(gpu.split(',')[0])
    except Exception:
        pass

    # Return the GPU with more memory
    if amd_gpu_num >= 0 and nvidia_gpu_num >= 0:
        return ("HIP_VISIBLE_DEVICES", amd_gpu_num) if amd_gpu_mib > nvidia_gpu_mib else ("CUDA_VISIBLE_DEVICES", nvidia_gpu_num)
    if amd_gpu_num >= 0:
        return "HIP_VISIBLE_DEVICES", amd_gpu_num
    if nvidia_gpu_num >= 0:
        return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num

    return None, None

This restructuring:

  1. Separates detection into clear phases
  2. Simplifies the control flow with early returns
  3. Makes the comparison logic more explicit
  4. Reduces nesting depth
  5. Maintains all existing functionality

@@ -190,14 +198,14 @@ def cleanup():
        return True

    def gpu_args(self):
        gpu_type, gpu_num = get_gpu()
        gpu_args = []
        if sys.platform == "darwin":
Member:

Should we be checking here that the app is not going to be run in a container?

@ericcurtin (Collaborator) commented Nov 25, 2024:

I think when this code was written we ran the Python script both inside and outside the container, so if you were on macOS at this point, you could assume you were actually going to run natively on macOS.

I would do:

If Darwin:
["-ngl", "99"]

for all cases. On macOS llama.cpp ignores -ngl 99, as it turns on acceleration by default.
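A rough sketch of that simplification, assuming the detected GPU type has already been recorded in an environment variable as this PR proposes (names and placement are illustrative, not the PR's exact code):

import os
import sys

def gpu_args():
    # On macOS llama.cpp turns on acceleration by default, so always passing
    # -ngl 99 is harmless there.
    if sys.platform == "darwin":
        return ["-ngl", "99"]
    # Elsewhere, offload layers only if detection found a usable GPU earlier.
    gpu_env_vars = ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ASAHI_VISIBLE_DEVICES")
    if any(os.getenv(v) for v in gpu_env_vars):
        return ["-ngl", "99"]
    return []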


    # Check for NVIDIA GPUs (CUDA case)
    try:
        command = ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits']
Member:

We need to check if nvidia-smi is installed?

Contributor:

It is installed on Linux and Windows with the drivers themselves. I'm not an expert, but I am under the impression it will be present everywhere modern CUDA is usable.

Member:

OK, well, as long as we catch the failure and don't segfault, I will be happy.
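One cheap way to make the "is nvidia-smi installed" check explicit (sketch only; the PR instead handles a missing binary through the exception path):

import shutil

# shutil.which returns None when nvidia-smi is not on PATH, so CUDA detection
# can be skipped up front instead of relying on a FileNotFoundError.
have_nvidia_smi = shutil.which("nvidia-smi") is not None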

Collaborator:

The final calculation at the end seems overly complex if amd_gpu_bytes and nvidia_gpu_mib are set to zero initially. If both are less than 1 GB of VRAM, use the CPU. Otherwise just use the one with more VRAM.

Collaborator:

We shouldn't use NVIDIA GPUs with < 1 GB of VRAM either. When the VRAM is that small, it's not really worth it.
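Sketched out, the selection rule proposed in these comments could look like this (variable names mirror the PR; the 1 GiB cutoff and the choose_gpu helper are illustrative):

MIN_VRAM_MIB = 1024  # below roughly 1 GiB of VRAM, prefer CPU

def choose_gpu(amd_gpu_mib, amd_gpu_num, nvidia_gpu_mib, nvidia_gpu_num):
    candidates = []
    if amd_gpu_mib >= MIN_VRAM_MIB:
        candidates.append(("HIP_VISIBLE_DEVICES", amd_gpu_num, amd_gpu_mib))
    if nvidia_gpu_mib >= MIN_VRAM_MIB:
        candidates.append(("CUDA_VISIBLE_DEVICES", nvidia_gpu_num, nvidia_gpu_mib))
    if not candidates:
        return None, None  # neither GPU is worth using; fall back to CPU
    env_var, num, _ = max(candidates, key=lambda c: c[2])
    return env_var, num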

@bmahabirbu (Collaborator, Author) commented Nov 25, 2024:

Makes sense! I'll rework the function. It should be more modular so it can support other GPU backends (like Vulkan or Intel Arc) down the line. I was too focused on the edge case where a system has both an AMD and an NVIDIA GPU above 1 GB.

Collaborator:

I think technically we could have only the kernel module installed outside the container and nvidia-smi installed inside the container; the only thing actually required outside of the container is kernelspace stuff, i.e. kernel modules.

But I think it's OK like this. The detection won't ever be perfect; there will always be corner cases where one must manually specify their container somehow. I was thinking the env vars could be useful for that.

@bmahabirbu (Collaborator, Author):

Good point! I did add some more modularity for now.

I'm thinking down the road I can query the Vulkan SDK for info on all GPU architectures and select from there.

@bmahabirbu (Collaborator, Author):

I also reverted the run_cmd text=True arg, as it was breaking ramalama list functionality, and updated get_gpu() accordingly.

@rhatdan (Member) commented Nov 26, 2024:

Want to get @ericcurtin's approval.

@FNGarvin (Contributor):

I'm the FNG and don't want to tip any canoes, but could I entreat your good graces to entertain a couple of questions?

now one would have to use the --gpu flag, like this: ramalama --gpu run llama3, to activate RamaLama with GPU support.

I'm not volunteering to rework the code, but is it really worse to call get_gpu twice, at the expense of a few microseconds, than to saddle the user with an extra flag? That user is seeking "boring AI" from a project whose mission objective is to inspect "your system for GPU support, falling back to CPU support if no GPUs are present". As a user, I would prefer to see command-line arguments removed rather than added, for example by defaulting to a particular model and possibly even eliding the run command. If the RamaLama command line is complex enough that I have to create novel aliases to run the scaffolding whose primary task (AFAICT) is to replace the need for novel aliases... maybe I'm missing something.

I'll drop a link to this discussion, too: #478 (comment), because it would not feel great to have this patch break my ability to use the software.

Happy holidays, folks.

BR,
FNG

@bmahabirbu (Collaborator, Author):

@FNGarvin I agree that the end user should not have to deal with unnecessary additional args. The original intent behind that design choice was to build off of an existing framework; thank you for reeling the design back to its intent. It's a simple fix and has little to do with performance, as get_gpu() can still be called just once. Env vars are set with the data from get_gpu() for any additional calls that need the info.
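For instance, a minimal sketch of the call-once-then-cache idea (get_gpu() is the detection function from ramalama/model.py discussed above; where exactly this runs is illustrative):

import os

gpu_type, gpu_num = get_gpu()  # e.g. ("CUDA_VISIBLE_DEVICES", 0)
if gpu_type is not None:
    os.environ[gpu_type] = str(gpu_num)  # cache the result for later code paths

# Later callers read the environment instead of re-running detection:
gpu_detected = any(
    os.getenv(v) for v in ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ASAHI_VISIBLE_DEVICES")
)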

Additionally, the idea of omitting "run" is an interesting one, and something to consider as we move forward. Having args such as run and serve IMO makes people who use podman/docker feel at home with RamaLama. But it would be nice to have a default model.

In that discussion I mentioned how to get Podman to work with --gpus all. However, a user should not need to fiddle with additional configuration to get things working. It's a bummer that nvidia-container-toolkit needs to be installed, and further configured, just to accept the same args as Docker.

We could go back to having separate commands for Docker and Podman, but I'd first like to test sharing the NVIDIA libraries with the container runtimes manually, to consolidate the args and keep to the idea that we can switch between Docker and Podman seamlessly.

Happy holidays!

@bmahabirbu force-pushed the nv-detect branch 2 times, most recently from 234c4af to e60a429 on December 2, 2024, 09:30
@rhatdan (Member) commented Dec 17, 2024:

@bmahabirbu this needs a rebase. @ericcurtin what should we do with this one?

@ericcurtin (Collaborator):

Not sure; I was waiting until this build goes green and the conflicts are fixed to look at this again.

At this point it might be easier to rewrite this, copying out the bits we need.
