
Fixed GPU detection for CUDA, ROCm, etc. using env vars #490

Open
bmahabirbu wants to merge 1 commit into main from nv-detect

Conversation

@bmahabirbu (Collaborator) commented Nov 25, 2024

I have added GPU detection functionality for CUDA and fixed the configurations for all GPUs using llama-cli. While working on this I also fixed how get_gpu was called: it is now called only once, and env vars are used to save its state.

I have added my Containerfile on Docker Hub as the default image for CUDA (for testing purposes). @rhatdan is this OK to keep for now, or should I remove it?

@FNGarvin thanks for getting things started!

Edit:
Added separate command args for GPU support: conman_args += ["--gpus", "all"] for Docker vs. conman_args += ["--device", "nvidia.com/gpu=all"] for Podman.
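For illustration, a minimal sketch of how that branching might look (the helper name add_gpu_device_args and the surrounding code are hypothetical; only the two flag pairs come from this PR):

import os

def add_gpu_device_args(conman, conman_args):
    # Hypothetical helper: append the flag that exposes NVIDIA GPUs to the
    # container, depending on which engine is in use.
    if os.path.basename(conman) == "docker":
        # Docker exposes GPUs via nvidia-container-toolkit's --gpus flag.
        conman_args += ["--gpus", "all"]
    else:
        # Podman uses CDI device names instead.
        conman_args += ["--device", "nvidia.com/gpu=all"]
    return conman_args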

Eric, I know you're on vacation, so please don't take a look at this until you're back!

Summary by Sourcery

Improve GPU detection and configuration by using environment variables for CUDA, ROCm, and Asahi Linux. Refactor the GPU detection logic to streamline the process and add a default container image for CUDA on Docker Hub for testing.

Bug Fixes:

  • Streamlined GPU detection by using environment variables for CUDA, ROCm, and Asahi Linux.

Enhancements:

  • Refactor GPU detection logic to use environment variables and improve handling of different GPU types.

Deployment:

  • Add a default container image for CUDA on Docker Hub for testing purposes.

sourcery-ai bot (Contributor) commented Nov 25, 2024

Reviewer's Guide by Sourcery

The PR enhances GPU detection and configuration by refactoring the code to use environment variables for GPU state management and adding CUDA support alongside existing ROCm and Asahi implementations. The changes improve GPU detection by performing a more comprehensive check across different GPU types and implementing a comparison mechanism to select the most suitable GPU when multiple types are available.

Sequence diagram for GPU detection and configuration

sequenceDiagram
    participant User
    participant System
    participant GPU_Detector
    participant Environment

    User->>System: Run command with --gpu flag
    System->>GPU_Detector: Call get_gpu()
    GPU_Detector->>System: Return GPU type and number
    System->>Environment: Set environment variable for GPU
    System->>User: Run container with GPU support

Updated class diagram for GPU detection

classDiagram
    class Model {
        +_image(args)
        +setup_container(args)
        +run_container(args, shortnames)
        +gpu_args()
    }
    class GPU_Detector {
        +get_gpu() : (String, Integer)
    }
    class Environment {
        +os.getenv(var)
        +os.environ[var] = value
    }
    Model --> GPU_Detector : uses
    Model --> Environment : uses
    note for GPU_Detector "Enhanced to detect CUDA, ROCm, and Asahi GPUs"
    note for Environment "Uses environment variables for GPU state management"

File-Level Changes

Refactored GPU detection logic to support multiple GPU types (ramalama/model.py)
  • Added CUDA GPU detection using nvidia-smi
  • Implemented comparison between AMD and NVIDIA GPU memory sizes
  • Modified GPU detection to return the appropriate environment variable and device number
  • Added error handling for nvidia-smi command execution

Updated container configuration to handle different GPU types (ramalama/model.py)
  • Added CUDA container image support
  • Implemented special handling for CUDA using the '--gpus all' flag
  • Modified environment variable handling for GPU configuration
  • Simplified GPU availability checks using environment variables

Modified command execution functionality (ramalama/common.py)
  • Added text=True parameter to subprocess.run for string output
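For reference, a rough sketch of what the text=True change could amount to, assuming run_cmd in ramalama/common.py wraps subprocess.run (the real signature may differ):

import subprocess

def run_cmd(args):
    # text=True makes stdout a str rather than bytes, so callers such as
    # get_gpu() can parse nvidia-smi output without decoding it first.
    return subprocess.run(args, check=True, capture_output=True, text=True)

Note that a later comment in this thread says the text=True change was reverted because it broke ramalama list.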


@sourcery-ai bot (Contributor) left a comment

Hey @bmahabirbu - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider moving the hardcoded docker.io image reference to a configuration file or using the project's official repositories instead of a personal Docker Hub account
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

        elif nvidia_gpu_mib:
            return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num

    except Exception:
sourcery-ai bot (Contributor):

issue (bug_risk): Catching all exceptions masks potential important errors

Consider catching specific exceptions (like subprocess.CalledProcessError) and logging or handling them appropriately.

Member:

I agree with sourcery-ai here. We should catch specific errors; for example, this will blow up if the command is not installed or hits permission errors.
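As a hedged sketch of what narrower handling could look like around the nvidia-smi call (using subprocess.run directly here for self-containment; the PR routes this through run_cmd):

import subprocess

try:
    output = subprocess.run(
        ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'],
        check=True, capture_output=True, text=True,
    )
except FileNotFoundError:
    # nvidia-smi is not installed at all; skip CUDA detection.
    output = None
except (subprocess.CalledProcessError, PermissionError) as e:
    # The binary exists but the query failed (no driver loaded, permissions, ...).
    print(f"nvidia-smi failed: {e}")
    output = None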

sourcery-ai bot (Contributor):

Thank you for your feedback, we will generate more comments like this in the future according to the following instructions:

- Include comments that identify potential bug risks, such as catching all exceptions, which can mask important errors.
- Suggest specific improvements, like catching specific exceptions and handling them appropriately.
- Ensure comments are clear and provide actionable advice.

@@ -384,25 +392,52 @@

def get_gpu():
sourcery-ai bot (Contributor):

issue (complexity): Consider restructuring get_gpu() into distinct phases with early returns and clearer control flow

The get_gpu() function's complexity can be reduced by restructuring it with clear phases and early returns. Here's a simplified version that maintains all functionality:

def get_gpu():
    # Check Asahi first - simplest case
    if os.path.exists('/etc/os-release'):
        with open('/etc/os-release', 'r') as file:
            if "asahi" in file.read().lower():
                return "ASAHI_VISIBLE_DEVICES", 1

    # Check AMD GPUs
    amd_gpu_num = -1
    amd_gpu_mib = 0
    for i, fp in enumerate(sorted(glob.glob('/sys/bus/pci/devices/*/mem_info_vram_total'))):
        with open(fp, 'r') as file:
            bytes = int(file.read())
            if bytes > 1073741824 and bytes > (amd_gpu_mib * 1048576):
                amd_gpu_mib = bytes / 1048576
                amd_gpu_num = i

    # Check NVIDIA GPUs
    nvidia_gpu_num = -1
    nvidia_gpu_mib = 0
    try:
        output = run_cmd(['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'])
        gpus = output.stdout.strip().split('\n')
        if gpus:
            gpu = max(gpus, key=lambda x: int(x.split(',')[1]))
            nvidia_gpu_mib = int(gpu.split(',')[1])
            nvidia_gpu_num = int(gpu.split(',')[0])
    except Exception:
        pass

    # Return the GPU with more memory
    if amd_gpu_num >= 0 and nvidia_gpu_num >= 0:
        return ("HIP_VISIBLE_DEVICES", amd_gpu_num) if amd_gpu_mib > nvidia_gpu_mib else ("CUDA_VISIBLE_DEVICES", nvidia_gpu_num)
    if amd_gpu_num >= 0:
        return "HIP_VISIBLE_DEVICES", amd_gpu_num
    if nvidia_gpu_num >= 0:
        return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num

    return None, None

This restructuring:

  1. Separates detection into clear phases
  2. Simplifies the control flow with early returns
  3. Makes the comparison logic more explicit
  4. Reduces nesting depth
  5. Maintains all existing functionality

@@ -190,14 +198,14 @@ def cleanup():
        return True

    def gpu_args(self):
        gpu_type, gpu_num = get_gpu()
        gpu_args = []
        if sys.platform == "darwin":
Member:

Should we be checking here that the app is not going to be run in a container?

@ericcurtin (Collaborator) commented Nov 25, 2024:

I think when this code was written we ran the Python script both inside and outside the container, so if you were on macOS at this point, you could assume you were actually going to run natively on macOS.

I would do:

If Darwin:
["-ngl", "99"]

for all cases. On macOS llama.cpp ignores -ngl 99, as it turns on acceleration by default.
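A rough sketch of that simplification, assuming the detected GPU type has already been recorded in an environment variable as this PR proposes (names and placement are illustrative, not the PR's exact code):

import os
import sys

def gpu_args():
    # On macOS llama.cpp turns on acceleration by default, so always passing
    # -ngl 99 is harmless there.
    if sys.platform == "darwin":
        return ["-ngl", "99"]
    # Elsewhere, offload layers only if detection found a usable GPU earlier.
    gpu_env_vars = ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ASAHI_VISIBLE_DEVICES")
    if any(os.getenv(v) for v in gpu_env_vars):
        return ["-ngl", "99"]
    return []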


    # Check for NVIDIA GPUs (CUDA case)
    try:
        command = ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits']
Member:

We need to check if nvidia-smi is installed?

Contributor:

It is installed on Linux and Windows with the drivers themselves. I'm not an expert, but I am under the impression it will be present everywhere modern CUDA is usable.

Member:

OK, well, as long as we catch the failure and don't segfault, I will be happy.
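One cheap way to make the "is nvidia-smi installed" check explicit (sketch only; the PR instead handles a missing binary through the exception path):

import shutil

# shutil.which returns None when nvidia-smi is not on PATH, so CUDA detection
# can be skipped up front instead of relying on a FileNotFoundError.
have_nvidia_smi = shutil.which("nvidia-smi") is not None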

Collaborator:

The final calculation at the end seems overly complex if amd_gpu_bytes and nvidia_gpu_mib are set to zero initially. If both are less than 1 GB of VRAM, use the CPU. Otherwise just use the one with more VRAM.

Collaborator:

We shouldn't use NVIDIA GPUs with < 1 GB of VRAM either. When the VRAM is that small, it's not really worth it.
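Sketched out, the selection rule proposed in these comments could look like this (variable names mirror the PR; the 1 GiB cutoff and the choose_gpu helper are illustrative):

MIN_VRAM_MIB = 1024  # below roughly 1 GiB of VRAM, prefer CPU

def choose_gpu(amd_gpu_mib, amd_gpu_num, nvidia_gpu_mib, nvidia_gpu_num):
    candidates = []
    if amd_gpu_mib >= MIN_VRAM_MIB:
        candidates.append(("HIP_VISIBLE_DEVICES", amd_gpu_num, amd_gpu_mib))
    if nvidia_gpu_mib >= MIN_VRAM_MIB:
        candidates.append(("CUDA_VISIBLE_DEVICES", nvidia_gpu_num, nvidia_gpu_mib))
    if not candidates:
        return None, None  # neither GPU is worth using; fall back to CPU
    env_var, num, _ = max(candidates, key=lambda c: c[2])
    return env_var, num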

@bmahabirbu (Collaborator, Author) commented Nov 25, 2024:

Makes sense! I'll rework the function. It should be more modular so it can support other GPU backends (like Vulkan or Intel Arc) down the line. I was too focused on the edge case where a system has both an AMD and an NVIDIA GPU above 1 GB.

Collaborator:

I think technically we could have only the kernel module installed outside the container and nvidia-smi installed inside the container; the only thing actually required outside of the container is kernelspace stuff, i.e. kernel modules.

But I think it's OK like this. The detection won't ever be perfect; there will always be corner cases where one must manually specify their container somehow. I was thinking the env vars could be useful for that.

@bmahabirbu (Collaborator, Author):

Good point! I did add some more modularity for now.

I'm thinking down the road I can query the Vulkan SDK for info on all GPU architectures and select from there.

@bmahabirbu (Collaborator, Author):

I also reverted the run_cmd text=True arg, as it was breaking ramalama list functionality, and updated get_gpu() accordingly.

@rhatdan (Member) commented Nov 26, 2024:

Want to get @ericcurtin's approval.

@FNGarvin (Contributor):

I'm the FNG and don't want to tip any canoes, but could I entreat your good graces to entertain a couple of questions?

now one would have to use the --gpu flag, like this: ramalama --gpu run llama3, to activate RamaLama with GPU support.

I'm not volunteering to rework the code, but is it really worse to call get_gpu twice, at the expense of a few microseconds, than to saddle the user with an extra flag? That user is seeking "boring AI" from a project whose mission objective is to inspect "your system for GPU support, falling back to CPU support if no GPUs are present". As a user, I would prefer to see command-line arguments removed rather than added, for example by defaulting to a particular model and possibly even eliding the run command. If the RamaLama command line is complex enough that I have to create novel aliases to run the scaffolding whose primary task (AFAICT) is to replace the need for novel aliases... maybe I'm missing something.

I'll drop a link to this discussion, too: #478 (comment), because it would not feel great to have this patch break my ability to use the software.

Happy holidays, folks.

BR,
FNG

@bmahabirbu (Collaborator, Author):

@FNGarvin I agree that the end user should not have to deal with unnecessary additional args. The original intent behind that design choice was to build off of an existing framework; thank you for reeling the design back to its intent. It's a simple fix and has little to do with performance, as get_gpu() can still be called just once. Env vars are set with the data from get_gpu() for any additional calls that need the info.
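For instance, a minimal sketch of the call-once-then-cache idea (get_gpu() is the detection function from ramalama/model.py discussed above; where exactly this runs is illustrative):

import os

gpu_type, gpu_num = get_gpu()  # e.g. ("CUDA_VISIBLE_DEVICES", 0)
if gpu_type is not None:
    os.environ[gpu_type] = str(gpu_num)  # cache the result for later code paths

# Later callers read the environment instead of re-running detection:
gpu_detected = any(
    os.getenv(v) for v in ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ASAHI_VISIBLE_DEVICES")
)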

Additionally, the idea of omitting "run" is an interesting one, and something to consider as we move forward. Having args such as run and serve IMO makes people who use podman/docker feel at home with RamaLama. But it would be nice to have a default model.

In that discussion I mentioned how to get Podman to work with --gpus all. However, a user should not need to fiddle with additional configuration to get things working. It's a bummer that nvidia-container-toolkit needs to be installed, and further configured, just to accept the same args as Docker.

We could go back to having separate commands for Docker and Podman, but I'd first like to test sharing the NVIDIA libraries with the container runtimes manually, to consolidate the args and keep to the idea that we can switch between Docker and Podman seamlessly.

Happy holidays!

@bmahabirbu force-pushed the nv-detect branch 2 times, most recently from 234c4af to e60a429 on December 2, 2024, 09:30
@rhatdan (Member) commented Dec 17, 2024:

@bmahabirbu this needs a rebase. @ericcurtin what should we do with this one?

@ericcurtin (Collaborator):

Not sure; I was waiting until this build goes green and the conflicts are fixed to look at this again.

At this point it might be easier to rewrite this, copying out the bits we need.
