Rocky8 Linux AWS Compute Nodes #124

Merged


@GMW99 commented May 26, 2022

This pull request makes Rocky Linux 8 (Rocky8) the source AMI for the compute nodes on AWS.

Associated issue: #121

Changes include:

  • Changing the AMI to the community-supported Rocky8 AMI, as specified here: https://forums.rockylinux.org/t/rocky-linux-official-aws-ami/3049/25
  • Changing the SSH username for compute nodes (rocky instead of centos)
  • Changing the AWS instance type to micro so that Packer can run
  • Installing kernel-devel and kernel-headers by default so that the NVIDIA driver can be installed properly by following the current documentation
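
For reference, here is a minimal sketch of how the latest community Rocky8 AMI could be looked up with the AWS CLI, assuming the owner ID 792107900819 from the Rocky Linux forum post and assuming the image names follow a Rocky-8-* pattern. The function name latest_rocky8_ami and the AWS_CLI override (there only to make the function testable) are hypothetical and not part of this PR.

```bash
#!/bin/bash
# Hypothetical helper: print the ID of the newest Rocky 8 AMI owned by the
# Rocky community account. The "Rocky-8-*" name pattern is an assumption.
ROCKY_OWNER_ID="792107900819"

latest_rocky8_ami() {
  local arch="${1:-x86_64}"
  # AWS_CLI can be overridden (e.g. for testing); it defaults to the real CLI.
  "${AWS_CLI:-aws}" ec2 describe-images \
    --owners "$ROCKY_OWNER_ID" \
    --filters "Name=name,Values=Rocky-8-*" "Name=architecture,Values=${arch}" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```

With AWS credentials and a region configured, calling latest_rocky8_ami x86_64 should print a single AMI ID (or None if the filters match nothing). Sorting by CreationDate works because the CLI returns ISO 8601 timestamps, which sort correctly as strings.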

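After adding the following, as per the documentation:

```bash
# Install the NVIDIA driver only on x86_64 nodes (the repository URL below is for x86_64)
if [[ $(arch) == "x86_64" ]]; then
  # Add NVIDIA's CUDA repository for RHEL 8
  sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
  # Install the latest driver stream; the DKMS variant builds the kernel module locally
  sudo dnf module install -y nvidia-driver:latest-dkms
fi
```

and running `sudo /usr/local/bin/run-packer`, this is the resulting output on a compute node from the following script: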

```bash
#!/bin/bash
# Print the hostname, the OS release details and the start of the NVIDIA driver report
hostname
cat /etc/os-release
nvidia-smi -q | head
```

Output:

```
NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

==============NVSMI LOG==============

Timestamp                                 : Thu May 26 11:42:20 2022
Driver Version                            : 515.43.04
CUDA Version                              : 11.7

Attached GPUs                             : 1
GPU 00000000:00:1E.0
    Product Name                          : Tesla T4
```
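
The manual check above could be turned into a pass/fail test. The following is a minimal sketch, assuming the node exposes /etc/os-release and nvidia-smi as shown. The function names is_rocky8 and nvidia_driver_version are hypothetical; nvidia_driver_version reads nvidia-smi -q output on standard input so that it can be exercised without a GPU.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the given os-release file describes
# Rocky Linux 8 (defaults to the running system's /etc/os-release).
is_rocky8() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so ID, VERSION_ID, etc. do not leak into the caller.
  (
    . "$file" || exit 1
    [[ "$ID" == "rocky" && "${VERSION_ID%%.*}" == "8" ]]
  )
}

# Hypothetical helper: print the "Driver Version" field from
# `nvidia-smi -q` output read on standard input.
nvidia_driver_version() {
  awk -F': *' '/^Driver Version/ { print $2; exit }'
}

# Example use on a compute node (not run here):
#   is_rocky8 && echo "OS OK"
#   nvidia-smi -q | nvidia_driver_version
```

On a real node, nvidia-smi --query-gpu=driver_version --format=csv,noheader prints the driver version directly; parsing the -q report is shown here only because that is the output captured above.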

Added 4 commits May 25, 2022 16:28:

  • CentOS 8 is now end-of-life and is no longer supported. Rocky8, however, is a supported drop-in replacement. The AMI comes from the Rocky community AMI owner ID 792107900819, as stated here: https://forums.rockylinux.org/t/rocky-linux-official-aws-ami/3049/24
  • The Rocky8 default user is rocky instead of centos.
  • Packer fails with a memory error; the error goes away when the instance type is micro.
  • Rocky8 does not include kernel-devel and kernel-headers by default, and without them the NVIDIA drivers fail to install properly. They are therefore installed here so that the documented installation steps still work.
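
A minimal sketch of the kernel package install described in the last commit, assuming the packages should match the running kernel (the kernel DKMS builds the NVIDIA module against). The function names kernel_build_pkgs and install_kernel_build_pkgs are hypothetical, and this PR may instead install the unversioned kernel-devel and kernel-headers packages.

```bash
#!/bin/bash
# Hypothetical helper: print the kernel-devel and kernel-headers package
# names for a kernel release (defaults to the running kernel).
kernel_build_pkgs() {
  local kver="${1:-$(uname -r)}"
  printf 'kernel-devel-%s kernel-headers-%s\n' "$kver" "$kver"
}

# Hypothetical helper: install those packages with dnf before the NVIDIA
# driver. Defined here but not called.
install_kernel_build_pkgs() {
  # The unquoted substitution is intentional: word splitting yields
  # two separate package arguments for dnf.
  # shellcheck disable=SC2046
  sudo dnf install -y $(kernel_build_pkgs "$@")
}
```

If the image build also updates the kernel, the running kernel during the build may differ from the one the node boots later, in which case passing the target kernel release explicitly would be the safer choice.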
@milliams merged commit 876f1e2 into clusterinthecloud:6 on Aug 31, 2022