Add arm remote builder with ubuntu image #81

Merged
merged 1 commit into from
Mar 12, 2024


Collaborator

@joinemm joinemm commented Feb 23, 2024

  • aarch64 VM added; the module is a modified version of the azurerm-linux-vm module.
  • The arm machines are added as remote builders on jenkins-controller.
  • Currently there is a lot of duplicated code; the modules could be refactored in the future.

@joinemm joinemm requested review from henrirosten and a team February 23, 2024 08:53
@joinemm joinemm force-pushed the pr-arm-builder branch 3 times, most recently from 32957e5 to 8678329 Compare February 23, 2024 10:40
Collaborator

@henrirosten henrirosten left a comment

I'm getting the following error on trying to build an aarch64 target on jenkins-controller after taking the changes from this PR:

cannot build on 'ssh://[email protected]': error: failed to start SSH connection to '[email protected]': Host key verification failed.

known_hosts on jenkins-controller is populated here:

${pkgs.openssh}/bin/ssh-keyscan -f /var/lib/builder-keyscan/scanlist -v -t ed25519 > /root/.ssh/known_hosts

The builders get added to the scanlist here:

content = join("\n", toset(module.builder_vm[*].virtual_machine_private_ip_address))
"path" = "/var/lib/builder-keyscan/scanlist"

I believe you should add the arm_builder_vms to that list too.
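A minimal sketch of that change, as a fragment of the existing scanlist entry. It assumes the aarch64 builders are instantiated with `count` under a module named `module.arm_builder_vm` and expose the same `virtual_machine_private_ip_address` output as `module.builder_vm`; both of those are assumptions, not names confirmed in this thread.

```hcl
# Assumption: module.arm_builder_vm is a hypothetical name for the aarch64
# builder module, created with count and exposing the same output as
# module.builder_vm.
content = join("\n", toset(concat(
  module.builder_vm[*].virtual_machine_private_ip_address,
  module.arm_builder_vm[*].virtual_machine_private_ip_address
)))
"path" = "/var/lib/builder-keyscan/scanlist"
```

With both lists concatenated, ssh-keyscan would pick up the arm builders' host keys through the same scanlist, so known_hosts on jenkins-controller would cover them as well.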

Collaborator

@henrirosten henrirosten left a comment

@joinemm: I know you did pretty extensive trials attempting to spin up NixOS VMs on Azure.

Please reference the results of your trials and point out the main reasons you had to go with Ubuntu+Nix instead of NixOS on the aarch64 builders.

I assume we'll return to this topic later, and having the results of your analysis will then be valuable.

Review thread on terraform/modules/arm-builder-vm/ubuntu-builder.sh (outdated, resolved)
@joinemm joinemm force-pushed the pr-arm-builder branch 5 times, most recently from 243e40a to d9aa5b5 Compare March 8, 2024 16:00
Collaborator Author

joinemm commented Mar 8, 2024

> @joinemm: I know you did pretty extensive trials attempting to spin up NixOS VMs on Azure.
>
> Please reference the results of your trials and point out the main reasons you had to go with Ubuntu+Nix instead of NixOS on the aarch64 builders.
>
> I assume we'll return to this topic later, and having the results of your analysis will then be valuable.

The reasons are now explained in the arm-builder-vm README.

I've also updated the ubuntu-builder.sh script to install the same rclone service as the NixOS configuration does, so it uses the Azure binary cache.

@joinemm joinemm requested review from a team, karim20230 and henrirosten March 8, 2024 16:15
Collaborator

@henrirosten henrirosten left a comment

Generally, this change looks good to me. I tested that the deployment works and that the resulting infra can build aarch64 targets. The build results also end up in the deployed binary cache, signed with the correct key.

However, there is one problem I noticed in testing: after taking the changes from this PR, the deployment to UAE no longer works. The reason is that the D2ps to D64ps v5 Azure VM sizes used for the aarch64 builders are not available in UAE, so the attempt to deploy to that location fails with:

Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The requested VM size Standard_D4ps_v5 is not available in the current region. The sizes available in the current region are: ...

The bigger problem is that there are currently no Azure VM sizes that support arm-based VMs in UAE.

My proposal is:

  • We merge this change now despite the above issue.
  • We follow up later with a workaround to address the above issue, perhaps by using an aarch64 builder from another location for UAE deployments.
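One possible shape for such a workaround, sketched as a Terraform fragment. The variable and local names here (`location`, `arm_builder_location`) are hypothetical and do not come from this PR; the idea is only to let the aarch64 builders be placed in a different region than the rest of the deployment.

```hcl
# Hypothetical variables; these names are illustrative, not from the PR.
variable "location" {
  type        = string
  description = "Azure region for the main deployment"
}

variable "arm_builder_location" {
  type        = string
  default     = null
  description = "Optional override region for the aarch64 builders"
}

locals {
  # Use the override when it is set, otherwise fall back to the main location.
  arm_builder_location = coalesce(var.arm_builder_location, var.location)
}
```

A UAE deployment could then set `arm_builder_location` to a region where the Dpsv5 sizes are offered. The builder would still need to be reachable from jenkins-controller over the network, which this fragment does not address.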

@joinemm joinemm merged commit 35189cb into tiiuae:main Mar 12, 2024
2 checks passed
@joinemm joinemm deleted the pr-arm-builder branch March 12, 2024 10:28
3 participants