[Draft] temporary changes, do not merge #51
Closed
This is used by the nix_build.sh script used to build images with terraform. Signed-off-by: Florian Klink <[email protected]>
This introduces a terraform module that can be used to nix-build and upload VM images to Azure. nix-build.sh originates from https://cs.tvl.fyi/depot/-/blob/ops/terraform/deploy-nixos/nixos-eval.sh, which is why it inherits its copyright from there. Signed-off-by: Florian Klink <[email protected]>
This groups together some common resources to create a VM. We might introduce more flexibility at a later point. Signed-off-by: Florian Klink <[email protected]>
We can just include azure-config.nix from nixpkgs. It pulls in azure-common.nix, which contains all necessary kernel config / udev rules. It also defines a `config.system.azureImage` attribute, which builds a VHD that we can import into Azure using the `azurerm-nix-vm-image` terraform module. These images can be referred to from source_image_id in Terraform (using azurerm-linux-vm for example), allowing us to boot the desired machine config out of the box, without having to do a two-stage deploy. Signed-off-by: Florian Klink <[email protected]>
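A minimal sketch of the import described in this commit message, assuming the standard `modulesPath` module argument is used to locate the file in nixpkgs. The actual file layout in this PR may differ.

```nix
{ modulesPath, ... }:
{
  # Pull in the Azure support shipped with nixpkgs. Per the commit message,
  # this in turn brings in azure-common.nix (kernel config, udev rules).
  imports = [
    "${modulesPath}/virtualisation/azure-config.nix"
  ];
}
```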
This allows injecting custom userdata to the VM at instance creation time, which we can use to provision some config (like SSH pubkey config) that's not part of the NixOS image. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
azure-common.nix already sets services.openssh.settings.{PermitRootLogin,ClientAliveInterval}, so we need to decide what wins. To keep the intended behaviour, we want to mkForce PermitRootLogin to "no" (azure-common.nix sets "prohibit-password"), and set the ClientAliveInterval with mkDefault - bumping that timeout probably makes sense for azure, and we don't want the setting in this file to take priority. Signed-off-by: Florian Klink <[email protected]>
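The priority handling described in this commit message could look roughly like the following. This is a sketch only: the `ClientAliveInterval` value is a hypothetical placeholder, not the value used in this PR.

```nix
{ lib, ... }:
{
  services.openssh.settings = {
    # mkForce gives this definition the highest priority, so it overrides
    # the "prohibit-password" set by azure-common.nix.
    PermitRootLogin = lib.mkForce "no";

    # mkDefault gives this definition a low priority, so the plain
    # definition in azure-common.nix takes precedence.
    # The value 60 is a hypothetical placeholder.
    ClientAliveInterval = lib.mkDefault 60;
  };
}
```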
This file contains all ssh public keys used by real humans. It's parsed from Terraform to inject into instance metadata. Signed-off-by: Florian Klink <[email protected]>
This builds the jenkins-master Nix image, turns it into a bootable Azure image, and then boots an instance with the image. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
That way, the VM survives reboots - the non-networkd configuration seems to be quite brittle. Signed-off-by: Florian Klink <[email protected]>
Ideally, we'd keep systemd-resolved disabled too, but the way nixpkgs configures cloud-init prevents it from picking up DNS settings from elsewhere. Signed-off-by: Florian Klink <[email protected]>
Move the azure-specific config snippet into its own file, so we can import it from multiple configuration.nix files. azure-common.nix is already used for the existing machine configurations, and as we don't want to break these, it's using this transient name. Signed-off-by: Florian Klink <[email protected]>
This gives each VM a system-assigned identity, and exposes the principal ID as a module output, allowing to grant access to certain resources. Signed-off-by: Florian Klink <[email protected]>
This exposes a read-only HTTP webserver for the contents in the storage container. `rclone serve http` takes care of exposing the storage container over HTTP. We disallow listing (by only allowing access to certain paths), and expose it over HTTP(S) with auto-ssl via caddy. This will work with whatever domain we route to it, so it's not part of the configuration. Signed-off-by: Florian Klink <[email protected]>
This works around NixOS/nixpkgs#272532, we can revert this once NixOS/nixpkgs#272617 has landed here. Signed-off-by: Florian Klink <[email protected]>
We don't want to blindly issue certs for all domains, but make this configurable. This should be config coming from the environment, via cloud-init. Signed-off-by: Florian Klink <[email protected]>
Define this for each machine outside the VM, and describe everything in a single security group. Attaching multiple security groups caused confusing duplicate errors; this might be a bug in the Terraform Azure provider. Signed-off-by: Florian Klink <[email protected]>
This adds filesystem-related tools to the $PATH of cloud-init, so it can format disks with its disk_setup module (and fs_setup) config key. This will be used to format data volumes attached to VMs. Signed-off-by: Florian Klink <[email protected]>
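One way to extend a unit's `$PATH` in NixOS is the `systemd.services.<name>.path` option. The sketch below assumes that the relevant unit is `cloud-init.service` and that ext4 tooling is what is needed; both the unit choice and the package list are assumptions, not taken from this PR.

```nix
{ pkgs, ... }:
{
  # Packages listed here are added to the unit's $PATH.
  # The package selection is a hypothetical example.
  systemd.services.cloud-init.path = with pkgs; [
    e2fsprogs  # provides mkfs.ext4
    util-linux # provides mount, blkid, lsblk
  ];
}
```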
We need to use cloud-init to format and mount data volumes in Azure; we can't use systemd for it. Due to hashicorp/terraform-provider-azurerm#6117, disks in Azure get attached late at boot, so any dev-disk-by-….device units created via systemd-fstab-generator might not exist yet at the time the graph for multi-user.target is created, causing systemd to fail starting downstream services due to a missing dependency. Once the volume is attached, the .device unit pops up via udev, and then a manual restart of services depending on data disks would work, but it's messy. Letting cloud-init take care of data disk mounting (and formatting) is the right choice; that way systemd doesn't need to do any dependency tracking of it. Signed-off-by: Florian Klink <[email protected]>
This adds the ghafbinarycache storage account, and a binary-cache-v1 storage container inside of it. It's used to serve artifacts from (via the binary-cache VM), and Nix build artifacts are also uploaded to it. Signed-off-by: Florian Klink <[email protected]>
This deploys the VM defined at binary-cache. Attaching the data disks is still a bit messy (requires one reboot, or manual reverse proxy restart). Fixing this requires some more debugging. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
The service-binary-cache module is all the specific hosts need. Signed-off-by: Florian Klink <[email protected]>
Otherwise, cloud-init.service might still be running while we start up services expecting the mount to happen. Signed-off-by: Florian Klink <[email protected]>
Configure the domain and storage account name with cloud-init. This allows keeping the same NixOS image across multiple deployments of this image, serving another bucket at another domain. Also, switch to listening on port 443 only, caddy can use the TLS-ALPN-01 challenge just fine. Signed-off-by: Florian Klink <[email protected]>
This should use tls-alpn-01 on port 443 just fine. Signed-off-by: Florian Klink <[email protected]>
Apparently canonical/cloud-init#4673 and more hacks are not needed, we can simply ramp up the timeout that systemd is willing to wait for the .device unit to appear. Signed-off-by: Florian Klink <[email protected]>
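A longer wait for a late-appearing `.device` unit can be expressed with the `x-systemd.device-timeout=` mount option documented in systemd.mount(5). The following is a sketch under assumptions: the mount point, device path, filesystem type, and timeout value are all hypothetical.

```nix
{
  fileSystems."/var/lib/data" = {
    # Hypothetical device path and filesystem type.
    device = "/dev/disk/by-label/data";
    fsType = "ext4";
    # Tell systemd to wait up to five minutes for the device to show up
    # before giving up on the mount. The duration is a placeholder.
    options = [ "x-systemd.device-timeout=5min" ];
  };
}
```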
This adds an additional "remote-build" ssh user. The Jenkins controller will use this as user to do remote Nix builds. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
This adds an allocate_public_ip boolean variable (defaulting to false), and will only create a public IP if it's set to true. Signed-off-by: Florian Klink <[email protected]>
This deploys two builders in a new subnet. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
This creates an azure key vault and adds the private key as a secret into there, then grants the jenkins-controller VM access to read that secret. Signed-off-by: Florian Klink <[email protected]>
Use the common group, instead of the current client object id. Signed-off-by: Florian Klink <[email protected]>
This adds a fetch-build-ssh-key systemd service that fetches the ssh private key into /etc/secrets/remote-build-ssh-key (owned by root), and orders itself before nix-daemon. Signed-off-by: Florian Klink <[email protected]>
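A rough sketch of such a unit is shown here. The service name, target path, and ordering come from the commit message; the use of the Azure CLI with the VM's managed identity, the vault name, and the secret name are assumptions, and the PR may fetch the secret differently.

```nix
{ pkgs, ... }:
{
  systemd.services.fetch-build-ssh-key = {
    description = "Fetch the remote build SSH key";
    wantedBy = [ "multi-user.target" ];
    # Make sure the key is in place before nix-daemon starts.
    before = [ "nix-daemon.service" ];
    after = [ "network-online.target" ];
    wants = [ "network-online.target" ];
    path = [ pkgs.azure-cli ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # Files created below are readable by root only.
      umask 077
      mkdir -p /etc/secrets
      # Assumption: authenticate using the VM's system-assigned identity.
      az login --identity
      # "example-vault" and the secret name are hypothetical placeholders.
      az keyvault secret show \
        --vault-name "example-vault" \
        --name "remote-build-ssh-key" \
        --query value --output tsv > /etc/secrets/remote-build-ssh-key
    '';
  };
}
```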
Signed-off-by: Florian Klink <[email protected]>
Render /etc/nix/machines with terraform. In the future, we might want to autodiscover this, or better, have agents register with the controller, rather than having to recreate the VM whenever the list of builders is changed. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
This creates a Nix signing key, and uses terraform-provider-secret to hold it in the terraform state. It's then uploaded into an Azure key vault. The jenkins-controller VM has access to it, and puts it at /etc/secrets/nix-signing-key. A post-build-hook is configured, uploading every build to the binary cache bucket, with the signature. Signed-off-by: Florian Klink <[email protected]>
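A post-build hook of this kind could be wired up roughly as follows. This is a sketch modelled on the hook example in the Nix manual, not the hook from this PR: the destination store URL is a hypothetical placeholder, and the script assumes `nix` is on the hook's `$PATH`.

```nix
{ pkgs, ... }:
let
  # Nix passes the freshly built store paths in $OUT_PATHS, separated by spaces.
  uploadHook = pkgs.writeShellScript "upload-to-cache" ''
    set -eu
    set -f # disable globbing
    export IFS=' '
    # Sign the paths with the key fetched from the key vault.
    nix --extra-experimental-features nix-command \
      store sign --key-file /etc/secrets/nix-signing-key $OUT_PATHS
    # The destination below is a hypothetical placeholder.
    exec nix --extra-experimental-features nix-command \
      copy --to "file:///var/cache/example-binary-cache" $OUT_PATHS
  '';
in
{
  nix.settings.post-build-hook = "${uploadHook}";
}
```

Note that the hook runs synchronously after each build, so a slow upload delays the build that triggered it.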
Signed-off-by: Florian Klink <[email protected]>
There's no need for any user to ssh into builders, this can be dropped. Signed-off-by: Florian Klink <[email protected]>
This consumes a list of IPs to ssh-keyscan once, on startup. In the future, we might want to add support for dynamic discovery, as additional (longer-lived) static hosts. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Prevent the repo and nixpkgs linter from fighting each other about formatting. Signed-off-by: Florian Klink <[email protected]>
This describes the current concepts and components in this PR with more prose. It also describes some of the known issues / compromises.
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
henrirosten force-pushed the copy-of-azure-images branch from b57d074 to 4b94553 (January 10, 2024 07:07)
No description provided.