Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Inventory & Sha7 appended filenames for binaries #51

Merged
merged 18 commits into from
Sep 18, 2024

Conversation

schneems
Copy link
Contributor

@schneems schneems commented Jul 23, 2024

The Ruby buildpack is tightly coupled to the URLs generated by this repo. This change introduces an inventory/manifest file that can be used as a layer of indirection. This is a pattern used by several other buildpacks:

The file acts as a lookup, so I can say "find the entry with version 3.3.3 from ruby_inventory.toml for heroku-24 with arm64 CPU architecture and it will return an entry containing a URL and a checksum. The URL says where to find that artifact and the checksum validates that the artifact hasn't been modified between when it was added to the inventory and now.

It also allows us to decouple modifying the S3 bucket with rolling out the version on Heroku via the buildpack. In effect this allows us to make an automated release pipeline (in the future) where source changes can be detected automatically on the ruby-lang source site and built and uploaded to S3. This will trigger a manifest entry to be added to the files which will (eventually) trigger a PR to the Ruby buildpack. A single engineer can review that, approve and merge it, and then deploy the change. Versus today, the process requires two engineers for compliance. (non-automated PRs require approval of a second engineer that didn't write the change).

Rebuild support

As we roll this out we have to preserve the current binary filenames or it will break existing buildpacks. However we also need to introduce a new filename to allow for multiple binary builds of the same version. Details:

Currently the same version of Ruby can be re-built which will over-write the existing version. This should produce the same output, but we've not done any work on byte-for-byte reproducible builds and even a single bit changed (due to a timestamp, for example) would cause a different SHA. That means that we cannot ship a manifest file that points directly at the current URLs i.e. https://heroku-buildpack-ruby.s3.us-east-1.amazonaws.com/heroku-24/arm64/ruby-3.3.4.tgz otherwise if the build is re-run the SHA will change and the inventory will no longer be valid.

There is a need to be able to generate a binary multiple times in the event of a problem. A recent example is #49 where a jruby tar file was generated with truncated filenames.

This is solved by appending the first 7 characters of the sha to the file so now when 3.3.4 is built it will generate both:

  • heroku-24/arm64/ruby-3.3.4.tgz
  • heroku-24/arm64/ruby-3.3.4-<sha7>.tgz like heroku-24/arm64/ruby-3.3.4-7bebeee.tgz

The second tar file's checksum should never change. This will allow us to roll out a manifest file with checksum validation enabled and also allow us to re-generate binaries if needed.

schneems added 11 commits July 23, 2024 17:23
Currently the Ruby buildapack is tightly coupled to the URLs that are generated by this repo. I am in the process of introducing an inventory file that the buildpack can use as a lookup. To start that process this commit introduces generating tgz files with the first 7 SHA256 characters appended. This allows us to update the same version number in the future without worrying that it will break future digest checks of the URL.

With this change the builder will continue to generate the original URLs, but it will also generate "<filename>-<sha256>.tgz" files as well and append this information to a `jruby_inventory.toml` file.
- We now create only one sha named archive for jruby builds since both ARM and AMD point to the same source.
- Binary `inventory_check` takes in a  file and validates that all checksums are valid.
@schneems schneems changed the title Generate Inventory & Sha7 appended filenames for JRuby Generate Inventory & Sha7 appended filenames for binaries Jul 25, 2024
@schneems schneems force-pushed the schneems/sha-7 branch 4 times, most recently from 2cb828c to 19bfff6 Compare July 26, 2024 20:23
Before:

```
drwxr-xr-x 2 root   root       4096 Jul 25 22:34 .
drwxr-xr-x 3 runner docker     4096 Jul 25 22:32 ..
-rw-r--r-- 1 root   root   24061018 Jul 25 22:34 ruby-3.2.3.tgz
```

With these permissions we were unable to write to the directory. After:

After

```
drwxrwxrwx 2 root   root       4096 Jul 27 02:53 .
drwxr-xr-x 3 runner docker     4096 Jul 27 02:51 ..
-rw-r--r-- 1 root   root   24061325 Jul 27 02:53 ruby-3.2.3.tgz
```

Now we can copy the file to a new filename in the same directory.
@schneems schneems marked this pull request as ready for review August 27, 2024 20:06
@schneems schneems requested a review from a team as a code owner August 27, 2024 20:07
runesoerensen and others added 2 commits September 5, 2024 12:09
```
error: the borrowed expression implements the required traits
   --> ruby_executable/src/bin/ruby_build.rs:143:24
    |
143 |           docker_run.arg(&format!(
    |  ________________________^
144 | |             "./make_ruby.sh {} {}",
145 | |             input_tar.display(),
146 | |             output_tar.display()
147 | |         ));
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrows_for_generic_args
    = note: `-D clippy::needless-borrows-for-generic-args` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::needless_borrows_for_generic_args)]`
help: change this to
    |
143 ~         docker_run.arg(format!(
144 +             "./make_ruby.sh {} {}",
145 +             input_tar.display(),
146 +             output_tar.display()
147 ~         ));
    |

error: could not compile `ruby_executable` (bin "ruby_build" test) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
```
@schneems schneems enabled auto-merge (squash) September 5, 2024 19:48
Copy link
Contributor

@runesoerensen runesoerensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d probably have preferred less duplication between ruby_build.rs and jruby_build.rs. Seems to me those could actually be merged almost entirely, and at least the newly introduced logic is identical.

Probably more of a matter of style (and not super important in this context/for what this repository does), just seems like there’s higher than necessary risk that the two implementations will fall out of sync over time.

I’d also have preferred to see all the code being included in this PR (and not in the GemVersion crate), but I understand the intention is to move that in the future/this is acceptable for now, so going to approve :)

@schneems schneems merged commit c5f2030 into main Sep 18, 2024
26 checks passed
@schneems schneems deleted the schneems/sha-7 branch September 18, 2024 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants