Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Flat Repository Format #55

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions MODULE.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ bazel_dep(name = "bazel_skylib", version = "1.5.0")
bazel_dep(name = "aspect_bazel_lib", version = "2.7.3")
bazel_dep(name = "container_structure_test", version = "1.16.0")
bazel_dep(name = "rules_oci", version = "1.7.4")
bazel_dep(name = "rules_pkg", version = "0.10.1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not add a dependency here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thesayyn what about moving it to the bottom of the file and marking it as a dev_dependency? Would that work?


bazel_lib_toolchains = use_extension("@aspect_bazel_lib//lib:extensions.bzl", "toolchains")
use_repo(bazel_lib_toolchains, "bsd_tar_toolchains")
Expand Down
20 changes: 19 additions & 1 deletion apt/private/lockfile.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,29 @@ def _add_package(lock, package, arch):
k = _package_key(package, arch)
if k in lock.fast_package_lookup:
return

filename = package["Filename"]
url_root = package["Root"]

# Takes element following last '/' in string, evaluates if it is a known archive type
url_last_elem = package["Root"].rsplit("/", 1)[-1]
if url_last_elem.endswith(".gz") or url_last_elem.endswith(".xz"):
url_root = url_root.replace(url_last_elem, "")

# Package filename may be parsed as a relative path. Deconstruct the string, see if there is overlap with the URL
if filename.split("/", 1)[0] in url_root:
filename = filename.replace(filename.split("/", 1)[0], "")

url_root = url_root.removesuffix("/")
filename = filename.removeprefix("/")

url = "%s/%s" % (url_root, filename)

lock.packages.append({
"key": k,
"name": package["Package"],
"version": package["Version"],
"url": "%s/%s" % (package["Root"], package["Filename"]),
"url": url,
"sha256": package["SHA256"],
"arch": arch,
"dependencies": [],
Expand Down
38 changes: 27 additions & 11 deletions apt/private/package_index.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -9,19 +9,35 @@ def _fetch_package_index(rctx, url, dist, comp, arch, integrity):
r = {"success": False, "integrity": None}

decompression_successful = False

for file_type, tool in file_types.items():
output = "{}/Packages.{}".format(target_triple, file_type)
r = rctx.download(
url = "{}/dists/{}/{}/binary-{}/Packages.{}".format(url, dist, comp, arch, file_type),
output = output,
integrity = integrity,
allow_fail = True,
)
if r.success:
re = rctx.execute(tool + [output])
if re.return_code == 0:
decompression_successful = True
break

# Attempt to pull Packages archive from within the `dists` hierarchy first,
# if that fails - assume the repository is of type "flat"
Comment on lines +15 to +17
Copy link
Contributor

@jjmaestro jjmaestro Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not following the Debian repo format spec that you mentioned. In there, they show the two sources.list formats, one for "canonical" repos:

deb uri distribution [component1] [component2] [...]

and one for flat repos:

deb uri directory/

I think the convention is that directory always ends with the slash / and, while distribution can have slashes, it never ends with a slash.

IMHO we should use the same in the YAML manifest, something like:

version: 1

sources:
  - channel: bullseye main
    url: https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z
  - channel: bullseye-cran40/
    url: https://cloud.r-project.org/bin/linux/debian

archs:
  - amd64

packages:
  - bash
  - r-base-core

and use the slash to distinguish between "canonical" and flat repos.

Also, although the sources.list spec doesn't seem to require components, we are also parsing them so I guess that's another difference that should be made in the parsing of the manifest (and validation).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with your findings there.

try_urls = [
"{}/dists/{}/{}/binary-{}/Packages.{}".format(url, dist, comp, arch, file_type),
"{}/Packages.{}".format(url, file_type),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does flat repositories play with different architectures? in the manifest format we support specifying architectures matrix which i don't see being applied here.

I believe we should disallow flat repositories to be combined with non flat repositories to prevent confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think disallowing flat repos mixed with "classic" repos is really needed.

You can have flat repos with multiple architectures (e.g. https://cloud.r-project.org/bin/linux/debian/bullseye-cran40/ has i386 and amd64 packages). The NVIDIA repos are "a bit weirder" since they are flat and have multiple architectures (e.g. https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ has i386 and amd64 packages) but they also seem to have separate flat repos for "other architectures" (e.g. https://developer.download.nvidia.com/compute/cuda/repos/debian12/sbsa/ has arm64 packages).

Personally, as I mentioned in #64 (comment) if there are specific packages in specific architectures I'd just separate them and use another manifest.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that makes this even more confusing. How does flat repos interact with arches? for instance if you want to install i386 and amd64 of a package from https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ ?

I don't see how this PR allows discovery of arches in flat repositories as of today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thesayyn ah, maybe you are right and there's more fixes needed. I'll try to test the PR and see where it breaks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thesayyn OK, I think I got it, the main issue that I found so far (and it's been a very quick look so maybe there's more issues / corner cases) is that in _parse_package_index we force arch to be the one we are using in the for-loop so that's a problem for Package indexes that contain multiple architectures as we will get the package info from a different arch.

This is actually happening in this PR and the proof is in the lock and, easier to see, in the test itself :D Check my comment in examples/debian_snapshot/test_linux_amd64.yaml, the new expectedOutput for the r-base-core package is set as i386 instead of the correct amd64 precisely because the current code in this PR is selecting the wrong package architecture:

(...)
      - r-base-core/now 4.4.1-1~bullseyecran.0 i386 \[installed,local\]

Now, the fix that I would propose here is the following:

diff --git a/apt/private/package_index.bzl b/apt/private/package_index.bzl
index 34f8202..8d6f922 100644
--- a/apt/private/package_index.bzl
+++ b/apt/private/package_index.bzl
@@ -77,7 +77,8 @@ def _parse_package_index(state, contents, arch, root):
 
         if len(pkg.keys()) != 0:
             pkg["Root"] = root
-            util.set_dict(state.packages, value = pkg, keys = (arch, pkg["Package"], pkg["Version"]))
+            arch_ = arch if pkg["Architecture"] == "all" else pkg["Architecture"]
+            util.set_dict(state.packages, value = pkg, keys = (arch_, pkg["Package"], pkg["Version"]))
             last_key = ""
             pkg = {}

IMHO this "is not great" because it's duping the packages that are valid across all architectures N times (as many as architectures we have in the manifest). It's not too bad, it's just some storage space wasted and "some mismatch" between the repo structure that is on disk (the per-arch folders) VS the actual contents of the tar files (this can be seen in the dpk-status in the test, e.g. tzdata is also all but the folder and the "rules_distroless arch" for the package is "amd64" or whatever other arch we are using). But... I don't see how to easily fix this without major changes (and even then I don't know how I'd do it).

]

for try_url in try_urls:
r = rctx.download(
url = try_url,
output = output,
integrity = integrity,
allow_fail = True,
)
if r.success:
re = rctx.execute(tool + [output])
if re.return_code == 0:
decompression_successful = True
break

# If we've set `decompression_successful` to true at any point in the iteration loop,
# break out of the loop as we have found and extracted the Package archive and have what is
# needed to continue.
if decompression_successful:
break

if not r.success:
fail("unable to download package index")
Expand Down
Loading
Loading