Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distributedCache: allow local dir:// and container-storage:// as a cache transport #4879

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

flouthoc
Copy link
Collaborator

Allowing local directory as transport enables users to make distributed caching more flexible by making sure.

  • Allows end users to move cache sources and destination manually.
  • Allows storing cache on an NFS or various mount sources.
  • Allows end users to distribute cache without registry setup.

[NO NEW TESTS NEEDED]
[NO TESTS NEEDED]
// Will add test in final commit

What type of PR is this?

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake
/kind other

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

distributedCache: allow local `dir://` and `container-storage://` as a cache transport

Closes: #4875

@flouthoc flouthoc marked this pull request as draft June 26, 2023 08:14
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Allowing local directory as transport enables users to make distributed
caching more flexible by making sure.

* Allows end users to move cache sources and destination manually.
* Allows storing cache on an `NFS` or various mount sources.
* Allows end users to distribute cache without registry setup.

[NO NEW TESTS NEEDED]
[NO TESTS NEEDED]
// Will add test in final commit

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the local-distributed-cache branch from 7d087e9 to 7c3fdcb Compare June 26, 2023 10:08
@rhatdan
Copy link
Member

rhatdan commented Jun 26, 2023

@nalind PTAL
My concern here is you are breaking API.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jul 28, 2023

Is this a breaking change?
@vrothberg @mtrmac PTAL

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We probably need to add new fields for that purpose. Tests would be great as well.

@@ -141,10 +140,10 @@ type BuildOptions struct {
TransientMounts []string
// CacheFrom specifies any remote repository which can be treated as
// potential cache source.
CacheFrom []reference.Named
CacheFrom []types.ImageReference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking the API yes.

Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This doesn’t implement the referenced containers-storage at all. I’m not convinced it should, but in place of any code there’s just … nothing.
  • c/image/types.ImageReference represents a single image. I don’t know that there is any reasonable way to represent a transport-independent “image namespace” where the caching logic could create entries for individual cacheKey values.

Given the above, I think the most direct approach is to add a Cache{From,To}Dir string fields in the internal API.

As for the CLI, I’m also a bit unhappy with using dir:/parent/dir as a syntax, and I’d prefer a --cache-from-dir here as well.

  • This just doesn’t generalize. I don’t know what --cache-to=docker-archive:… or --cache-from=sif:… would mean or do.
  • I’m always skeptical of options that accept ambiguous syntaxes. Right now with this PR, dir:foo becomes ambiguous between dir:./foo and docker://docker.io/library/dir:foo (because tags in inputs are no longer prohibited).

--cache-from-dir foo is not more typing than --cache-from dir:foo, so I don’t see much of a downside to a separate option.


I didn’t revisit the cache functionality to recall what it does exactly, and in particular see what compression means / doesn’t mean. (Notably pushes to dir: don‘t compress input by c/image default.)

Intuitively I’m very skeptical about using containers-storage, that is a very expensive way to store layer tarballs.

Comment on lines +993 to +997
if remote.Transport().Name() == imagedocker.Transport.Name() {
repo = remote.DockerReference().String()
} else {
repo = remote.StringWithinTransport()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really value in special-casing docker:// here, enough to be worth the ambiguity?

if remote.Transport().Name() == imagedocker.Transport.Name() {
repo = remote.DockerReference().String()
} else {
repo = remote.StringWithinTransport()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is supposed to represent arbitrary transports, please use transports.ImageName(), not a single-transport-scoped name.

@@ -53,16 +53,17 @@ const (
)

// RepoNamesToNamedReferences parse the raw string to Named reference
func RepoNamesToNamedReferences(destList []string) ([]reference.Named, error) {
var result []reference.Named
func RepoNamesToNamedReferences(destList []string) ([]types.ImageReference, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name and comment are now incorrect.

named, err := reference.ParseNormalizedNamed(dest)
if t := alltransports.TransportFromImageName(dest); t == nil {
// assume no default transport provided in such case append docker
dest = "docker://" + dest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this remains: There are already over half a dozen copies of a similar fallback code, in many different variants. That should be centralized.

(For a start, there is util.DefaultTransport.)

} else {
repo = remote.StringWithinTransport()
}
fmt.Fprintf(s.executor.out, "%s %s\n", cachePullMessage, fmt.Sprintf("%s:%s", repo, cacheKey))
}
}
// logCachePush produces build log for cases when `--cache-to`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn’t this also need updating?

if err != nil {
return nil, fmt.Errorf("failed generating directory reference for %q: %w", tagged, err)
}
result = append(result, dest)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else… this silently drops unsupported input without telling the user?!

Comment on lines +1806 to +1813
} else if repo.Transport().Name() == imagedirectory.Transport.Name() {
dirPath := repo.StringWithinTransport()
tagged := filepath.Join(dirPath, cachekey)
dest, err := imagedirectory.NewReference(tagged)
if err != nil {
return nil, fmt.Errorf("failed generating directory reference for %q: %w", tagged, err)
}
result = append(result, dest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m fairly unhappy with using the transport:image-name syntax, and especially the types.ImageReference values, to represent things that are not a single image. This would not be the first instance where that happens but I’m unhappy every time :)

Admittedly c/image has no “repo/collection of images” abstraction… and that’s because, just like this code shows, there’s just too little the transports have in common to build a meaningful abstraction like that.

srcString = src.DockerReference().String()
} else if src.Transport().Name() == imagedirectory.Transport.Name() {
srcString = "dir://" + src.StringWithinTransport()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else… what??

if src.Transport().Name() == imagedocker.Transport.Name() {
srcString = src.DockerReference().String()
} else if src.Transport().Name() == imagedirectory.Transport.Name() {
srcString = "dir://" + src.StringWithinTransport()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// is not a part of the dir: syntax (and, theory corner, in POSIX, paths starting with "//" have, hypothetically special meaning, some UNIX systems IIRC accepted a "//nfs.host.name/path" at the kernel level).

This should just unconditionally use transports.ImageName; the called function uses alltransports.ParseImageName (before starting with heuristics), so this should use transports.ImageName to create a clearly unambiguous input.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jan 31, 2024

Closing as this has not been touched in 6 months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support container-transports in --cache-to and --cache-from
4 participants