Added yaml2 sync config #1792

Closed

Conversation

mhowell24

Adds the yaml2 sync configuration, which allows rewriting the repository path on a repo-by-repo basis. This largely follows the new YAML format suggested in #1531 and fixes #1072. I have written some tests but still need to update the documentation to cover the new configuration and its features.

Signed-off-by: Max Howell [email protected]

Contributor

@mtrmac mtrmac left a comment

Thanks!

Just a very quick skim, focusing basically only on the config format semantics.

@@ -50,9 +50,10 @@ type syncOptions struct {

// repoDescriptor contains information of a single repository used as a sync source.
Contributor

“used as a sync source” is no longer quite true I guess?

ImageRefs []types.ImageReference // List of tagged image found for the repository
Context *types.SystemContext // SystemContext for the sync command
DirBasePath string // base path when source is 'dir'
DestinationRef types.ImageReference // Destination
Contributor

types.ImageReference inherently points at a single image. It should not be abused as a repo/directory value.

One way to do this might be to move the destinationReference calls into repoDescriptor construction somehow, and to have repoDescriptor already contain fully-resolved (source reference, destination reference) pairs.

Author

It would be nice to render everything down to pairs of source and destination references regardless of how sync was called, i.e. using specific transports from the command line or using one of the YAML configs. I think this is possible; however, it could be a rather large change.
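
The idea discussed in this thread, resolving everything to (source reference, destination reference) pairs up front, could look roughly like the sketch below. It is not code from the PR: `syncPair` and `resolvePairs` are hypothetical names, and plain strings stand in for `types.ImageReference` so the example has no dependencies.

```go
package main

import "fmt"

// syncPair is a hypothetical, fully resolved unit of work: exactly one source
// image and the one destination it should be copied to. Plain strings stand
// in for types.ImageReference (an assumption made to keep this standalone).
type syncPair struct {
	Source      string
	Destination string
}

// resolvePairs expands one repository and its tags into per-image pairs, so
// later stages never have to treat an image reference as a repo or directory.
func resolvePairs(srcRegistry, srcPath, dstRegistry, dstPath string, tags []string) []syncPair {
	pairs := make([]syncPair, 0, len(tags))
	for _, tag := range tags {
		pairs = append(pairs, syncPair{
			Source:      fmt.Sprintf("docker://%s/%s:%s", srcRegistry, srcPath, tag),
			Destination: fmt.Sprintf("docker://%s/%s:%s", dstRegistry, dstPath, tag),
		})
	}
	return pairs
}

func main() {
	tags := []string{"v1.0", "v1.1"}
	for _, p := range resolvePairs("example.docker.io", "example/image", "my.other.registry", "example/test/image", tags) {
		fmt.Println(p.Source, "->", p.Destination)
	}
}
```

With this shape, the copy loop only iterates over pairs and does not need to know whether they came from the command line or from a YAML file.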

@@ -74,6 +75,25 @@ type registrySyncConfig struct {
// sourceConfig contains all registries information read from the source YAML file
type sourceConfig map[string]registrySyncConfig

// syncConfig contains all registries information read from the source YAML file for source yaml2
type syncConfig map[string]registrySyncConfigV2
Contributor

  • So is this syncConfigV2?

  • One of the major lessons from v1 is that having a top-level map[something]something structure makes it impossible to add options at that level. So maybe the top-level should be a struct here as well.

// registrySyncConfigV2 contains all information read from the yaml2 sync config for a registry
type registrySyncConfigV2 struct {
Repos map[string]repoSyncConfig
Credentials types.DockerAuthConfig // Username and password used to authenticate with the registry
Contributor

Are credentials really a per-SyncConfig thing?

Conceptually we need both source and destination credentials. How does that work here?

(I guess credentials should be a separate top-level map but I’m not at all sure about that.)

type registrySyncConfigV2 struct {
Repos map[string]repoSyncConfig
Credentials types.DockerAuthConfig // Username and password used to authenticate with the registry
TLSVerify tlsVerifyConfig `yaml:"tls-verify"` // TLS verification mode (enabled by default)
Contributor

This should be separate for source and destination. (And, like credentials, possibly separate from the repo configurations — it makes no sense for two repo destinations on the same registry to use a different value.)

// It returns a repository descriptors slice with as many elements as the images
// found and any error encountered. Each element of the slice is a list of
// image references, to be used as sync source.
func imagesToCopyFromRegistryV2(registryName string, cfg registrySyncConfigV2, sourceCtx types.SystemContext, destination string) ([]repoDescriptor, error) {
Contributor

Would it be practical at all, instead of duplicating this large logic, to instead convert v1 data into the v2 format, and have only one execution path?
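
One way the suggested conversion might look, as a rough sketch only. The types below are simplified, hypothetical stand-ins rather than the real `registrySyncConfig` and `registrySyncConfigV2` structs, and the sketch assumes a v1 entry carries no path rewrite, so the destination path is left empty.

```go
package main

import "fmt"

// v1Registry is a simplified stand-in for a v1 registry section: explicit
// image references and tag regexes, both keyed by repository name.
type v1Registry struct {
	Images           map[string][]string
	ImagesByTagRegex map[string]string
}

// v2Repo is a hypothetical per-repository entry in the newer format.
type v2Repo struct {
	Images           []string
	ImagesByTagRegex string
	DestinationPath  string // empty means "no rewrite requested"
}

// v2Registry is a simplified stand-in for a v2 registry section.
type v2Registry struct {
	Repos map[string]v2Repo
}

// convertV1ToV2 folds the two v1 maps into one per-repository entry, so that
// only a single execution path has to consume the result.
func convertV1ToV2(old v1Registry) v2Registry {
	out := v2Registry{Repos: make(map[string]v2Repo)}
	for repo, refs := range old.Images {
		entry := out.Repos[repo]
		entry.Images = append([]string(nil), refs...) // copy, do not alias
		out.Repos[repo] = entry
	}
	for repo, re := range old.ImagesByTagRegex {
		entry := out.Repos[repo]
		entry.ImagesByTagRegex = re
		out.Repos[repo] = entry
	}
	return out
}

func main() {
	converted := convertV1ToV2(v1Registry{
		Images:           map[string][]string{"example/image": {"v1.0", "v1.1"}},
		ImagesByTagRegex: map[string]string{"example/image2": `^[0-9]\.1$`},
	})
	fmt.Println("repos after conversion:", len(converted.Repos))
}
```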

logrus.Errorf("destination ref type is unsupported for this sync")
return fmt.Errorf("destination ref type is unsupported for this sync")
}
} else {
Contributor

@mtrmac mtrmac Oct 24, 2022

Is it really valuable at all to allow specifying a destination on the command line with YAML2? I imagine it’s always going to be the same for the same YAML file, though I’m not quite sure.

(OTOH if we don’t support this code path, we need to account for directory destinations in the YAML format.)

@mhowell24
Author

Thanks for the quick response, @mtrmac. You make a good point about the config being a map removing some flexibility; I agree a struct would be better. Considering your comments above, how about something along these lines for the new config format:

syncs:
- source:
    registry: example.docker.io
    tls-verify: true
    cert-dir: /example/certs
    credentials:
      username: joeBlogs123
      password: password123
  destination:
    registry: my.other.registry
    tls-verify: false
    cert-dir: /example/other/certs
    credentials:
      username: joeBlogs456
      password: password456
  repos:
  - source-path: example/image
    destination-path: example/test/image
    images:
      - v1.0
      - v1.1
      - v2.0
      - sha256:0000000000000000000000000000000011111111111111111111111111111111
    images-by-tag-regex: ^v[0-9]\.0$
  - source-path: example/image2
    destination-path: image2
    images:
      - "1.0"
      - "1.1"
      - "2.0"
      - sha256:0000000000000000000000000000000011111111111111111111111111111111
    images-by-tag-regex: ^[0-9]\.1$

Each element in syncs defines everything needed to sync multiple repos from a single source registry to repos in a single destination registry. It also gives the user control over the destination path on a repo-by-repo basis.

We could go further and separate the registries out completely, then use them as either sources or destinations, e.g.

registries:
- name: reg0
  domain: example.docker.io
  tls-verify: true
  cert-dir: /example/certs
  credentials:
    username: joeBlogs123
    password: password123
- name: reg1
  domain: my.other.registry
  tls-verify: false
  cert-dir: /example/certs
  credentials:
    username: joeBlogs456
    password: password456
- name: reg2
  domain: my.backup.registry
  tls-verify: false
  cert-dir: /example/certs
  credentials:
    username: joeBlogs789
    password: password789
repos:
- source: reg0
  destination: reg1
  source-path: example/image
  destination-path: example/test/image
  images:
  - v1.0
  - v1.1
  - v2.0
  - sha256:0000000000000000000000000000000011111111111111111111111111111111
  images-by-tag-regex: ^v[0-9]\.0$
- source: reg1
  destination: reg2
  source-path: example/image2
  destination-path: image2
  images:
  - "1.0"
  - "1.1"
  - "2.0"
  - sha256:0000000000000000000000000000000011111111111111111111111111111111
  images-by-tag-regex: ^[0-9]\.1$
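
Both proposals keep an images-by-tag-regex field next to the explicit images list. A minimal sketch of how such a pattern might select tags is shown below; `filterTags` is a hypothetical helper, and how regex matches would be combined with the explicit list is not specified in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterTags returns the tags matching pattern, in their original order.
// It reports an error instead of panicking when the pattern is invalid,
// since the pattern comes from a user-written config file.
func filterTags(tags []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid tag regex %q: %w", pattern, err)
	}
	var matched []string
	for _, tag := range tags {
		if re.MatchString(tag) {
			matched = append(matched, tag)
		}
	}
	return matched, nil
}

func main() {
	tags := []string{"v1.0", "v1.1", "v2.0", "latest"}
	matched, err := filterTags(tags, `^v[0-9]\.0$`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(matched)
}
```

With the pattern from the example config, `^v[0-9]\.0$`, only the tags v1.0 and v2.0 are selected from the list above.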

@mtrmac
Contributor

mtrmac commented Oct 25, 2022

The second version (separate registries configuration from repo sets) seems preferable to me, because I’m guessing that many users have a single private / disconnected mirror that contains copies from several internet repositories; in that case it’s useful to only have to specify the single-mirror credentials once.

But that’s a guess without any data. What do actual users of skopeo sync think?

@mtrmac
Contributor

mtrmac commented Oct 25, 2022

  • Also, looking at the above, I’m not sure the registries need an extra “name”; I think just identifying them using the host[:port] value would work well enough. Most importantly that would allow the simple case, where the registry needs no configuration (because credentials are available in auth.json already) to be set up without writing a registries entry.
  • I’m not sure how the source/destination/source-path/destination-path fields work for dir: locations. One possibility would be to have source-registry-repo/source-dir/destination-registry-repo/destination-dir, with each field having a single clearly-defined syntax — but I haven’t really taken any time to consider the alternatives or trade-offs.
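
The first point, keying registries by host[:port] so that unconfigured registries need no entry, might be sketched as follows. `registryConfig` and `lookupRegistry` are hypothetical names; the default of TLS verification being enabled follows the comment on the tls-verify field in the diff, and credentials for unlisted registries are assumed to come from auth.json.

```go
package main

import "fmt"

// registryConfig is a hypothetical per-registry settings block.
type registryConfig struct {
	TLSVerify bool
	CertDir   string
}

// lookupRegistry returns the explicit settings for hostPort when the config
// file lists them, and defaults otherwise. A registry that needs no special
// configuration therefore never has to appear in the registries section.
func lookupRegistry(registries map[string]registryConfig, hostPort string) registryConfig {
	if cfg, ok := registries[hostPort]; ok {
		return cfg
	}
	return registryConfig{TLSVerify: true} // TLS verification on by default
}

func main() {
	registries := map[string]registryConfig{
		"my.other.registry": {TLSVerify: false, CertDir: "/example/other/certs"},
	}
	fmt.Printf("%+v\n", lookupRegistry(registries, "my.other.registry"))
	fmt.Printf("%+v\n", lookupRegistry(registries, "example.docker.io:5000"))
}
```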

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@mtrmac mtrmac added the kind/feature A request for, or a PR adding, new functionality label Dec 7, 2022
@github-actions github-actions bot removed the stale-pr label Dec 8, 2022
@github-actions

github-actions bot commented Jan 8, 2023

A friendly reminder that this PR had no activity for 30 days.

@art-shutter

I believe the option to abstract registries away could actually complicate things. If all I need is a simple case "paths on the source and destination should be the same", supplying paths and repos for every set of images becomes a task that needs a templating engine on top to actually generate the config.
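
One possible way to keep the simple case short, offered only as a sketch and not as something proposed in this thread, would be to treat an omitted destination-path as "same as source-path". `repoEntry` and `effectiveDestinationPath` are hypothetical names.

```go
package main

import "fmt"

// repoEntry is a hypothetical repos item; DestinationPath is optional.
type repoEntry struct {
	SourcePath      string
	DestinationPath string
}

// effectiveDestinationPath falls back to the source path when no rewrite is
// configured, so "same path on both sides" needs no extra line per repo.
func effectiveDestinationPath(r repoEntry) string {
	if r.DestinationPath == "" {
		return r.SourcePath
	}
	return r.DestinationPath
}

func main() {
	fmt.Println(effectiveDestinationPath(repoEntry{SourcePath: "example/image"}))
	fmt.Println(effectiveDestinationPath(repoEntry{SourcePath: "example/image2", DestinationPath: "image2"}))
}
```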

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Member

rhatdan commented Feb 23, 2023

@mhowell24 Still working on this?

@github-actions github-actions bot removed the stale-pr label Feb 24, 2023
@mhowell24
Author

@rhatdan sorry, I haven't had any time to work on this recently. I might have some time next weekend, but I can't promise anything.

@github-actions

A friendly reminder that this PR had no activity for 30 days.

Labels
kind/feature A request for, or a PR adding, new functionality locked - please file new issue/PR stale-pr
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sync: Add option to keep full name on target when using YAML
4 participants