[WIP] bundle: Parallel download and decompression #4504

Draft: wants to merge 1 commit into main

@vyasgun vyasgun commented Dec 6, 2024

Description

This pull request does the following:

  • Return a reader from the bundle Download function.
  • Use the reader to stream the bytes to Extract function.

This commit replaces the grab client with the net/http client to ensure that the streamed bytes reach the Extract func in the correct order. Currently, only zst decompression is used in the UncompressWithReader function, as it is the primary compression algorithm used in crc.

The download progress bar has been removed temporarily and will be added back as part of refactoring the code.

Fixes: #4336

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Proposed changes

  • Return a reader from the bundle Download function.
  • Use the reader to stream the bytes to Extract function.

Testing

Contribution Checklist

  • I have read the contributing guidelines
  • My code follows the style guidelines of this project
  • I Keep It Small and Simple: The smaller the PR is, the easier it is to review and have it merged
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I tested my code on specified platforms
    • Linux
    • Windows
    • MacOS

openshift-ci bot commented Dec 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Dec 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign adrianriobo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vyasgun commented Dec 6, 2024

/test all

openshift-ci bot commented Dec 6, 2024

@vyasgun: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-crc 6cf717b link true /test e2e-crc
ci/prow/images 6cf717b link true /test images
ci/prow/security 6cf717b link false /test security
ci/prow/integration-crc 6cf717b link true /test integration-crc
ci/prow/e2e-microshift-crc 6cf717b link true /test e2e-microshift-crc

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vyasgun vyasgun force-pushed the pr/parallel-decompress branch 8 times, most recently from 7766bc1 to bb0b17c Compare December 7, 2024 17:32
This commit does the following:
- Return a reader from the bundle Download function.
- Use the reader to stream the bytes to Extract function.

This commit replaces the grab client with the net/http client to ensure
that the streamed bytes reach the Extract func in the correct order.
Currently, only zst decompression is used in the
UncompressWithReader function, as it is the primary compression algorithm
used in crc.
@vyasgun vyasgun force-pushed the pr/parallel-decompress branch from bb0b17c to 3a62d1a Compare December 7, 2024 17:34
@redbeam redbeam left a comment

Overall, this is great work and it's functional.

My findings:

  • cancelling through web api (socket) works

  • better logging would be nice (currently it skips the download part) so that it's clear that download and decompression are happening simultaneously

  • progress bar could show more info about both processes

  • resuming interrupted download doesn't work - everything starts from the beginning

  • golangci-lint issues

Suggestions:

  • add (cli/config) option to disable this functionality (revert back to old behavior)

}
client := http.Client{Transport: &http.Transport{}}

if ctx == nil {

This check might need to be moved higher: http.NewRequestWithContext(ctx, "GET", uri, nil) is already called earlier, and it may return an error if ctx is nil.

}
return downloadInfo.Download(ctx, constants.GetDefaultBundlePath(preset), 0664)
}

-func Download(ctx context.Context, preset crcPreset.Preset, bundleURI string, enableBundleQuayFallback bool) (string, error) {
+func Download(ctx context.Context, preset crcPreset.Preset, bundleURI string, enableBundleQuayFallback bool) (io.Reader, string, error) {
+var reader io.Reader

I think this line is kind of "hidden" here above the big comment, I suggest moving it down below it.

The variable does not seem to be required in this block; the last line of the function can return nil instead of reader.

@@ -116,43 +116,43 @@ func GetPresetName(imageName string) crcpreset.Preset {
return preset
}

-func PullBundle(ctx context.Context, imageURI string) (string, error) {
+func PullBundle(ctx context.Context, imageURI string) (io.Reader, string, error) {

Would it be possible to add support for pulling from container image repositories as well?

I'm not sure I understand your question; PullBundle pulls from a container image registry.

If PullBundle never returns a reader, I would not change its signature, and would do something like this in Download instead of return image.PullBundle(ctx, bundleURI):

path, err := image.PullBundle(ctx, bundleURI)
return nil, path, err

But maybe you have plans to make it return a reader later?

@@ -198,8 +229,14 @@ func Use(bundleName string) (*CrcBundleInfo, error) {
return defaultRepo.Use(bundleName)
}

-func Extract(ctx context.Context, path string) (*CrcBundleInfo, error) {
-if err := defaultRepo.Extract(ctx, path); err != nil {
+func Extract(ctx context.Context, reader io.Reader, path string) (*CrcBundleInfo, error) {

@redbeam redbeam Dec 13, 2024

We could use just one return statement and one error check in this case:

var err error
if reader == nil {
	err = defaultRepo.Extract(ctx, path)
} else {
	err = defaultRepo.ExtractWithReader(ctx, reader, path)
}

if err != nil {
	return nil, err
}
return defaultRepo.Get(filepath.Base(path))

Add more blank lines as you see fit.

I was about to make the same suggestion

@@ -124,6 +125,36 @@ func (bundle *CrcBundleInfo) createSymlinkOrCopyPodmanRemote(binDir string) erro
return bundle.copyExecutableFromBundle(binDir, PodmanExecutable, constants.PodmanRemoteExecutableName)
}

func (repo *Repository) ExtractWithReader(ctx context.Context, reader io.Reader, path string) error {

This function and Extract are very similar; could they be merged in some way?

Extract could probably use os.Open and call into ExtractWithReader

return nil, err
}
return untar(ctx, reader, targetDir, fileFilter, showProgress)

nitpick: empty line 47

@@ -86,6 +101,9 @@ func uncompress(ctx context.Context, tarball, targetDir string, fileFilter func(
}
}

func Untar(ctx context.Context, reader io.Reader, targetDir string, fileFilter func(string) bool, showProgress bool) ([]string, error) {

I suppose this is for some future functionality?

@@ -163,7 +163,7 @@ func downloadDataFiles(goos string, components []string, destDir string) ([]stri
if !shouldDownload(components, componentName) {
continue
}
-filename, err := download.Download(context.TODO(), dl.url, destDir, dl.permissions, nil)
+_, filename, err := download.Download(context.TODO(), dl.url, destDir, dl.permissions, nil)

I think I'd add a download.DownloadFile(…) (string, error) to make it clear when we don't need the reader.

logging.Infof("Extracting bundle: %s...", bundleName)
-if _, err := bundle.Extract(ctx, bundlePath); err != nil {
+if _, err := bundle.Extract(ctx, reader, bundlePath); err != nil {

Having both a bundlePath and a reader feels a bit redundant. Ideally we could pass one or the other, but I'm not sure it is currently that easy.

Successfully merging this pull request may close these issues.

Parallel bundle download & decompression
3 participants