Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CADC-12563 support to add directories and symbolic links to packages #96

Merged
merged 3 commits into from
Dec 20, 2023

Conversation

jburke-cadc
Copy link
Contributor

No description provided.

Copy link
Member

@pdowler pdowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see mention of symbolic links in the PR title but superficially nothing in the code. Are they not supported in both tar and zip?

Is polymorphism really the best approach here? It isn't the style of java IO normally, for example with java.io.File and it's isDirectory() and isFile(). With File, regular file vs link requires comparing absolute path and canonical path (subtle). Or this in java.nio:

PosixFileAttributes attrs = Files.readAttributes(p,
                PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (attrs.isDirectory()) {
            ...
        } else if (attrs.isRegularFile()) {
            ...
        } else if (attrs.isSymbolicLink()) {
            ...
        }

I think something simpler like multiple constructors in PackageItem and some boolean methods as above (in nio) would suffice (exactly which ones depending on the answer to the symbolic link question).

general principle: I've come to realise that one should only use polymorphism where it's really needed and where it can remove the need for conditionals and treat different subtypes the same way. Use of instanceof in the same library is a strong hint that this case doesn't really fit the principle (it isn't right vs wrong, it's complexity vs benefit).

@jburke-cadc
Copy link
Contributor Author

Yes, symlinks work for both tar & zip packages. There is a SymbolicLinkPackageItem class, and both the tar and zip writer classes have methods to create the object used to write a symlink to the tar or zip.

The isDirectory(), isSymLink(), etc, methods are also the style of the Apache Commons compress library. And I did what you are suggesting for the first pass, which is what we had talked about. A PackageItem with multiple constructors for each type of file, and then an enum is indicate the type. Didn't have the isFile(), isDir(), etc, methods though. Just checked the enum type when necessary.

Then I thought it better to be more explicit when creating a PackageItem, and did an abstract class with a separate class for each type, instead of checking what constructor is for what type.

I'll update to follow the pattern of other libraries.

@pdowler
Copy link
Member

pdowler commented Dec 1, 2023

let's see:
PackageItem(String relativePath) // directory
PackageItem(String relativePath, URL content) // file
PackageItem(String relativePath, String linkTarget) // symlink

As long as you check and do not allow any null constructor args, it should be unambiguous and you can implement the same methods as in the posix attrs API. You'd have to document that linkTarget needs to be relative.

Would adding one of these make it better in cavern? Is it much better than File("file:///...")?
PackageItem(String relativePath, File content) // file
PackageItem(String relativePath, Path content) // file

You would have to check that content is a regular file (vs dir or symlink), which makes Path slightly more appealing. You'd have to document carefully.

We know we need URL for files in vault so keep that for sure... I'd probably make this a draft PR for now and see how implementation of the iterator looks in cavern. Can we plausibly provide a file: URL at all? Could we plausibly provide access to a Path object instead of a file: URL?

What would be needed to avoid using preauth URL to the files endpoint and sending the data stream out through tomcat and haproxy and back in again and then out as the pkg stream? ... because that's kind of suck.

@jburke-cadc jburke-cadc marked this pull request as draft December 11, 2023 23:41
…tory and symbolic link PackageItem's, added isDirectory(), isFile(), isSymbolicLink() methods.

// copy the file to archive output stream
Files.copy(filePath, archiveOutputStream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure MultiBufferIO.copy is faster than this builtin function because it reads and writes in separate threads. It can make a difference when one of the streams (usually the client stream) is slower than the other.

You can just use fileURL.openStream() in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiBufferIO.copy is already used for HTTP urls, updated to use it for file urls as well.

packageContents.add(dir2);
packageContents.add(file1);
packageContents.add(file2);
packageContents.add(link1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the package content above is the same as in the zip test... maybe have a single method to generate the test content.

also, why does the test write tom a file and to a byte array? I would just write/read the file and leave it behind for manual inspection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the zip and tar writer tests into a single PackageWriterTest class. A single method to create the PackageItem's to be archived, and writing/reading the archive without checking for individual files, really shrunk down the size of the test methods, so put it into a single class.

@jburke-cadc jburke-cadc marked this pull request as ready for review December 20, 2023 15:38
@pdowler pdowler merged commit 222b047 into opencadc:master Dec 20, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants