Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IFD is crazy slow due to copying the entire tree #347

Open
Ten0 opened this issue May 1, 2024 · 3 comments
Open

IFD is crazy slow due to copying the entire tree #347

Ten0 opened this issue May 1, 2024 · 3 comments

Comments

@Ten0
Copy link

Ten0 commented May 1, 2024

The src directory for IFD is not filtered:

inherit src;

This asks to copy the entire workspace tree to the nix store, performing no filtering.
On large repositories containing build artifacts, this is impractical: it may sometimes represent hundreds of gigabytes, and causes the build to freeze forever while it's trying to copy those hundreds of gigabytes to the store, when all one really needs is all the Cargo.tomls and Cargo.lock.

Fixing

EDIT: see updated solution below instead

It looks like this may be fixed by using cleanSourceWith:

            src = lib.cleanSourceWith {
              inherit src;
              filter = path: type: (
                let baseName = baseNameOf (toString path);
                in
                (
                  (type == "directory" && baseName != "target" && baseName != "node_modules")
                  || (baseName == "Cargo.toml" || baseName == "Cargo.lock")
                ) && (lib.cleanSourceFilter path type) # + other basic filters from nixpkgs
              );
            };

Although this still has the annoying property that the derivation hash will depend on a bunch of unnecessary empty directories (the hash shouldn't depend on any empty directory either), it makes it run reasonably, so it would still be a clear improvement.

Alternatively lib.fileset may be an option.

Also cleaning up empty directories

This post seems to suggest that lib.fileset may by default already be filtering empty directories.

Alternatively

According to this post, adding another layer to further recursively clean unnecessary empty directories:

fullyClean = stdenvNoCC.mkDerivation {
    inherit (mostlyClean) name;
    src = mostlyClean;
    phases = ["unpackPhase" "buildPhase" "installPhase"];
    buildPhase = "find . -type d -empty -exec rmdir {} \+";
    installPhase = "cp -pr --reflink=auto -- . $out";
    # This is such a quick operation that queing a remote builder isn't worth
    # the effort; `preferLocalBuild' will cause this to be run locally.
    preferLocalBuild = true;
  };

may allow preserving the source hash regardless of whether new empty directories are created, but I'm not sure what prevents this from being source addressed so maybe I misunderstand something.
Maybe that + another layer of IFD re-copying the cleaned source would fix it but that seems incredibly dirty.

The post also presents another option that does not have this issue, but which is obviously bad "because nested directories at depth N are redundantly processed N - 1 times.".

It looks like doing the regular directory traversal that filterSource does, but only creating a directory once we're about to put an actual file inside, is something that should already exist in Nix though... I can't be the first looking for this, so it must be in lib.fileset...

@Ten0
Copy link
Author

Ten0 commented May 1, 2024

After some testing, since lib.fileset does remove empty directories, the following works:

            src = lib.fileset.toSource {
              root = src;
              fileset = lib.fileset.fromSource (lib.cleanSourceWith {
                inherit src;
                filter = path: type: (
                  let baseName = baseNameOf (toString path);
                  in
                  (
                    (type == "directory" && baseName != "target" && baseName != "node_modules")
                    || (baseName == "Cargo.toml" || baseName == "Cargo.lock" || baseName == "lib.rs" || baseName == "main.rs")
                  ) && (lib.cleanSourceFilter path type) # + other basic filters
                );
              });
            };

It's still not completely ideal because we actually don't care about the contents of the lib.rs and main.rs files, we only care about them being there or not. (They are used by cargo metadata to determine crates automatically when there are no [[lib]] or [[bin]] targets).
Another point to solve may be how bench/tests targets are inferred, so we may need to copy a bit more of the tree here as well. (Maybe we want to copy any .rs file that's in a tests folder but that on the other hand would be too much...)

@Pandapip1
Copy link

Pandapip1 commented Jun 23, 2024

Potential solution: two additional parameters, hasLibRs and hasMainRs. Then, pass in only the Cargo.lock.

@Ten0
Copy link
Author

Ten0 commented Dec 18, 2024

Potential solution: two additional parameters, hasLibRs and hasMainRs. Then, pass in only the Cargo.lock.

This looks like it would work, but it doesn't seem trivial as it also requires traversing the entire file tree with the same kind of filtering to generate the list of such files to make the derivation generate: it should also work on workspaces, which may have multiple sub-Cargo.tomls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants