Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path coercion in Flakes causes files to be added to the store twice #9428

Open
infinisil opened this issue Nov 21, 2023 · 14 comments
Open

Path coercion in Flakes causes files to be added to the store twice #9428

infinisil opened this issue Nov 21, 2023 · 14 comments
Labels

Comments

@infinisil
Copy link
Member

infinisil commented Nov 21, 2023

Describe the bug

When coercing paths to strings (which adds said paths into the Nix store) in Flakes, all files are added to the store twice, leading to doubled space usage.

I understand the underlying mechanism and why this happens, but this sounds like a bug to me.

Steps To Reproduce

Create this flake.nix:

{
  outputs = { ... }: {
    result = "${./.}";
  };
}

Then run:

$ nix eval .#result
"/nix/store/jmnphp5bc4m32y80jw6ms16s8gb5g1gy-x2xsp09mx900wf1h7pglh9xlwy3r1349-source"
$ cat /nix/store/jmnphp5bc4m32y80jw6ms16s8gb5g1gy-x2xsp09mx900wf1h7pglh9xlwy3r1349-source/flake.nix
{
  outputs = { ... }: {
    result = "${./.}";
  };
}
$ cat /nix/store/x2xsp09mx900wf1h7pglh9xlwy3r1349-source/flake.nix
{
  outputs = { ... }: {
    result = "${./.}";
  };
}

Notice how the flake.nix file was copied to two store paths.

Expected behavior

Only a single store path is used.

nix-env --version output

nix-env (Nix) 2.18.1

Additional context

This issue is sponsored by Antithesis

Priorities

Add 👍 to issues you find important.

@thufschmitt
Copy link
Member

I suspect that it would be solved by #6530 when/if it lands (though that doesn't prevent it from being a bug).

A possible workaround is to use self rather than path interpolation

@tomberek
Copy link
Contributor

had some random ideas... this happens often and is a common confusion due to the simple syntax and non-intuitive outcome. Note: most of these will change evaluation values for historical expressions, not backwards compatible.

  • nix-instantiate --strict --eval "/nix/store/HASH-t/default.nix" (with a similar "./." construct in it) will have the same doubling take place - does that behavior need to be preserved? or is this a bug for both flakes/non-flakes?

  • what should be the value of the nixpkgs "path" attribute in Nix? when forced to a string? https://github.com/NixOS/nixpkgs/blob/master/pkgs/top-level/all-packages.nix#L83-L84 .

  • Would we still want something like ${./some-dir/data-file} to be re-copied into the store into its own storepath even when ./some-dir or ./. has already been added? That is duplication as well. So we need to somehow distinguish these cases.

  • possible to declare that all nix store contents do not get re-added, but become paths referring to existing content. eg ./t-2/hi becomes /nix/store/HASH-t2/hi, but this has the drawback of potentially storing much more content than necessary, eg: ./data/only-need-this-dir

  • We can detect re-adding an existing store path and short-circuit the copyPath. (TODO: add some checks to ensure they really are the same contents modulo the directory name).

diff --git a/src/libexpr/eval.cc b/src/libexpr/eval.cc
index bf6b6f8c1..f7c37ae21 100644
--- a/src/libexpr/eval.cc
+++ b/src/libexpr/eval.cc
@@ -2345,6 +2345,9 @@ StorePath EvalState::copyPathToStore(NixStringContext & context, const SourcePat
     auto dstPath = i != srcToStore.end()
         ? i->second
         : [&]() {
+            if (context.empty() && store->isValidPath(path.baseName())) {
+                return store->parseStorePath(path.path.abs());
+            }
             auto dstPath = path.fetchToStore(store, path.baseName(), FileIngestionMethod::Recursive, nullptr, repair);
             allowPath(dstPath);
             srcToStore.insert_or_assign(path, dstPath);
  • This is not really flake related, but due to our path primitives taking into account the source directory's name and that the Nix store model allows for the same content to appear under different names. What is the name of the source directory in the example provided? Flakes give their source inputs the "-source" name by default to alleviate this problem, so at least that footgun has been improved.

  • Similar bugs and confusions have been caused by this throughout Nixpkgs for which idioms and policy have been applied. The evaluation result of any path in non-flakes will have a non-deterministic output:
    nix-instantiate --eval --expr "./." has my home directory and username in it! Do we promise reproducibility for only string values, or path values as well?

  • this is essentially the same test, two people running the same thing but in different directories get different results.

mkdir t1 t2
echo hi > t1/hi
echo hi > t2/hi
( cd t1 && nix-instantiate --eval --expr '"${./.}"' ;)
( cd t2 && nix-instantiate --eval --expr '"${./.}"' ;)

Context that others may be interested in:

@edolstra
Copy link
Member

While undesirable, this is not a bug, not something we can fix in a backward compatible way, and not flake-specific.

@infinisil
Copy link
Member Author

infinisil commented Dec 2, 2023

Importing a local directory (that's not already in the store) twice when using ./. only happens with Flakes, so it is Flakes-specific. This cannot be reproduced with neither just the new experimental CLI nor the stable CLI..

And the point of merging features as experimental is that it allows experimentation without having to worry about backwards compatibility. Sure there's a lot of people using this now, but we can still deprecate current experimental features within years rather than never.

In particular, pure evaluation could be implemented in a way that doesn't need to import everything into the store first, which would fix this.

@infinisil
Copy link
Member Author

So I guess it's not exclusively flake-specific, but flakes does cause this to trigger when you wouldn't expect it. But yes, this can also be fixed without involving flakes, fair enough.

@lf-
Copy link
Member

lf- commented Jan 3, 2024

I am working on hacks to work around this in nixpkgs, since it is required for NixOS/nixpkgs#254405

Related to the hacks I am doing, as a fallback: #5868

lf- added a commit to lf-/nixpkgs that referenced this issue Jan 20, 2024
…xt for self

This is an extremely subtle feature which is necessary to be able to
incur dependencies on this nixpkgs's source code without copying it
twice (NixOS/nix#9428,
NixOS/nix#5868).

pkgs.path is not sufficient, because actually using it incurs a copy,
which winds up looking like /nix/store/HASH-HASH-source in flakes.
Similarly, `toString pkgs.path` is not sufficient because that does not
have any string context, so it will not incur a dependency if it's used.
It's exceptionally subtle.

There are four cases:
- non flakes, pure mode: can't do anything about this, we must copy to
  the store.
- non flakes, not already in the store: can't do anything about this, we
  are copying to the store.
- non flakes, already in the store: storePath gives us a string with
  context for free.
- flakes: overlay makes it a stringification of self.outPath.

In all cases, this is a string with appropriate context to transfer this
nixpkgs to another system.
@infinisil
Copy link
Member Author

infinisil commented Feb 1, 2024

Edit: This isn't right! Please disregard this message, I'll investigate more.. Found it, very cursed, but unrelated to this issue: NixOS/nix.dev#830 (comment)

I just discovered that this is the reason derivations differ between native Flake and flake-compat builds before Nix 2.16, and the reason they don't differ afterwards

Observe:

$ nix build github:NixOS/nix/2.15-maintenance
$ nix-store -q --deriver result
/nix/store/2yw7lrx3r1f7k8b0gi7cm2769rikkkwj-nix-2.15.4.drv
$ nix-instantiate https://github.com/NixOS/nix/tarball/2.15-maintenance -A default
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/fpn8j70zy9fw29183mr8dijf4p1waglf-nix-2.15.4.drv

$ nix-diff /nix/store/2yw7lrx3r1f7k8b0gi7cm2769rikkkwj-nix-2.15.4.drv /nix/store/fpn8j70zy9fw29183mr8dijf4p1waglf-nix-2.15.4.drv
- /nix/store/2yw7lrx3r1f7k8b0gi7cm2769rikkkwj-nix-2.15.4.drv:{out}
+ /nix/store/fpn8j70zy9fw29183mr8dijf4p1waglf-nix-2.15.4.drv:{out}
• The set of input source names do not match:
    - source
    + yy3rf6i4iilb6jy7i0hh34m458zs6kss-source
• The environments do not match:
    src=''
    -/nix/store/yy3rf6i4iilb6jy7i0hh34m458zs6kss-source
    +/nix/store/iwgx944cpkyn5x8dj6zn2r6lbmcklglj-yy3rf6i4iilb6jy7i0hh34m458zs6kss-source
''

This isn't a problem with 2.16 because of #8096, which started using lib.cleanSourceWith on self.outPath, but using flake-compat, self.outPath is provided by fetchGit: https://github.com/edolstra/flake-compat/blob/35bb57c0c8d8b62bbfd284272c928ceb64ddbde9/default.nix#L96

And because fetchGit doesn't compose with lib.cleanSourceWith, this causes a double store import, making it line up exactly with Flakes double store importing, for entirely different reasons!

This is why in NixOS/nix.dev#830 (comment) we discovered that 2.13 doesn't get substituted, when 2.18 and 2.19 do.


Of course, not doing the double store import on the Flakes side will again cause the Flake builds to not line up with the flake-compat builds, unless flake-compat is adjusted as well, e.g. by using lib.cleanSourceWith itself.

@roberth
Copy link
Member

roberth commented May 20, 2024

Copying to the store when interpolating a path value is normal. What's not required is copying sources to the store before evaluating. This was done as an implementation detail of flakes to enable caching, but it is too crude a solution.

Lazy trees (#6530) solves the problem a the cost of correctness (specifically determinism, certainly in its current form).

An alternate solution is lazy paths (#10252), which is a partial implementation of lazy trees; the part that does not compromise correctness. It only extends path values to include virtual trees (just like lazy trees does), without implementing the virtual path strings that are causing the problems in lazy-trees.
By introducing these new path values, we neatly avoid having to change the semantics of existing constructs that are already in wide use, such as absolute path values that are under /${storeDir}.

This isn't completely airtight, although that's ok because users can opt out by choosing to copy to the store.
The reason why it's not perfect is that users (of flakes in particular) will try to run existing code that may not expect to run with a virtual path for its base directory.
However, virtual paths aren't even all that different from normal path values, so if those are supported (usually the are), they'll be fine. Only termination by x == /. needs work (probably x == dirOf x).

Note that we can bring the same functionality to non-flakes users in the form of a filterPath function that behaves much like path or filterSource, but returns a path value instead of a store path string.

@lf-
Copy link
Member

lf- commented May 20, 2024

fyi the filterSource API is very very bad. for example, it can't see targets of symlinks, among other things. See https://wiki.lix.systems/link/13, under "fix fs builtins".

I don't think we should actually introduce anything new that looks like filterSource, because filterSource is bad.

@roberth
Copy link
Member

roberth commented May 21, 2024

Another issue with filterSource (and its successor builtins.path) is that filterSource (path: type: path == ./. || path == ./foo) newRoot doesn't work.
I'd like the callback to be { path, relativePath, absolutePath, type, ... }: <...>.

  • path: a path value (ie non-strict subpath of newRoot)
  • relativePath: path from newRoot to path
  • absolutePath: path from / or virtual tree root to path
  • type: readFileType path
  • target: readSymlinkTarget path

The latter two are not strictly necessary because of path, but it's more similar to the filterSource interface, and the values could be returned lazily, like readDir does that.

@lf-
Copy link
Member

lf- commented May 21, 2024

We could force the formals of the lambda to accept new fields by having ellipsis. wiggles did this to the repl overlays feature in lix to force not painting ourselves into a corner by accepting user code that can't take us extending the interface.

I'm still not fully convinced by the idea of returning just a boolean either. We may want the ability to actually transform directory trees slightly in such a builtin, e.g. rewriting targets or perhaps injecting a little text file (but not dumping the whole everything in general).

This is somewhat bike shedding but I do want to be careful about introducing new builtins that don't contain api design flaws that paint us into any existing corner we know about, especially if they need to get into multiple implementations to be actually used.

@roberth
Copy link
Member

roberth commented May 21, 2024

Yes, exploring the design space is important to me too, and I think it's unfortunate whenever that's perceived as bikeshedding.
We don't get to reinvent our public interface over and over like it's the next javascript framework (although the frequent iterations might be more fun for us authors/maintainers).

We may want the ability to actually transform directory trees slightly

That's a great suggestion, if perhaps a bit humble; we could have some sort of toDirectory and make it possible to do the whole filtering in expression space without filterSource or any builtin like it.

Maybe we do want to have a specialized filtering function as a builtin nonetheless, but probably the best way to find out what its interface should be is to first build it on top of toDirectory so that experimentation can be done in expressions and doesn't become definitive until it's mature.

We could force the formals of the lambda to accept new fields by having ellipsis

Silvan did the same for fileset's fileFilter function, which is an interesting design in its own right by the way.

@infinisil
Copy link
Member Author

Yeah, @lf- please check out the lib.fileset library, it's based on builtins.path underneath, but removes all of the gotchas. See also NixOS/nixpkgs#271307. It's not as efficient as it could be right now, but #8820 would fix this without having to make changes to the design.

ryanccn pushed a commit to ryanccn/nrr that referenced this issue Aug 19, 2024
* nix: drop flake-utils

* nix: drop rust-overlay

* nix: don't re-instantiate nixpkgs
Previously, applying the overlay to our instance of nixpkgs added about
~100mb of memory to each evaluation, along with a few extra seconds to
evaluate (https://zimbatm.com/notes/1000-instances-of-nixpkgs)

* nix: use nix-filter to filter source
Avoids NixOS/nix#9428 (path coercion like `"${./.}"` causes files to be added
to the store twice)
@nyabinary
Copy link

While undesirable, this is not a bug, not something we can fix in a backward compatible way, and not flake-specific.

I mean flakes are still not stabilized no? So it def something that can be done considering their no stabilization guarantees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants