Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input::fetchToStore(): Don't try to substitute #10612

Merged
merged 6 commits into from
Sep 16, 2024

Conversation

edolstra
Copy link
Member

Motivation

Having a narHash doesn't mean that we have the other attributes returned by the fetcher (such as lastModified or rev). For instance,

$ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f
Last modified: 2024-01-15 10:51:22

but

$ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f?narHash=sha256-PPXqKY2hJng4DBVE0I4xshv/vGLUskL7jl53roB8UdU%3D
(does not print a "Last modified")

The latter only happens if the store path already exists or is substitutable, which made this impure behaviour unpredictable.

Fixes #10601. Note: lazy-trees already had this fix, because it no longer requires all locked nodes to have a narHash.

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Apr 26, 2024
Having a narHash doesn't mean that we have the other attributes
returned by the fetcher (such as lastModified or rev). For instance,

   $ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f
   Last modified: 2024-01-15 10:51:22

but

   $ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f?narHash=sha256-PPXqKY2hJng4DBVE0I4xshv/vGLUskL7jl53roB8UdU%3D
   (does not print a "Last modified")

The latter only happens if the store path already exists or is
substitutable, which made this impure behaviour unpredictable.

Fixes NixOS#10601.
@edolstra edolstra force-pushed the no-flake-substitution branch from a1920a0 to ff107d9 Compare April 26, 2024 14:41
@github-actions github-actions bot added the with-tests Issues related to testing. PRs with tests have some priority label Apr 26, 2024
@roberth
Copy link
Member

roberth commented Apr 26, 2024

(thoughts)

I'd still be concerned about similar bugs, because the code isn't structured in a way that would make such a bug somewhat harder to write.
We have the requirement that

  • Given any input to fetchTree, it must only succeed with one possible return value - ie referential transparency in the happy path.

This crucial property is hard to test, so we should try to structure the code in a way that makes this property easy to determine (I won't say prove; too absolute to be realistic).
A mutable map makes that hard.

In my previous attempt to improve this, I focused on giving types to the input side, which wasn't all that fruitful, given the amount of change and effort required. Maybe it'd be more effective to put the output into a struct before returning it as attributes?

@roberth
Copy link
Member

roberth commented Apr 26, 2024

Is it necessary not to substitute? What if the the missing fields can be obtained from the lock file instead?
I suppose we'd have to tell fetchTree not to return any new attributes except outPath. IIRC we don't have such a flag yet?

@edolstra
Copy link
Member Author

Substitution could still be done for locked flake inputs, because there we know that we have all the attributes.

We do have the isLocked() method, but we can't use it for the tarball fetcher, since the HTTP headers can return additional lock attributes like lastModified and rev.

@roberth
Copy link
Member

roberth commented Apr 26, 2024

but we can't use it for the tarball fetcher, since the HTTP headers can return additional lock attributes like lastModified and rev.

Aren't those saved to the lock then?

@edolstra
Copy link
Member Author

They are, this problem only occurs on the CLI if you pass narHash but not the other attributes.

@edolstra
Copy link
Member Author

edolstra commented Apr 26, 2024

I guess part of the issue is the removal of hasAllInfo(), though even before that the tarball fetcher had this issue (since its hasAllInfo() just returned true).

Regardless, correctness is more important than performance, and the substitution shortcut makes it harder to reason about the correctness of fetchers, so it's probably best to get rid of it for now.

@roberth
Copy link
Member

roberth commented Apr 26, 2024

Fair enough. Some optimizations can be done later, unless we move towards a non-optimizable design. I don't think that's the case here.

EDIT: Some optimizations

@Ericson2314
Copy link
Member

I would like to keep substituting, because I would like to be able to substitute from things like Software Heritage by being able to look up content addresses too and not just store paths.

@roberth
Copy link
Member

roberth commented May 18, 2024

Right, this isn't actually just an optimization.
Another use case is where a (private) cache is used for requireFile-like functionality.

If we move the metadata into a separate attrset, we can both fix the problem and keep supporting these use cases.
This can be done with a derivation/strictDerivation-like arrangement, where fetchTree is a builtin expression that calls the primops lazily.

Relative path flakes ("subflakes") are basically fundamentally
broken, since they produce lock file entries like

  "locked": {
    "lastModified": 1,
    "narHash": "sha256-/2tW9SKjQbRLzfcJs5SHijli6l3+iPr1235zylGynK8=",
    "path": "./flakeC",
    "type": "path"
  },

that don't specify what "./flakeC" is relative to. They *sometimes*
worked by accident because the `narHash` field allowed
`fetchToStore()` to get the store path of the subflake *if* it
happened to exist in the local store or in a substituter.

Subflakes are properly fixed in NixOS#10089 (which adds a "parent" field to
the lock file). Rather than come up with some crazy hack to make them
work in the interim, let's just disable the only test that depends on
the broken behaviour for now.
@edolstra
Copy link
Member Author

edolstra commented Sep 11, 2024

Since this is an actual correctness bug (the evaluation result can change depending on whether the flake store path exists in a substituter/local store), which is more important than an optimization, let's merge this. We can figure out how to bring back substitution in the future, if anybody cares. (However, substitution requires having a narHash for locks, which is somewhat incompatible with lazy trees.)

I did have to disable a test that uses relative path flakes because they're completely broken, but this was masked by the fetchToStore() behaviour (see 12fd65d). This will be fixed in #10089.

@edolstra edolstra merged commit 176334d into NixOS:master Sep 16, 2024
11 checks passed
@edolstra edolstra deleted the no-flake-substitution branch September 16, 2024 10:47
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-are-flake-dependencies-fetched/52638/9

@pwaller
Copy link
Contributor

pwaller commented Sep 26, 2024

We can figure out how to bring back substitution in the future, if anybody cares.

Have I understood correctly that this means flake inputs will no longer substitute from the cache? If so...

I care! I am tracking things in a substituter which cease to exist at their original urls and require the cache, and so users of my flakes will not be able to build them anymore with this change.

It also sounded to me from the above thread that both @Ericson2314 and @roberth care too:

I would like to keep substituting, because I would like to be able to substitute from things like Software Heritage by being able to look up content addresses too and not just store paths.

Right, this isn't actually just an optimization. Another use case is where a (private) cache is used for requireFile-like functionality.

@minijackson
Copy link
Member

I also have a similar usecase, where I use a cache server as an insurance that I can rebuild potentially really old projects. I would like very much that this solution can work transparently for both derivations and flake inputs.

@a-h
Copy link
Contributor

a-h commented Sep 27, 2024

I think I also need flake inputs to be able to continue to collect source code from a binary cache.

My understanding is that during flake evaluation, Flake inputs are collected and evaluated, in order to work out the hashes to download binaries from the cache.

Those flake inputs are often libraries, and are coming from github etc., e.g. github:hercules-ci/gitignore.nix or github:nix-community/gomod2nix so they're needed to evaluate the flake outputs.

In an offline environment (e.g. airgapped, or on a train running through the countryside), flake inputs can't simply be re-collected from the public Internet.

Today, in my offline environments, I have a binary cache that I populate with an export of everything required to build some software products. I'm then able to change the source code and rebuild locally using Nix, without connecting to the Internet.

If I understand this change, and the behaviour of Nix flake evaluation, it sounds like this might break that behaviour?

@roberth
Copy link
Member

roberth commented Sep 27, 2024

Not super happy about this merge, and apparently I'm not the only one.
Substitution is useful, and this bugfix is a regression for us.

I think we could fix the fix by adding back substitution as a fallback only (considering the performance argument mentioned elsewhere). The extra attributes would either have to be read from the lock file (and trusted, which is ok), or they should result in a lazy error that only triggers on those attributes. We already support multiple return types for attributes; may as well add a variant for an evaluation error (translated to mkApp(throw, msg)).

Note: lazy-trees already had this fix, because it no longer requires all locked nodes to have a narHash.

We don't need to remove narHash in order to copy sources to the store lazily. We can remove it as a side effect of trusting libfetchers to produce the right thing.
Removing narHash saves us from having to compute it during locking, but I don't think this is significant, except in the case of building a dirty Git workdir, in which case we could just choose not to compute it.
The narHash computation could be a step before writing out the lock file, and if we don't expose this hash to the language, its omission from dirty Git inputs is not observable to the evaluation of this temporary source.

@a-h

My understanding is that during flake evaluation, Flake inputs are collected and evaluated, in order to work out the hashes to download binaries from the cache.

This is how it happens during locking. Once locked, inputs are loaded on demand only using builtins.fetchTree (called internally, lazily).

@pwaller
Copy link
Contributor

pwaller commented Sep 27, 2024

This is how it happens during locking. Once locked, inputs are loaded on demand only using builtins.fetchTree (called internally, lazily).

(Digression) Except if you refer to the outPath of the flake input, in which case #9570 happens; which means a flake input doesn't quite behave the same as a src for a derivation with respect to cache hits. AFAIU, The current behaviour source gets fetched during the computation of a derivation using the outPath, either way, when ideally it would only be fetched if you actually needed to use files within. [unless this has been fixed by lazy trees or something? I'm unsure if I am uptodate on this]

@roberth
Copy link
Member

roberth commented Sep 27, 2024

Except if you refer to the outPath of the flake input,

Any attribute, or probably even isAttrs inputs.something, especially when not flake = false; is sufficient to force the fetchTree call.
My point was that you don't need to fetch sources that aren't referenced in your evaluation (e.g. something for packages.b when you only ask for .#a and there's no dependency on b).

Switching to fixed output derivations (as also suggested in #9570 as referenced) may solve the problem for you, but at the cost of needing more tooling, and it only works for inputs where you can put flake = false;. We wouldn't want you to do "import from derivation", which would combine the sequentialization (like builtin fetchers) with the performance overhead of fixed-output fetchers, all during instantiation. While probably slower, it is a viable workaround that allows for substitution though.

@roberth
Copy link
Member

roberth commented Sep 27, 2024

But yeah, back to the topic, I think we need to

  • Add a fallback on substitution
  • Make sure the attrs in question are in the lock node (ie fetchTree argument)
    • Return a lazy error when they're not
  • Later: handle the optional need for narHash gracefully in the context of virtual, "lazy" sources

edolstra added a commit to DeterminateSystems/nix-src that referenced this pull request Oct 15, 2024
The ability to substitute inputs was removed in NixOS#10612 because it was
broken: with user-specified inputs containing a `narHash` attribute,
substitution resulted in an input that lacked the attributes returned
by the real fetcher (such as `lastModified`).

To fix this, we introduce a new input attribute `final`. If `final =
true`, fetching the input cannot add or change any attributes.

We only attempt to substitute inputs that have `final = true`. This is
implied by lock file entries; we only write a lock file if all its
entries are "final".

The user can specified `final = true` in `fetchTree`, in which case it
is their responsibility to ensure that all attributes returned by the
fetcher are included in the `fetchTree` call. For example,

  nix eval --impure --expr 'builtins.fetchTree { type = "github"; owner = "NixOS"; repo = "patchelf"; final = true; narHash = "sha256-FSoxTcRZMGHNJh8dNtKOkcUtjhmhU6yQXcZZfUPLhQM="; }'

succeeds in a store path with the specified NAR hash exists or is
substitutable, but fails with

  error: fetching final input '{"final":true,"narHash":"sha256-FSoxTcRZMGHNJh8dNtKOkcUtjhmhU6yQXcZZfUPLhQM=","owner":"NixOS","repo":"patchelf","type":"github"}' resulted in different input '{"final":true,"lastModified":1718457448,"narHash":"sha256-FSoxTcRZMGHNJh8dNtKOkcUtjhmhU6yQXcZZfUPLhQM=","owner":"NixOS","repo":"patchelf","rev":"a0f54334df36770b335c051e540ba40afcbf8378","type":"github"}'
@edolstra
Copy link
Member Author

See #11701 which brings back the ability to substitute inputs in a safer way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect lastModified value
7 participants