Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgresql_14: fix build #368091

Merged

Conversation

wolfgangwalther
Copy link
Contributor

Building postgresql_14 currently fails with this on master:

error: derivation contains an illegal reference specifier 'man'

The odd thing is, that changing anything in the postInstall phase seems to fix this, including just adding a single character as a comment.

I have no clue what's happening here.

I don't have too much time the next couple of days to investigate this.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@kirillrdy
Copy link
Member

@wolfgangwalther postgresql_14 builds for me on master

commit 87060a022a39438d025221bb6848fa2e4ffb113d (HEAD -> master, origin/master, origin/HEAD)
Merge: 083902dc0105 a36a61d30840
Author: nixpkgs-merge-bot[bot] <148217876+nixpkgs-merge-bot[bot]@users.noreply.github.com>
Date:   Wed Dec 25 11:07:03 2024 +0000

    shopware-cli: 0.4.61 -> 0.4.62 (#367898)


[kirillvr@osaka:~/nixpkgs]$ nix-build -A postgresql_14
/nix/store/m9vb40xxr6gckjzpfxnqcmjqsks2gx03-postgresql-14.15

can you provide more info ? perhaps create an issue ?

@wolfgangwalther
Copy link
Contributor Author

postgresql_14 builds for me on master

Uh, that's really odd. For me, it can't be found in the binary cache, but for you it seems it can be? We'd have to look up hydra logs to see whether the same failure happened there. In any case, I had the same failure appear in a CI job elsewhere, so it's not only related to my system.

This smells like a nix bug. Which version of nix are you running, @kirillrdy?

I have:

nix (Lix, like Nix) 2.91.0

@wolfgangwalther
Copy link
Contributor Author

Also the hash for me is different when building this on master:

nix-build -A postgresql_14 --dry-run
this derivation will be built:
  /nix/store/bxp6g57limvwiga61vdlyvhy7i8rp6wd-postgresql-14.15.drv

@kirillrdy which system are you building on?

@kirillrdy
Copy link
Member

[kirillvr@osaka:~]$ nix-info -m
 - system: `"x86_64-linux"`
 - host os: `Linux 6.12.6, NixOS, 25.05 (Warbler), 25.05.20241223.cde85e7`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.24.11`
 - nixpkgs: `/nix/store/a4x7j50qdyq62p31kw0n2p37xs94lb37-source`

also for me, it wasn't in a cache, i had to build it

@wolfgangwalther
Copy link
Contributor Author

"x86_64-linux"

Same for me. I don't understand why my hash is different.

@wolfgangwalther
Copy link
Contributor Author

Same for me. I don't understand why my hash is different.

🙈 ah, it was the .drv file for me, ofc. Can you check the hash of your .drv file?

@kirillrdy
Copy link
Member

try flakes

nix build github:nixos/nixpkgs/87060a022a39438d025221bb6848fa2e4ffb113d#postgresql_14

i get same output hash

[kirillvr@osaka:~/nixpkgs]$ ls -lah result
lrwxrwxrwx 1 kirillvr users 60 Dec 25 20:25 result -> /nix/store/m9vb40xxr6gckjzpfxnqcmjqsks2gx03-postgresql-14.15

@kirillrdy
Copy link
Member

 nix eval github:nixos/nixpkgs/87060a022a39438d025221bb6848fa2e4ffb113d#postgresql_14
«derivation /nix/store/bxp6g57limvwiga61vdlyvhy7i8rp6wd-postgresql-14.15.drv»

@Ma27
Copy link
Member

Ma27 commented Dec 25, 2024

lolwat 😅

Does this affect v14 only or other versions as well?
Is this a new thing? Because 2.91.1 exists for a while now.

Would prefer to spend a little more time to investigate this. I don't think I'll get to much during the holidays though.

@wolfgangwalther
Copy link
Contributor Author

Does this affect v14 only or other versions as well?

Only v14 :D

Is this a new thing? Because 2.91.1 exists for a while now.

I haven't bisected neither nixpkgs nor nix, so I can't tell when it started to appear. I just realized when trying to update PostgREST's CI, which also runs postgresql 14.

@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Dec 25, 2024

Assuming that it's a nix bug, we currently have this:

  • broken on lix 2.91.0 (wolfgangwalther)
  • broken on nix 2.3.18 (wolfgangwalther)
  • broken on nix 2.20.8 (wolfgangwalther)
  • broken on nix 2.21.4 (wolfgangwalther)
  • broken on nix 2.22.3 (wolfgangwalther)
  • broken on nix 2.23.3 (wolfgangwalther)
  • broken on nix 2.24.9 (wolfgangwalther / PostgREST CI)
  • broken on nix 2.24.10 (vcunat)
  • working on nix 2.24.11 (kirillrdy)
  • broken on nix 2.24.11 (wolfgangwalther)
  • broken on nix 2.25.3 (wolfgangwalther)

It started failing in hydra here: https://hydra.nixos.org/build/281720533

The logs here don't show anything helpful: https://hydra.nixos.org/build/281937414 - essentially they are cut-off right before the error that I get.

@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 501+ 10.rebuild-linux: 501+ labels Dec 25, 2024
@vcunat
Copy link
Member

vcunat commented Dec 29, 2024

The error is very odd, but rebuilding all postgresql versions on master would be relatively expensive, as labeled by CI.

@vcunat
Copy link
Member

vcunat commented Dec 29, 2024

Looking at https://hydra.nixos.org/eval/1810750?filter=postgresql\_ – the darwin errors seem different, so I think only postgresql_14.x86_64-linux is affected.

I think it would be fine to merge this if conditional, e.g. for version 14 on any platform. Though it would be nice to better understand when/why this happens.

@wolfgangwalther wolfgangwalther marked this pull request as draft December 29, 2024 17:11
@wolfgangwalther
Copy link
Contributor Author

The error is very odd, but rebuilding all postgresql versions on master would be relatively expensive, as labeled by CI.

Yes, absolutely. We'd have to let this go through staging or add a conditional as you say.. but I'd really like to understand what's happening, if only remotely so, first.

@wolfgangwalther
Copy link
Contributor Author

bisecting only lead me to #361878 so far, not anything more precise.

@wolfgangwalther
Copy link
Contributor Author

working on nix 2.24.11 (kirillrdy)

I have the same build failure with nix 2.24.11 as well (both with and without flakes). No idea why it succeeded to build for @kirillrdy.

@vcunat
Copy link
Member

vcunat commented Dec 29, 2024

I'm doing a full bisect now.

@kirillrdy
Copy link
Member

working on nix 2.24.11 (kirillrdy)

I have the same build failure with nix 2.24.11 as well (both with and without flakes). No idea why it succeeded to build for @kirillrdy.

@wolfgangwalther have you tried pure build using flakes ? and do you still get same output hash as me ? or different ?

@kirillrdy
Copy link
Member

ok, so now I get same build failure, and I can not reproduce build using same nixos rev and nixpkgs rev 🤷

@bendlas
Copy link
Contributor

bendlas commented Dec 29, 2024

I was able to fix the build with this commit: bendlas@a468e26

also relevant: NixOS/nix#12105

@vcunat
Copy link
Member

vcunat commented Dec 29, 2024

c207a2b is the first bad commit

It's an automatic merge. I retested again that it really fails on this commit and succeeds on both parents.

I don't really feel closer to the real source of the issue, but maybe someone else will have an idea.

@wolfgangwalther
Copy link
Contributor Author

I was able to fix the build with this commit: bendlas@a468e26

And I can fix the build with the commit in this PR - just changing a bash comment. Thus, I'm not sure whether the removal of references is really necessary here or fixing the actual issue.

@wolfgangwalther
Copy link
Contributor Author

c207a2b is the first bad commit

It's an automatic merge. I retested again that it really fails on this commit and succeeds on both parents.

I don't really feel closer to the real source of the issue, but maybe someone else will have an idea.

When I revert the commit in #363710, the build succeeds for me.

@wolfgangwalther
Copy link
Contributor Author

I was able to fix the build with this commit: bendlas@a468e26

And I can fix the build with the commit in this PR - just changing a bash comment. Thus, I'm not sure whether the removal of references is really necessary here or fixing the actual issue.

To double-check, I built postgresql_14 with the lib.disallowedReferences check for "man" removed. I can't find any reference to the man output in the lib output.

But: When I remove the `lib.disallowedReferences for doc (and keep man), then the build succeeds...

So it's really: Anything that changes the derivation in any way... makes the build succeed. Another random example: I changed the order of two configure flags - the build succeeds.

@vcunat
Copy link
Member

vcunat commented Dec 30, 2024

Sometimes this happened when a sloppy regex (typically in upstream build scripts) matched a part of the /nix/store hash.

@Ma27
Copy link
Member

Ma27 commented Jan 1, 2025

I think I found the issue: if an output is already registered (man output here, not entirely sure why yet), Nix removes the otuput from the list of outputs that need reference checks. That's reasonable, but this exact list is also used to check if an output has references to another output (lib -> man).

This also explains why @kirillrdy ran into this issue after a succeeding build I think: the first time the man output wasn't registered and thus this code-path was never reached.

I'll try to come up with a patch for Lix. I guess it should be relatively simple to port to CppNix then, this part doesn't seem to have too many Lix-specific changes.

If my theory is correct, this patch is also not helping: it simply works now because man with this patch isn't registered. But I'll try to verify this. I'm not sure how to move forward though. Best idea I have is backport aggressively and drop support for broken Nix versions. Somehow our recent interactions always converge to increasing nixpkgs' minver @wolfgangwalther ;-)

@ElvishJerricco
Copy link
Contributor

Yea, I had this issue as well with Nix 2.24.10, and running nix-store --delete on all of postgresql_14's output paths, and then building with --substituters ""' resulted in a successful build for me.

@Ma27
Copy link
Member

Ma27 commented Jan 2, 2025

This matches my observation that substituting the man otuput only reliably triggers the issue (given that the other outputs don't seem to be on cache.nixos.org).
I don't know how this could've happened, but I don't think that's really relevant currently since the behavior itself is clearly wrong regardless.

@wolfgangwalther
Copy link
Contributor Author

I think I found the issue: if an output is already registered (man output here, not entirely sure why yet), Nix removes the otuput from the list of outputs that need reference checks. That's reasonable, but this exact list is also used to check if an output has references to another output (lib -> man).

[...]

If my theory is correct, this patch is also not helping: it simply works now because man with this patch isn't registered.

In that case, a proper fix for now would be to add outputChecks for the other outputs, including man, too, right?

But I'll try to verify this. I'm not sure how to move forward though. Best idea I have is backport aggressively and drop support for broken Nix versions. Somehow our recent interactions always converge to increasing nixpkgs' minver ;-)

Yeah... I guess that's the only way to do it. I assume this bug hasn't been noticed so far, because postgresql is literally the first derivation in nixpkgs to use outputChecks. If we want to go forward with structuredAttrs, we'd need to make sure that all supported nix versions do work properly with outputChecks.

Building postgresql_14 currently fails with this on master:

  error: derivation contains an illegal reference specifier 'man'

The reason seems to be a bug in nix, where outputChecks are run
improperly when one of the outputs can already be substituted. Why the
man output can be substituted from hydra is unknown, but adding more
outputChecks for the the man and doc outputs should work around the
problem until nix is fixed.
@wolfgangwalther wolfgangwalther force-pushed the postgresql-14-fix-build branch from 96ecb88 to a0919c7 Compare January 2, 2025 09:32
@wolfgangwalther
Copy link
Contributor Author

In that case, a proper fix for now would be to add outputChecks for the other outputs, including man, too, right?

I pushed a commit doing that for v14 only. We can merge this to master, and then remove the conditional through staging.

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Jan 2, 2025
@wolfgangwalther wolfgangwalther merged commit 5aabe24 into NixOS:master Jan 2, 2025
25 of 27 checks passed
@wolfgangwalther wolfgangwalther deleted the postgresql-14-fix-build branch January 2, 2025 18:57
@Ma27
Copy link
Member

Ma27 commented Jan 4, 2025

Apologies for my delayed responses, have quite much to do on multiple ends currently.

First of all, for those who are interested, feel free to review https://gerrit.lix.systems/c/lix/+/2346, this should fix the underlying bug.

In that case, a proper fix for now would be to add outputChecks for the other outputs, including man, too, right?

I'm afraid I don't follow: if we ever get into the situation that only a part of the postgresql store-paths are cached on cache.nixos.org, the list of outputs to check would be incomplete again because the current implementation only adds the output-paths to it that are not registered already in your local store.

I may be missing something, but I'm not sure if we can do much here, actually.
The one end is answering why the cache is incomplete (perhaps a restart of Hydra while it was busy uploading postgresl store-paths to S3: for non-structuredAttrs drv this was never an issue because you need structuredAttrs+outputChecks to trigger the bug, in that case derivations would probably be built again and the missing paths uploaded then). But that's just a theory.

Don't get me wrong, I'd be very happy to be proven wrong here!
Because the conclusion I draw from my thoughts is what I stated already: backport the fix to as many CppNix versions as possible. And drop support for all versions that don't receive patches in nixpkgs.

@wolfgangwalther
Copy link
Contributor Author

I'm afraid I don't follow: if we ever get into the situation that only a part of the postgresql store-paths are cached on cache.nixos.org, the list of outputs to check would be incomplete again because the current implementation only adds the output-paths to it that are not registered already in your local store.

I see - that was probably a mis-understanding on my side. I had hoped/assumed that because outputChecks for those outputs would need to be run, they would need to be added to the list as well. (I have not looked at your fix, yet)

But what you're saying implies that those outputChecks had already been run before and are not run again for substituted outputs - which makes sense.

So yeah, the "fix" merged here doesn't seem to be better than any other changes we could have made to the derivation. In any case, those additional outputChecks are not hurting, so I think we can keep them / proceed with them on staging.

Because the conclusion I draw from my thoughts is what I stated already: backport the fix to as many CppNix versions as possible. And drop support for all versions that don't receive patches in nixpkgs.

Yeah, there seems to be no other way.

@vcunat
Copy link
Member

vcunat commented Jan 4, 2025

The thing is that I believe it's not rare to have a year old nix (or even older). So I'd say we're mainly "lucky" that no really important package seems affected so far.

@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Jan 4, 2025

So I'd say we're mainly "lucky" that no really important package seems affected so far.

No, this bug can only surface when a derivation uses structuredAttrs and outputChecks. postgresql is the only one in nixpkgs doing that, so far. So this happened actually fairly quickly: We introduced the structuredAttrs+outputChecks in August of last year and a few months later we hit this bug - with only a single package using those.

But if we really enable structuredAttrs by default eventually, this needs to be fixed in all supported nix versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
10.rebuild-darwin: 101-500 10.rebuild-linux: 101-500 11.by: package-maintainer This PR was created by the maintainer of the package it changes 12.approvals: 1 This PR was reviewed and approved by one reputable person
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants