[rawhide][x86_64] ext.config.var-mount.luks kola test failure #1836

Closed
aaradhak opened this issue Nov 19, 2024 · 12 comments
Labels: jira, kind/bug, pipeline failure

Comments

@aaradhak
Member

aaradhak commented Nov 19, 2024

Describe the bug

The ext.config.var-mount.luks kola test failed in the latest rawhide build. The machine entered emergency.target in the initramfs stage, which caused the test to fail.

kola test failure:

[2024-11-19T13:12:28.282Z] --- FAIL: ext.config.var-mount.luks (24.31s)
[2024-11-19T13:12:28.282Z]         harness.go:1823: mach.Start() failed: machine 28319516-ff0b-4c0f-a1b0-7d1d045204ae entered emergency.target in initramfs
[2024-11-19T13:12:28.282Z] FAIL, output in /home/jenkins/agent/workspace/build/tmp/kola-lf7Xi/kola/rerun
[2024-11-19T13:12:28.282Z] Error: harness: test suite failed
[2024-11-19T13:12:28.282Z] 2024-11-19T13:12:25Z cli: harness: test suite failed
[2024-11-19T13:12:28.282Z] failed to execute cmd-kola: exit status 1

The failure started after the clevis packages were upgraded from 21-6.fc42 to 21-7.fc42:

clevis (21-6.fc42 → 21-7.fc42)
clevis-dracut (21-6.fc42 → 21-7.fc42)
clevis-luks (21-6.fc42 → 21-7.fc42)
clevis-systemd (21-6.fc42 → 21-7.fc42)

Console log:
The console log shows that ignition-disks.service failed: the Clevis bind operation that sets up LUKS encryption failed because tpm2_getcap was not available in the initramfs environment.

console.txt

[   16.662679] ignition[871]: disks: createLuks: op(b): [finished] opening luks device varlog
[   16.666771] ignition[871]: disks: createLuks: op(c): [started]  Clevis bind
[   18.695645] ignition[871]: disks: createLuks: op(c): [failed]   Clevis bind: exit status 1: Cmd: "clevis" "luks" "bind" "-f" "-k" "/tmp/ignition-luks-229972637" "-d" "/run/ignition/dev_aliases/dev/disk/by-partlabel/varlog" "sss" "{\"pins\":{\"tpm2\":{}},\"t\":1}" Stdout: "Warning: keyslot operation could fail as it requires more than available memory.\n" Stderr: "/usr/bin/clevis-encrypt-tpm2: line 137: tpm2_getcap: command not found\nUnable to find non-empty PCR algorithm bank, please check output of tpm2_getcap pcrs\nUnable to perform encryption with PIN 'sss' and config '{\"pins\":{\"tpm2\":{}},\"t\":1}'\nError adding new binding to /run/ignition/dev_aliases/dev/disk/by-partlabel/varlog\n"
[FAILED] Failed to start ignition-disks.service - Ignition (disks).
See 'systemctl status ignition-disks.service' for details.
[DEPEND] Dependency failed for ignition-complete.target - Ignition Complete.
[DEPEND] Dependency failed for initrd.target - Initrd Default Target.

[   18.713497] systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
[   18.715874] ignition[871]: disks failed
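
When triaging a console log like the one above, it can help to pull out the names of any commands the shell reported as missing. The following is only a minimal sketch: the helper name `missing_commands` and the `sample` string are illustrative assumptions, and the pattern only covers bash's usual "line N: NAME: command not found" wording.

```python
import re

# Matches bash's "<script>: line <n>: <command>: command not found" message.
MISSING_CMD = re.compile(r"line \d+: ([\w.+-]+): command not found")


def missing_commands(console_text: str) -> list[str]:
    """Return the distinct commands reported as missing, in first-seen order."""
    seen = []
    for name in MISSING_CMD.findall(console_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical excerpt modelled on the Stderr field of the failed Clevis bind.
sample = (
    'Stderr: "/usr/bin/clevis-encrypt-tpm2: line 137: tpm2_getcap: '
    'command not found\\nUnable to find non-empty PCR algorithm bank"'
)
print(missing_commands(sample))
```

Run against the excerpt, this reports `tpm2_getcap` as the only missing command.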

Reproduction steps

git checkout rawhide in fedora-coreos-config
cosa fetch && cosa build
kola run ext.config.var-mount.luks

Expected behavior

ext.config.var-mount.luks test to pass

Actual behavior

ext.config.var-mount.luks test fails as the machine enters emergency.target in the initramfs

System details

[rawhide][x86_64]

Butane or Ignition config

No response

Additional information

No response

aaradhak added the kind/bug and pipeline failure labels Nov 19, 2024
@dustymabe
Member

any relevant logs from the console that indicate why we ended up in emergency.target?

@aaradhak
Member Author

Override PR - coreos/fedora-coreos-config#3267

@aaradhak
Member Author

any relevant logs from the console that indicate why we ended up in emergency.target?

I just updated the description with the relevant log information. It looks like ignition-disks.service failed because the Clevis bind operation that sets up LUKS encryption failed.

aaradhak changed the title from "[rawhide][x86_64]: ext.config.var-mount.luks kola test failure" to "[rawhide][x86_64] ext.config.var-mount.luks kola test failure" Nov 19, 2024
@aaradhak
Member Author

Filed a bugzilla issue against clevis for this - https://bugzilla.redhat.com/show_bug.cgi?id=2327563

@jlebon
Member

jlebon commented Nov 20, 2024

I think the fix for this is likely on our side. We probably need to add tpm2_getcap to the initrd. E.g. here: https://github.com/coreos/ignition/blob/7a20ab2b65d8d1e7f58f2205b09172a514734d59/dracut/30ignition/module-setup.sh#L49-L60
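
One way to check whether a binary such as tpm2_getcap actually made it into the initrd is to scan a file listing of the image (for example, one produced by dracut's `lsinitrd`). The sketch below is hedged accordingly: the helper name `missing_from_listing` and the sample listing are made up for illustration, and the parsing assumes `ls -l` style lines whose last field is the path.

```python
import posixpath


def missing_from_listing(listing: str, wanted: list[str]) -> list[str]:
    """Return the names in `wanted` that do not appear as a file in `listing`."""
    present = set()
    for line in listing.splitlines():
        # For symlinks ("name -> target"), keep the link name, not the target.
        entry = line.split(" -> ")[0].split()
        if entry:
            present.add(posixpath.basename(entry[-1]))
    return [name for name in wanted if name not in present]


# Hypothetical listing; a real initrd contains many more entries.
listing = """\
-rwxr-xr-x   1 root     root       100 Jan  1  1970 usr/bin/clevis
lrwxrwxrwx   1 root     root         4 Jan  1  1970 usr/bin/tpm2_createprimary -> tpm2
-rwxr-xr-x   1 root     root       200 Jan  1  1970 usr/bin/tpm2
"""
print(missing_from_listing(listing, ["clevis", "tpm2_getcap", "tpm2_createprimary"]))
```

With this made-up listing, only `tpm2_getcap` is reported as missing.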

@aaradhak
Member Author

aaradhak commented Nov 20, 2024

I can add that change.

@aaradhak
Member Author

aaradhak commented Dec 3, 2024

The kola test ext.config.var-mount.luks passes with the latest clevis update (clevis-21-8.fc42.x86_64), even with the ignition change removed. I believe we can unpin the clevis package and revert the ignition change that was made earlier.

aaradhak added a commit to aaradhak/fedora-coreos-config that referenced this issue Dec 3, 2024
The kola test ext.config.var-mount.luks seems to PASS when tested with
the latest clevis pkg update clevis-21-8.fc42.x86_64. By unpinning this
clevis pkg, the latest clevis pkg could be fetched and tested.

Ref: coreos/fedora-coreos-tracker#1836
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Dec 3, 2024
@dustymabe
Member

can you also test and unpin in testing-devel (F41)?

@aaradhak
Member Author

aaradhak commented Dec 3, 2024

sure will do that

aaradhak added a commit to aaradhak/ignition that referenced this issue Dec 5, 2024
We are reverting the addition of the tpm2_getcap binary that was made in
response to the ext.config.var-mount.luks kola test failure caused by the
clevis upgrade from clevis-21-6.fc42 to 21-7.fc42, as the issue is now
fixed by the clevis update clevis-21-8.fc42.x86_64.

Ref: coreos/fedora-coreos-tracker#1836 (comment)
@aaradhak
Member Author

aaradhak commented Dec 5, 2024

Fast track PR of clevis-21-7.fc41 (testing-devel) - coreos/fedora-coreos-config#3298

@aaradhak
Member Author

aaradhak commented Dec 5, 2024

PR to revert ignition change - coreos/ignition#1985

aaradhak self-assigned this Dec 9, 2024
@aaradhak
Member Author

aaradhak commented Dec 9, 2024

This issue can be closed now.

aaradhak closed this as completed Dec 9, 2024