forked from systemd/systemd
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
12316 lines (10054 loc) · 651 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
systemd System and Service Manager
CHANGES WITH 248:
* A concept of system extension images is introduced. Such images may
be used to extend the /usr/ and /opt/ directory hierarchies at
runtime with additional files (even if the file system is read-only).
When a system extension image is activated, its /usr/ and /opt/
hierarchies and os-release information are combined via overlayfs
with the file system hierarchy of the host OS.
A new systemd-sysext tool can be used to merge, unmerge, list, and
refresh system extension hierarchies. See
https://www.freedesktop.org/software/systemd/man/systemd-sysext.html.
The systemd-sysext.service automatically merges installed system
extensions during boot (before basic.target, but not in very early
boot, since various file systems have to be mounted first).
The SYSEXT_LEVEL= field in os-release(5) may be used to specify the
supported system extension level.
* A new ExtensionImages= unit setting can be used to apply the same
system extension image concept from systemd-sysext to the namespaced
file hierarchy of specific services, following the same rules and
constraints.
* Support for a new special "root=tmpfs" kernel command-line option has
been added. When specified, a tmpfs is mounted on /, and mount.usr=
should be used to point to the operating system implementation.
* A new configuration file /etc/veritytab may be used to configure
dm-verity integrity protection for block devices. Each line is in the
format "volume-name data-device hash-device roothash options",
similar to /etc/crypttab.
* A new kernel command-line option systemd.verity.root_options= may be
used to configure dm-verity behaviour for the root device.
* The key file specified in /etc/crypttab (the third field) may now
refer to an AF_UNIX/SOCK_STREAM socket in the file system. The key is
acquired by connecting to that socket and reading from it. This
allows the implementation of a service to provide key information
dynamically, at the moment when it is needed.
* When the hostname is set explicitly to "localhost", systemd-hostnamed
will respect this. Previously such a setting would be mostly silently
ignored. The goal is to honour configuration as specified by the
user.
* The fallback hostname that will be used by the system manager and
systemd-hostnamed can now be configured in two new ways: by setting
DEFAULT_HOSTNAME= in os-release(5), or by setting
$SYSTEMD_DEFAULT_HOSTNAME in the environment block. As before, it can
also be configured during compilation. The environment variable is
intended for testing and local overrides, the os-release(5) field is
intended to allow customization by different variants of a
distribution that share the same compiled packages.
* The environment block of the manager itself may be configured through
a new ManagerEnvironment= setting in system.conf or user.conf. This
complements existing ways to set the environment block (the kernel
command line for the system manager, the inherited environment and
[email protected] unit file settings for the user manager).
* systemd-hostnamed now exports the default hostname and the source of
the configured hostname ("static", "transient", or "default") as
D-Bus properties.
* systemd-hostnamed now exports the "HardwareVendor" and
"HardwareModel" D-Bus properties, which are supposed to contain a
pair of cleaned up, human readable strings describing the system's
vendor and model. It's typically sourced from the firmware's DMI
tables, but may be augmented from a new hwdb database. hostnamectl
shows this in the status output.
* Support has been added to systemd-cryptsetup for extracting the
PKCS#11 token URI and encrypted key from the LUKS2 JSON embedded
metadata header. This allows the information how to open the
encrypted device to be embedded directly in the device and obviates
the need for configuration in an external file.
* systemd-cryptsetup gained support for unlocking LUKS2 volumes using
TPM2 hardware, as well as FIDO2 security tokens (in addition to the
pre-existing support for PKCS#11 security tokens).
* systemd-repart may enroll encrypted partitions using TPM2
hardware. This may be useful for example to create an encrypted /var
partition bound to the machine on first boot.
* A new systemd-cryptenroll tool has been added to enroll TPM2, FIDO2
and PKCS#11 security tokens to LUKS volumes, list and destroy
them. See:
http://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html
It also supports enrolling "recovery keys" and regular passphrases.
* The libfido2 dependency is now based on dlopen(), so that the library
is used at runtime when installed, but is not a hard runtime
dependency.
* systemd-cryptsetup gained support for two new options in
/etc/crypttab: "no-write-workqueue" and "no-read-workqueue" which
request synchronous processing of encryption/decryption IO.
* The manager may be configured at compile time to use the fexecve()
instead of the execve() system call when spawning processes. Using
fexecve() closes a window between checking the security context of an
executable and spawning it, but unfortunately the kernel displays
stale information in the process' "comm" field, which impacts ps
output and such.
* The configuration option -Dcompat-gateway-hostname has been dropped.
"_gateway" is now the only supported name.
* The ConditionSecurity=tpm2 unit file setting may be used to check if
the system has at least one TPM2 (tpmrm class) device.
* A new ConditionCPUFeature= has been added that may be used to
conditionalize units based on CPU features. For example,
ConditionCPUFeature=rdrand will condition a unit so that it is only
run when the system CPU supports the RDRAND opcode.
* The existing ConditionControlGroupController= setting has been
extended with two new values "v1" and "v2". "v2" means that the
unified v2 cgroup hierarchy is used, and "v1" means that legacy v1
hierarchy or the hybrid hierarchy are used.
* A new PrivateIPC= setting on a unit file allows executed processes to
be moved into a private IPC namespace, with separate System V IPC
identifiers and POSIX message queues.
A new IPCNamespacePath= allows the unit to be joined to an existing
IPC namespace.
* The tables of system calls in seccomp filters are now automatically
generated from kernel lists exported on
https://fedora.juszkiewicz.com.pl/syscalls.html.
The following architectures should now have complete lists:
alpha, arc, arm64, arm, i386, ia64, m68k, mips64n32, mips64, mipso32,
powerpc, powerpc64, s390, s390x, tilegx, sparc, x86_64, x32.
* The MountAPIVFS= service file setting now additionally mounts a tmpfs
on /run/ if it is not already a mount point. A writable /run/ has
always been a requirement for a functioning system, but this was not
guaranteed when using a read-only image.
Users can always specify BindPaths= or InaccessiblePaths= as
overrides, and they will take precedence. If the host's root mount
point is used, there is no change in behaviour.
* New bind mounts and file system image mounts may be injected into the
mount namespace of a service (without restarting it). This is exposed
respectively as 'systemctl bind <unit> <path>…' and
'systemctl mount-image <unit> <image>…'.
* The StandardOutput= and StandardError= settings can now specify files
to be truncated for output (as "truncate:<path>").
* The ExecPaths= and NoExecPaths= settings may be used to specify
noexec for parts of the file system.
* sd-bus has a new function sd_bus_open_user_machine() to open a
connection to the session bus of a specific user in a local container
or on the local host. This is exposed in the existing -M switch to
systemctl and similar tools:
systemctl --user -M lennart@foobar start foo
This will connect to the user bus of a user "lennart" in container
"foobar". If no container name is specified, the specified user on
the host itself is connected to
systemctl --user -M lennart@ start quux
* sd-bus also gained a convenience function sd_bus_message_send() to
simplify invocations of sd_bus_send(), taking only a single
parameter: the message to send.
* sd-event allows rate limits to be set on event sources, for dealing
with high-priority event sources that might starve out others. See
the new man page sd_event_source_set_ratelimit(3) for details.
* systemd.link files gained a [Link] Promiscuous= switch, which allows
the device to be raised in promiscuous mode.
New [Link] TransmitQueues= and ReceiveQueues= settings allow the
number of TX and RX queues to be configured.
New [Link] TransmitQueueLength= setting allows the size of the TX
queue to be configured.
New [Link] GenericSegmentOffloadMaxBytes= and
GenericSegmentOffloadMaxSegments= allow capping the packet size and
the number of segments accepted in Generic Segment Offload.
* systemd-networkd gained support for the "B.A.T.M.A.N. advanced"
wireless routing protocol that operates on ISO/OSI Layer 2 only and
uses ethernet frames to route/bridge packets. This encompasses a new
"batadv" netdev Type=, a new [BatmanAdvanced] section with a bunch of
new settings in .netdev files, and a new BatmanAdvanced= setting in
.network files.
* systemd.network files gained a [Network] RouteTable= configuration
switch to select the routing policy table.
systemd.network files gained a [RoutingPolicyRule] Type=
configuration switch (one of "blackhole, "unreachable", "prohibit").
systemd.network files gained a [IPv6AcceptRA] RouteDenyList= and
RouteAllowList= settings to ignore/accept route advertisements from
routers matching specified prefixes. The DenyList= setting has been
renamed to PrefixDenyList= and a new PrefixAllowList= option has been
added.
systemd.network files gained a [DHCPv6] UseAddress= setting to
optionally ignore the address provided in the lease.
systemd.network files gained a [DHCPv6PrefixDelegation]
ManageTemporaryAddress= switch.
systemd.network files gained a new ActivationPolicy= setting which
allows configuring how the UP state of an interface shall be managed,
i.e. whether the interface is always upped, always downed, or may be
upped/downed by the user using "ip link set dev".
* The default for the Broadcast= setting in .network files has slightly
changed: the broadcast address will not be configured for wireguard
devices.
* systemd.netdev files gained a [VLAN] Protocol=, IngressQOSMaps=,
EgressQOSMaps=, and [MACVLAN] BroadcastMulticastQueueLength=
configuration options for VLAN packet handling.
* udev rules may now set log_level= option. This allows debug logs to
be enabled for select events, e.g. just for a specific subsystem or
even a single device.
* udev now exports the VOLUME_ID, LOGICAL_VOLUME_ID, VOLUME_SET_ID, and
DATA_PREPARED_ID properties for block devices with ISO9660 file
systems.
* udev now exports decoded DMI information about installed memory slots
as device properties under the /sys/class/dmi/id/ pseudo device.
* /dev/ is not mounted noexec anymore. This didn't provide any
significant security benefits and would conflict with the executable
mappings used with /dev/sgx device nodes. The previous behaviour can
be restored for individual services with NoExecPaths=/dev (or by allow-
listing and excluding /dev from ExecPaths=).
* Permissions for /dev/vsock are now set to 0o666, and /dev/vhost-vsock
and /dev/vhost-net are owned by the kvm group.
* The hardware database has been extended with a list of fingerprint
readers that correctly support USB auto-suspend using data from
libfprint.
* systemd-resolved can now answer DNSSEC questions through the stub
resolver interface in a way that allows local clients to do DNSSEC
validation themselves. For a question with DO+CD set, it'll proxy the
DNS query and respond with a mostly unmodified packet received from
the upstream server.
* systemd-resolved learnt a new boolean option CacheFromLocalhost= in
resolved.conf. If true the service will provide caching even for DNS
lookups made to an upstream DNS server on the 127.0.0.1/::1
addresses. By default (and when the option is false) systemd-resolved
will not cache such lookups, in order to avoid duplicate local
caching, under the assumption the local upstream server caches
anyway.
* systemd-resolved now implements RFC5001 NSID in its local DNS
stub. This may be used by local clients to determine whether they are
talking to the DNS resolver stub or a different DNS server.
* When resolving host names and other records resolvectl will now
report where the data was acquired from (i.e. the local cache, the
network, locally synthesized, …) and whether the network traffic it
effected was encrypted or not. Moreover the tool acquired a number of
new options --cache=, --synthesize=, --network=, --zone=,
--trust-anchor=, --validate= that take booleans and may be used to
tweak a lookup, i.e. whether it may be answered from cached
information, locally synthesized information, information acquired
through the network, the local mDNS/LLMNR zone, the DNSSEC trust
anchor, and whether DNSSEC validation shall be executed for the
lookup.
* systemd-nspawn gained a new --ambient-capability= setting
(AmbientCapability= in .nspawn files) to configure ambient
capabilities passed to the container payload.
* systemd-nspawn gained the ability to configure the firewall using the
nftables subsystem (in addition to the existing iptables
support). Similarly, systemd-networkd's IPMasquerade= option now
supports nftables as back-end, too. In both cases NAT on IPv6 is now
supported too, in addition to IPv4 (the iptables back-end still is
IPv4-only).
"IPMasquerade=yes", which was the same as "IPMasquerade=ipv4" before,
retains its meaning, but has been deprecated. Please switch to either
"ivp4" or "both" (if covering IPv6 is desired).
* systemd-importd will now download .verity and .roothash.p7s files
along with the machine image (as exposed via machinectl pull-raw).
* systemd-oomd now gained a new DefaultMemoryPressureDurationSec=
setting to configure the time a unit's cgroup needs to exceed memory
pressure limits before action will be taken, and a new
ManagedOOMPreference=none|avoid|omit setting to avoid killing certain
units.
systemd-oomd is now considered fully supported (the usual
backwards-compatiblity promises apply). Swap is not required for
operation, but it is still recommended.
* systemd-timesyncd gained a new ConnectionRetrySec= setting which
configures the retry delay when trying to contact servers.
* systemd-stdio-bridge gained --system/--user options to connect to the
system bus (previous default) or the user session bus.
* systemd-localed may now call locale-gen to generate missing locales
on-demand (UTF-8-only). This improves integration with Debian-based
distributions (Debian/Ubuntu/PureOS/Tanglu/...) and Arch Linux.
* systemctl --check-inhibitors=true may now be used to obey inhibitors
even when invoked non-interactively. The old --ignore-inhibitors
switch is now deprecated and replaced by --check-inhibitors=false.
* systemctl import-environment will now emit a warning when called
without any arguments (i.e. to import the full environment block of
the called program). This command will usually be invoked from a
shell, which means that it'll inherit a bunch of variables which are
specific to that shell, and usually to the TTY the shell is connected
to, and don't have any meaning in the global context of the system or
user service manager. Instead, only specific variables should be
imported into the manager environment block.
Similarly, programs which update the manager environment block by
directly calling the D-Bus API of the manager, should also push
specific variables, and not the full inherited environment.
* systemctl's status output now shows unit state with a more careful
choice of Unicode characters: units in maintenance show a "○" symbol
instead of the usual "●", failed units show "×", and services being
reloaded "↻".
* coredumpctl gained a --debugger-arguments= switch to pass arguments
to the debugger. It also gained support for showing coredump info in
a simple JSON format.
* systemctl/loginctl/machinectl's --signal= option now accept a special
value "list", which may be used to show a brief table with known
process signals and their numbers.
* networkctl now shows the link activation policy in status.
* Various tools gained --pager/--no-pager/--json= switches to
enable/disable the pager and provide JSON output.
* Various tools now accept two new values for the SYSTEMD_COLORS
environment variable: "16" and "256", to configure how many terminal
colors are used in output.
* less 568 or newer is now required for the auto-paging logic of the
various tools. Hyperlink ANSI sequences in terminal output are now
used even if a pager is used, and older versions of less are not able
to display these sequences correctly. SYSTEMD_URLIFY=0 may be used to
disable this output again.
* Builds with support for separate / and /usr/ hierarchies ("split-usr"
builds, non-merged-usr builds) are now officially deprecated. A
warning is emitted during build. Support is slated to be removed in
about a year (when the Debian Bookworm release development starts).
* Systems with the legacy cgroup v1 hierarchy are now marked as
"tainted", to make it clearer that using the legacy hierarchy is not
recommended.
* systemd-localed will now refuse to configure a keymap which is not
installed in the file system. This is intended as a bug fix, but
could break cases where systemd-localed was used to configure the
keymap in advanced of it being installed. It is necessary to install
the keymap file first.
* The main git development branch has been renamed to 'main'.
* mmcblk[0-9]boot[0-9] devices will no longer be probed automatically
for partitions, as in the vast majority of cases they contain none
and are used internally by the bootloader (eg: uboot).
* systemd will now set the $SYSTEMD_EXEC_PID environment variable for
spawned processes to the PID of the process itself. This may be used
by programs for detecting whether they were forked off by the service
manager itself or are a process forked off further down the tree.
* The sd-device API gained four new calls: sd_device_get_action() to
determine the uevent add/remove/change/… action the device object has
been seen for, sd_device_get_seqno() to determine the uevent sequence
number, sd_device_new_from_stat_rdev() to allocate a new sd_device
object from stat(2) data of a device node, and sd_device_trigger() to
write to the 'uevent' attribute of a device.
* For most tools the --no-legend= switch has been replaced by
--legend=no and --legend=yes, to force whether tables are shown with
headers/legends.
* Units acquired a new property "Markers" that takes a list of zero,
one or two of the following strings: "needs-reload" and
"needs-restart". These markers may be set via "systemctl
set-property". Once a marker is set, "systemctl reload-or-restart
--marked" may be invoked to execute the operation the units are
marked for. This is useful for package managers that want to mark
units for restart/reload while updating, but effect the actual
operations at a later step at once.
* The sd_bus_message_read_strv() API call of sd-bus may now also be
used to parse arrays of D-Bus signatures and D-Bus paths, in addition
to regular strings.
* bootctl will now report whether the UEFI firmware used a TPM2 device
and measured the boot process into it.
* systemd-tmpfiles learnt support for a new environment variable
$SYSTEMD_TMPFILES_FORCE_SUBVOL which takes a boolean value. If true
the v/q/Q lines in tmpfiles.d/ snippets will create btrfs subvolumes
even if the root fs of the system is not itself a btrfs volume.
* systemd-detect-virt/ConditionVirtualization= will now explicitly
detect Docker/Podman environments where possible. Moreover, they
should be able to generically detect any container manager as long as
it assigns the container a cgroup.
* portablectl gained a new "reattach" verb for detaching/reattaching a
portable service image, useful for updating images on-the-fly.
* Intel SGX enclave device nodes (which expose a security feature of
newer Intel CPUs) will now be owned by a new system group "sgx".
Contributions from: Adam Nielsen, Adrian Vovk, AJ Jordan, Alan Perry,
Alastair Pharo, Alexander Batischev, Ali Abdallah, Andrew Balmos,
Anita Zhang, Annika Wickert, Ansgar Burchardt, Antonio Terceiro,
Antonius Frie, Ardy, Arian van Putten, Ariel Fermani, Arnaud T,
A S Alam, Bastien Nocera, Benjamin Berg, Benjamin Robin, Björn Daase,
caoxia, Carlo Wood, Charles Lee, ChopperRob, chri2, Christian Ehrhardt,
Christian Hesse, Christopher Obbard, clayton craft, corvusnix, cprn,
Daan De Meyer, Daniele Medri, Daniel Rusek, Dan Sanders, Dan Streetman,
Darren Ng, David Edmundson, David Tardon, Deepak Rawat, Devon Pringle,
Dmitry Borodaenko, dropsignal, Einsler Lee, Endre Szabo,
Evgeny Vereshchagin, Fabian Affolter, Fangrui Song, Felipe Borges,
feliperodriguesfr, Felix Stupp, Florian Hülsmann, Florian Klink,
Florian Westphal, Franck Bui, Frantisek Sumsal, Gablegritule,
Gaël PORTAY, Gaurav, Giedrius Statkevičius, Greg Depoire-Ferrer,
Gustavo Costa, Hans de Goede, Hela Basa, heretoenhance, hide,
Iago López Galeiras, igo95862, Ilya Dmitrichenko, Jameer Pathan,
Jan Tojnar, Jiehong, Jinyuan Si, Joerg Behrmann, John Slade,
Jonathan G. Underwood, Jonathan McDowell, Josh Triplett, Joshua Watt,
Julia Cartwright, Julien Humbert, Kairui Song, Karel Zak,
Kevin Backhouse, Kevin P. Fleming, Khem Raj, Konomi, krissgjeng,
l4gfcm, Lajos Veres, Lennart Poettering, Lincoln Ramsay, Luca Boccassi,
Luca BRUNO, Lucas Werkmeister, Luka Kudra, Luna Jernberg,
Marc-André Lureau, Martin Wilck, Matthias Klumpp, Matt Turner,
Michael Gisbers, Michael Marley, Michael Trapp, Michal Fabik,
Michał Kopeć, Michal Koutný, Michal Sekletár, Michele Guerini Rocco,
Mike Gilbert, milovlad, moson-mo, Nick, nihilix-melix, Oğuz Ersen,
Ondrej Mosnacek, pali, Pavel Hrdina, Pavel Sapezhko, Perry Yuan,
Peter Hutterer, Pierre Dubouilh, Piotr Drąg, Pjotr Vertaalt,
Richard Laager, RussianNeuroMancer, Sam Lunt, Sebastiaan van Stijn,
Sergey Bugaev, shenyangyang4, simmon, Simonas Kazlauskas,
Slimane Selyan Amiri, Stefan Agner, Steve Ramage, Susant Sahani,
Sven Mueller, Tad Fisher, Takashi Iwai, Thomas Haller, Tom Shield,
Topi Miettinen, Torsten Hilbrich, tpgxyz, Tyler Hicks, ulf-f,
Ulrich Ölmann, Vincent Pelletier, Vinnie Magro, Vito Caputo, Vlad,
walbit-de, Whired Planck, wouter bolsterlee, Xℹ Ruoyao, Yangyang Shen,
Yuri Chornoivan, Yu Watanabe, Zach Smith, Zbigniew Jędrzejewski-Szmek,
Zmicer Turok, Дамјан Георгиевски
— Berlin, 2021-03-30
CHANGES WITH 247:
* KERNEL API INCOMPATIBILITY: Linux 4.14 introduced two new uevents
"bind" and "unbind" to the Linux device model. When this kernel
change was made, systemd-udevd was only minimally updated to handle
and propagate these new event types. The introduction of these new
uevents (which are typically generated for USB devices and devices
needing a firmware upload before being functional) resulted in a
number of issues which we so far didn't address. We hoped the kernel
maintainers would themselves address these issues in some form, but
that did not happen. To handle them properly, many (if not most) udev
rules files shipped in various packages need updating, and so do many
programs that monitor or enumerate devices with libudev or sd-device,
or otherwise process uevents. Please note that this incompatibility
is not fault of systemd or udev, but caused by an incompatible kernel
change that happened back in Linux 4.14, but is becoming more and
more visible as the new uevents are generated by more kernel drivers.
To minimize issues resulting from this kernel change (but not avoid
them entirely) starting with systemd-udevd 247 the udev "tags"
concept (which is a concept for marking and filtering devices during
enumeration and monitoring) has been reworked: udev tags are now
"sticky", meaning that once a tag is assigned to a device it will not
be removed from the device again until the device itself is removed
(i.e. unplugged). This makes sure that any application monitoring
devices that match a specific tag is guaranteed to both see uevents
where the device starts being relevant, and those where it stops
being relevant (the latter now regularly happening due to the new
"unbind" uevent type). The udev tags concept is hence now a concept
tied to a *device* instead of a device *event* — unlike for example
udev properties whose lifecycle (as before) is generally tied to a
device event, meaning that the previously determined properties are
forgotten whenever a new uevent is processed.
With the newly redefined udev tags concept, sometimes it's necessary
to determine which tags are the ones applied by the most recent
uevent/database update, in order to discern them from those
originating from earlier uevents/database updates of the same
device. To accommodate for this a new automatic property CURRENT_TAGS
has been added that works similar to the existing TAGS property but
only lists tags set by the most recent uevent/database
update. Similarly, the libudev/sd-device API has been updated with
new functions to enumerate these 'current' tags, in addition to the
existing APIs that now enumerate the 'sticky' ones.
To properly handle "bind"/"unbind" on Linux 4.14 and newer it is
essential that all udev rules files and applications are updated to
handle the new events. Specifically:
• All rule files that currently use a header guard similar to
ACTION!="add|change",GOTO="xyz_end" should be updated to use
ACTION=="remove",GOTO="xyz_end" instead, so that the
properties/tags they add are also applied whenever "bind" (or
"unbind") is seen. (This is most important for all physical device
types — those for which "bind" and "unbind" are currently
generated, for all other device types this change is still
recommended but not as important — but certainly prepares for
future kernel uevent type additions).
• Similarly, all code monitoring devices that contains an 'if' branch
discerning the "add" + "change" uevent actions from all other
uevents actions (i.e. considering devices only relevant after "add"
or "change", and irrelevant on all other events) should be reworked
to instead negatively check for "remove" only (i.e. considering
devices relevant after all event types, except for "remove", which
invalidates the device). Note that this also means that devices
should be considered relevant on "unbind", even though conceptually
this — in some form — invalidates the device. Since the precise
effect of "unbind" is not generically defined, devices should be
considered relevant even after "unbind", however I/O errors
accessing the device should then be handled gracefully.
• Any code that uses device tags for deciding whether a device is
relevant or not most likely needs to be updated to use the new
udev_device_has_current_tag() API (or sd_device_has_current_tag()
in case sd-device is used), to check whether the tag is set at the
moment an uevent is seen (as opposed to the existing
udev_device_has_tag() API which checks if the tag ever existed on
the device, following the API concept redefinition explained
above).
We are very sorry for this breakage and the requirement to update
packages using these interfaces. We'd again like to underline that
this is not caused by systemd/udev changes, but result of a kernel
behaviour change.
* UPCOMING INCOMPATIBILITY: So far most downstream distribution
packages have not retriggered devices once the udev package (or any
auxiliary package installing additional udev rules) is updated. We
intend to work with major distributions to change this, so that
"udevadm trigger -a change" is issued on such upgrades, ensuring that
the updated ruleset is applied to the devices already discovered, so
that (asynchronously) after the upgrade completed the udev database
is consistent with the updated rule set. This means udev rules must
be ready to be retriggered with a "change" action any time, and
result in correct and complete udev database entries. While the
majority of udev rule files known to us currently get this right,
some don't. Specifically, there are udev rules files included in
various packages that only set udev properties on the "add" action,
but do not handle the "change" action. If a device matching those
rules is retriggered with the "change" action (as is intended here)
it would suddenly lose the relevant properties. This always has been
problematic, but as soon as all udev devices are triggered on relevant
package upgrades this will become particularly so. It is strongly
recommended to fix offending rules so that they can handle a "change"
action at any time, and acquire all necessary udev properties even
then. Or in other words: the header guard mentioned above
(ACTION=="remove",GOTO="xyz_end") is the correct approach to handle
this, as it makes sure rules are rerun on "change" correctly, and
accumulate the correct and complete set of udev properties. udev rule
definitions that cannot handle "change" events being triggered at
arbitrary times should be considered buggy.
* The MountAPIVFS= service file setting now defaults to on if
RootImage= and RootDirectory= are used, which means that with those
two settings /proc/, /sys/ and /dev/ are automatically properly set
up for services. Previous behaviour may be restored by explicitly
setting MountAPIVFS=off.
* Since PAM 1.2.0 (2015) configuration snippets may be placed in
/usr/lib/pam.d/ in addition to /etc/pam.d/. If a file exists in the
latter it takes precedence over the former, similar to how most of
systemd's own configuration is handled. Given that PAM stack
definitions are primarily put together by OS vendors/distributions
(though possibly overridden by users), this systemd release moves its
own PAM stack configuration for the "systemd-user" PAM service (i.e.
for the PAM session invoked by the per-user [email protected] instance)
from /etc/pam.d/ to /usr/lib/pam.d/. We recommend moving all
packages' vendor versions of their PAM stack definitions from
/etc/pam.d/ to /usr/lib/pam.d/, but if such OS-wide migration is not
desired the location to which systemd installs its PAM stack
configuration may be changed via the -Dpamconfdir Meson option.
* The runtime dependencies on libqrencode, libpcre2, libidn/libidn2,
libpwquality and libcryptsetup have been changed to be based on
dlopen(): instead of regular dynamic library dependencies declared in
the binary ELF headers, these libraries are now loaded on demand
only, if they are available. If the libraries cannot be found the
relevant operations will fail gracefully, or a suitable fallback
logic is chosen. This is supposed to be useful for general purpose
distributions, as it allows minimizing the list of dependencies the
systemd packages pull in, permitting building of more minimal OS
images, while still making use of these "weak" dependencies should
they be installed. Since many package managers automatically
synthesize package dependencies from ELF shared library dependencies,
some additional manual packaging work has to be done now to replace
those (slightly downgraded from "required" to "recommended" or
whatever is conceptually suitable for the package manager). Note that
this change does not alter build-time behaviour: as before the
build-time dependencies have to be installed during build, even if
they now are optional during runtime.
* sd-event.h gained a new call sd_event_add_time_relative() for
installing timers relative to the current time. This is mostly a
convenience wrapper around the pre-existing sd_event_add_time() call
which installs absolute timers.
* sd-event event sources may now be placed in a new "exit-on-failure"
mode, which may be controlled via the new
sd_event_source_get_exit_on_failure() and
sd_event_source_set_exit_on_failure() functions. If enabled, any
failure returned by the event source handler functions will result in
exiting the event loop (unlike the default behaviour of just
disabling the event source but continuing with the event loop). This
feature is useful to set for all event sources that define "primary"
program behaviour (where failure should be fatal) in contrast to
"auxiliary" behaviour (where failure should remain local).
* Most event source types sd-event supports now accept a NULL handler
function, in which case the event loop is exited once the event
source is to be dispatched, using the userdata pointer — converted to
a signed integer — as exit code of the event loop. Previously this
was supported for IO and signal event sources already. Exit event
sources still do not support this (simply because it makes little
sense there, as the event loop is already exiting when they are
dispatched).
* A new per-unit setting RootImageOptions= has been added which allows
tweaking the mount options for any file system mounted as effect of
the RootImage= setting.
* Another new per-unit setting MountImages= has been added, that allows
mounting additional disk images into the file system tree accessible
to the service.
* Timer units gained a new FixedRandomDelay= boolean setting. If
enabled, the random delay configured with RandomizedDelaySec= is
selected in a way that is stable on a given system (though still
different for different units).
* Socket units gained a new setting Timestamping= that takes "us", "ns"
or "off". This controls the SO_TIMESTAMP/SO_TIMESTAMPNS socket
options.
* systemd-repart now generates JSON output when requested with the new
--json= switch.
* systemd-machined's OpenMachineShell() bus call will now pass
additional policy metadata data fields to the PolicyKit
authentication request.
* systemd-tmpfiles gained a new -E switch, which is equivalent to
--exclude-prefix=/dev --exclude-prefix=/proc --exclude=/run
--exclude=/sys. It's particularly useful in combination with --root=,
when operating on OS trees that do not have any of these four runtime
directories mounted, as this means no files below these subtrees are
created or modified, since those mount points should probably remain
empty.
* systemd-tmpfiles gained a new --image= switch which is like --root=,
but takes a disk image instead of a directory as argument. The
specified disk image is mounted inside a temporary mount namespace
and the tmpfiles.d/ drop-ins stored in the image are executed and
applied to the image. systemd-sysusers similarly gained a new
--image= switch, that allows the sysusers.d/ drop-ins stored in the
image to be applied onto the image.
* Similarly, the journalctl command also gained an --image= switch,
which is a quick one-step solution to look at the log data included
in OS disk images.
* journalctl's --output=cat option (which outputs the log content
without any metadata, just the pure text messages) will now make use
of terminal colors when run on a suitable terminal, similarly to the
other output modes.
* JSON group records now support a "description" string that may be
used to add a human-readable textual description to such groups. This
is supposed to match the user's GECOS field which traditionally
didn't have a counterpart for group records.
* The "systemd-dissect" tool that may be used to inspect OS disk images
and that was previously installed to /usr/lib/systemd/ has now been
moved to /usr/bin/, reflecting its updated status of an officially
supported tool with a stable interface. It gained support for a new
--mkdir switch which when combined with --mount has the effect of
creating the directory to mount the image to if it is missing
first. It also gained two new commands --copy-from and --copy-to for
copying files and directories in and out of an OS image without the
need to manually mount it. It also acquired support for a new option
--json= to generate JSON output when inspecting an OS image.
* The cgroup2 file system is now mounted with the
"memory_recursiveprot" mount option, supported since kernel 5.7. This
means that the MemoryLow= and MemoryMin= unit file settings now apply
recursively to whole subtrees.
* systemd-homed now defaults to using the btrfs file system — if
available — when creating home directories in LUKS volumes. This may
be changed with the DefaultFileSystemType= setting in homed.conf.
It's now the default file system in various major distributions and
has the major benefit for homed that it can be grown and shrunk while
mounted, unlike the other contenders ext4 and xfs, which can both be
grown online, but not shrunk (in fact xfs is the technically most
limited option here, as it cannot be shrunk at all).
* JSON user records managed by systemd-homed gained support for
"recovery keys". These are basically secondary passphrases that can
unlock user accounts/home directories. They are computer-generated
rather than user-chosen, and typically have greater entropy.
homectl's --recovery-key= option may be used to add a recovery key to
a user account. The generated recovery key is displayed as a QR code,
so that it can be scanned to be kept in a safe place. This feature is
particularly useful in combination with systemd-homed's support for
FIDO2 or PKCS#11 authentication, as a secure fallback in case the
security tokens are lost. Recovery keys may be entered wherever the
system asks for a password.
* systemd-homed now maintains a "dirty" flag for each LUKS encrypted
home directory which indicates that a home directory has not been
deactivated cleanly when offline. This flag is useful to identify
home directories for which the offline discard logic did not run when
offlining, and where it would be a good idea to log in again to catch
up.
* systemctl gained a new parameter --timestamp= which may be used to
change the style in which timestamps are output, i.e. whether to show
them in local timezone or UTC, or whether to show µs granularity.
* Alibaba's "pouch" container manager is now detected by
systemd-detect-virt, ConditionVirtualization= and similar
constructs. Similar, they now also recognize IBM PowerVM machine
virtualization.
* systemd-nspawn has been reworked to use the /run/host/incoming/ as
place to use for propagating external mounts into the
container. Similarly /run/host/notify is now used as the socket path
for container payloads to communicate with the container manager
using sd_notify(). The container manager now uses the
/run/host/inaccessible/ directory to place "inaccessible" file nodes
of all relevant types which may be used by the container payload as
bind mount source to over-mount inodes to make them inaccessible.
/run/host/container-manager will now be initialized with the same
string as the $container environment variable passed to the
container's PID 1. /run/host/container-uuid will be initialized with
the same string as $container_uuid. This means the /run/host/
hierarchy is now the primary way to make host resources available to
the container. The Container Interface documents these new files and
directories:
https://systemd.io/CONTAINER_INTERFACE
* Support for the "ConditionNull=" unit file condition has been
deprecated and undocumented for 6 years. systemd started to warn
about its use 1.5 years ago. It has now been removed entirely.
* sd-bus.h gained a new API call sd_bus_error_has_names(), which takes
a sd_bus_error struct and a list of error names, and checks if the
error matches one of these names. It's a convenience wrapper that is
useful in cases where multiple errors shall be handled the same way.
* A new system call filter list "@known" has been added, that contains
all system calls known at the time systemd was built.
* Behaviour of system call filter allow lists has changed slightly:
system calls that are contained in @known will result in a EPERM by
default, while those not contained in it result in ENOSYS. This
should improve compatibility because known system calls will thus be
communicated as prohibited, while unknown (and thus newer ones) will
be communicated as not implemented, which hopefully has the greatest
chance of triggering the right fallback code paths in client
applications.
* "systemd-analyze syscall-filter" will now show two separate sections
at the bottom of the output: system calls known during systemd build
time but not included in any of the filter groups shown above, and
system calls defined on the local kernel but known during systemd
build time.
* If the $SYSTEMD_LOG_SECCOMP=1 environment variable is set for
systemd-nspawn all system call filter violations will be logged by
the kernel (audit). This is useful for tracking down system calls
invoked by container payloads that are prohibited by the container's
system call filter policy.
* If the $SYSTEMD_SECCOMP=0 environment variable is set for
systemd-nspawn (and other programs that use seccomp) all seccomp
filtering is turned off.
* Two new unit file settings ProtectProc= and ProcSubset= have been
added that expose the hidepid= and subset= mount options of procfs.
All processes of the unit will only see processes in /proc that are
are owned by the unit's user. This is an important new sandboxing
option that is recommended to be set on all system services. All
long-running system services that are included in systemd itself set
this option now. This option is only supported on kernel 5.8 and
above, since the hidepid= option supported on older kernels was not a
per-mount option but actually applied to the whole PID namespace.
* Socket units gained a new boolean setting FlushPending=. If enabled
all pending socket data/connections are flushed whenever the socket
unit enters the "listening" state, i.e. after the associated service
exited.
* The unit file setting NUMAMask= gained a new "all" value: when used,
all existing NUMA nodes are added to the NUMA mask.
* A new "credentials" logic has been added to system services. This is
a simple mechanism to pass privileged data to services in a safe and
secure way. It's supposed to be used to pass per-service secret data
such as passwords or cryptographic keys but also associated less
private information such as user names, certificates, and similar to
system services. Each credential is identified by a short user-chosen
name and may contain arbitrary binary data. Two new unit file
settings have been added: SetCredential= and LoadCredential=. The
former allows setting a credential to a literal string, the latter
sets a credential to the contents of a file (or data read from a
user-chosen AF_UNIX stream socket). Credentials are passed to the
service via a special credentials directory, one file for each
credential. The path to the credentials directory is passed in a new
$CREDENTIALS_DIRECTORY environment variable. Since the credentials
are passed in the file system they may be easily referenced in
ExecStart= command lines too, thus no explicit support for the
credentials logic in daemons is required (though ideally daemons
would look for the bits they need in $CREDENTIALS_DIRECTORY
themselves automatically, if set). The $CREDENTIALS_DIRECTORY is
backed by unswappable memory if privileges allow it, immutable if
privileges allow it, is accessible only to the service's UID, and is
automatically destroyed when the service stops.
* systemd-nspawn supports the same credentials logic. It can both
consume credentials passed to it via the aforementioned
$CREDENTIALS_DIRECTORY protocol as well as pass these credentials on
to its payload. The service manager/PID 1 has been updated to match
this: it can also accept credentials from the container manager that
invokes it (in fact: any process that invokes it), and passes them on
to its services. Thus, credentials can be propagated recursively down
the tree: from a system's service manager to a systemd-nspawn
service, to the service manager that runs as container payload and to
the service it runs below. Credentials may also be added on the
systemd-nspawn command line, using new --set-credential= and
--load-credential= command line switches that match the
aforementioned service settings.
* systemd-repart gained new settings Format=, Encrypt=, CopyFiles= in
the partition drop-ins which may be used to format/LUKS
encrypt/populate any created partitions. The partitions are
encrypted/formatted/populated before they are registered in the
partition table, so that they appear atomically: either the
partitions do not exist yet or they exist fully encrypted, formatted,
and populated — there is no time window where they are
"half-initialized". Thus the system is robust to abrupt shutdown: if
the tool is terminated half-way during its operations on next boot it
will start from the beginning.
* systemd-repart's --size= operation gained a new "auto" value. If
specified, and operating on a loopback file it is automatically sized
to the minimal size the size constraints permit. This is useful to
use "systemd-repart" as an image builder for minimally sized images.
* systemd-resolved now gained a third IPC interface for requesting name
resolution: besides D-Bus and local DNS to 127.0.0.53 a Varlink
interface is now supported. The nss-resolve NSS module has been
modified to use this new interface instead of D-Bus. Using Varlink
has a major benefit over D-Bus: it works without a broker service,
and thus already during earliest boot, before the dbus daemon has
been started. This means name resolution via systemd-resolved now
works at the same time systemd-networkd operates: from earliest boot
on, including in the initrd.
* systemd-resolved gained support for a new DNSStubListenerExtra=
configuration file setting which may be used to specify additional IP
addresses the built-in DNS stub shall listen on, in addition to the
main one on 127.0.0.53:53.
* Name lookups issued via systemd-resolved's D-Bus and Varlink
interfaces (and thus also via glibc NSS if nss-resolve is used) will
now honour a trailing dot in the hostname: if specified the search
path logic is turned off. Thus "resolvectl query foo." is now
equivalent to "resolvectl query --search=off foo.".
* systemd-resolved gained a new D-Bus property "ResolvConfMode" that
exposes how /etc/resolv.conf is currently managed: by resolved (and
in which mode if so) or another subsystem. "resolvctl" will display
this property in its status output.
* The resolv.conf snippets systemd-resolved provides will now set "."
as the search domain if no other search domain is known. This turns
off the derivation of an implicit search domain by nss-dns for the
hostname, when the hostname is set to an FQDN. This change is done to
make nss-dns using resolv.conf provided by systemd-resolved behave
more similarly to nss-resolve.
* systemd-tmpfiles' file "aging" logic (i.e. the automatic clean-up of
/tmp/ and /var/tmp/ based on file timestamps) now looks at the
"birth" time (btime) of a file in addition to the atime, mtime, and
ctime.
* systemd-analyze gained a new verb "capability" that lists all known
capabilities by the systemd build and by the kernel.
* If a file /usr/lib/clock-epoch exists, PID 1 will read its mtime and
advance the system clock to it at boot if it is noticed to be before
that time. Previously, PID 1 would only advance the time to an epoch
time that is set during build-time. With this new file OS builders
can change this epoch timestamp on individual OS images without
having to rebuild systemd.
* systemd-logind will now listen to the KEY_RESTART key from the Linux
input layer and reboot the system if it is pressed, similarly to how
it already handles KEY_POWER, KEY_SUSPEND or KEY_SLEEP. KEY_RESTART
was originally defined in the Multimedia context (to restart playback
of a song or film), but is now primarily used in various embedded
devices for "Reboot" buttons. Accordingly, systemd-logind will now
honour it as such. This may configured in more detail via the new
HandleRebootKey= and RebootKeyIgnoreInhibited=.
* systemd-nspawn/systemd-machined will now reconstruct hardlinks when
copying OS trees, for example in "systemd-nspawn --ephemeral",
"systemd-nspawn --template=", "machinectl clone" and similar. This is
useful when operating with OSTree images, which use hardlinks heavily
throughout, and where such copies previously resulting in "exploding"
hardlinks.
* systemd-nspawn's --console= setting gained support for a new
"autopipe" value, which is identical to "interactive" when invoked on
a TTY, and "pipe" otherwise.
* systemd-networkd's .network files gained support for explicitly
configuring the multicast membership entries of bridge devices in the
[BridgeMDB] section. It also gained support for the PIE queuing
discipline in the [FlowQueuePIE] sections.
* systemd-networkd's .netdev files may now be used to create "BareUDP"
tunnels, configured in the new [BareUDP] setting.
* systemd-networkd's Gateway= setting in .network files now accepts the
special values "_dhcp4" and "_ipv6ra" to configure additional,
locally defined, explicit routes to the gateway acquired via DHCP or
IPv6 Router Advertisements. The old setting "_dhcp" is deprecated,
but still accepted for backwards compatibility.
* systemd-networkd's [IPv6PrefixDelegation] section and
IPv6PrefixDelegation= options have been renamed as [IPv6SendRA] and
IPv6SendRA= (the old names are still accepted for backwards
compatibility).
* systemd-networkd's .network files gained the DHCPv6PrefixDelegation=
boolean setting in [Network] section. If enabled, the delegated prefix
gained by another link will be configured, and an address within the
prefix will be assigned.
* systemd-networkd's .network files gained the Announce= boolean setting
in [DHCPv6PrefixDelegation] section. When enabled, the delegated
prefix will be announced through IPv6 router advertisement (IPv6 RA).
The setting is enabled by default.
* VXLAN tunnels may now be marked as independent of any underlying
network interface via the new Independent= boolean setting.
* systemctl gained support for two new verbs: "service-log-level" and
"service-log-target" may be used on services that implement the
generic org.freedesktop.LogControl1 D-Bus interface to dynamically
adjust the log level and target. All of systemd's long-running
services support this now, but ideally all system services would
implement this interface to make the system more uniformly
debuggable.