Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boost subshell performance by avoiding expensive pwd related syscalls #805

Merged
merged 2 commits into from
Jan 3, 2025

Conversation

JohnoKing
Copy link

In ksh93v- a sh.pwdfd variable was introduced, ostensibly for implementing cd -f, pwd -f and ${.sh.pwdfd} as new features. I might backport these in the future, but those aren't the focus of this commit. Rather, what caught my interest was the effect of this change on subshell performance (results from shbench with 20 iterations, compiled with CCFLAGS='-O2 -fPIC'):

----------------------------------------------------------------------------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93u+m-dev       /tmp/ksh93v-2012-08-24  /tmp/ksh93v-2014-12-24  /tmp/ksh93u+redhat
----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.230 [0.227-0.235]     0.216 [0.211-0.222]     0.269 [0.266-0.273]     0.247 [0.242-0.251]     0.264 [0.260-0.269]
parens.ksh     0.208 [0.204-0.212]     0.173 [0.170-0.182]     0.147 [0.145-0.149]     0.125 [0.121-0.127]     0.140 [0.137-0.144]
----------------------------------------------------------------------------------------------------------------------------------

As can be seen above, 93u+ and 93u+m perform worse in the parens benchmark than the ksh93v- release which implements the change, as well as a copy of ksh93u+ compiled with Red Hat's patches for cd (fdstatus, rmdirfix, cdfix and cdfix2 applied in that order). However, 93u+ and 93u+m perform better in the fibonacci benchmark.

93v- is faster in the parens benchmark because it obtains the file descriptor for the current working directory via openat(2) in sh_diropenat(), then caches it in sh.pwdfd. This fd is then reused in sh_subshell(), which allows 93v- to avoid taxing open(2) and fcntl(2) syscalls.

However, the 93v- implementation has a flaw. Unlike in 93u+, it always duplicates sh.pwdfd with fcntl(2), even when it's a superfluous operation. This is the cause of the performance regression in the fibonacci benchmark.

To increase ksh93u+m's performance beyond both 93u+ and 93v-, I have backported 93v-'s code, but with a crucial change:

  • sh_subshell() no longer duplicates sh.pwdfd with fcntl() during virtual subshell scope initialization. Rather, a sh_pwdupdate() function is used in tandem with sh_diropenat() in b_cd() to do all of this when the directory is being changed. This fixes the regression in the fibonacci benchmark. sh_pwdupdate() will be called whenever sh.pwdfd changes to prevent file descriptor leaks while also avoiding erroneous close operations on the parent shell's file descriptor for the PWD.

Other changes applied include:

  • The removal of support for systems which lack fchdir(2). ksh93v- did this twelve years ago in the oldest archived alpha release, which I think is a more than long enough deprecation window.
  • If sh.pwdfd is -1 (i.e., the shell was initialized in a nonexistent directory), don't fork preemptively. It's more efficient to let cd check sp->pwdfd via a short function (sh_validate_subpwdfd()) before attempting to change the directory. This allows subshells run in nonexistent directories to avoid forking, provided cd is never invoked.
  • Removed <=1995 hackery from nv_open() and env_init() During SH_INIT we should simply always invoke path_pwd(); giving $PWD NV_TAGGED is not just unnecessary, but possibly harmful when one attempts to use typeset -t with the PWD variable.
  • With the removal of that code, sh.pwd is guaranteed to get set during shell initialization (if it wasn't already). As such, the code no longer needs to cope with a NULL pwd and many such checks have been removed.
  • sh.pwd no longer has a pointless and misleading const modifier. This allows for the elimination of many casts involving it.
  • sh.pwdfd is reset every time the pwd is (re)initialized in path_pwd().
  • libast/features/fcntl.c: Account for systems which don't implement O_SEARCH and/or O_DIRECTORY. (I avoided backporting the ksh93v- feature test code because it's ginormous and filled with far too much compatibility hackery. If anyone wants to backport the mess that are the libast syscall intercepts, be my guest. In any case ksh doesn't need any of that to function correctly.)
  • Goat sacrifices are hereby banned from 93u+m (for the curious, please see Ksh fails to start if current directory is not accessible att/ast#567).

With these changes ksh93u+m now handily beats the previous ksh releases in the fibonacci and parens benchmarks (ksh2020 included for comparison):

----------------------------------------------------------------------------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93v-2014-12-24  /tmp/ksh2020            /tmp/ksh93u+m-dev       /tmp/ksh93u+m-patch
----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.230 [0.224-0.236]     0.247 [0.244-0.252]     0.369 [0.364-0.375]     0.217 [0.213-0.222]     0.202 [0.198-0.206]
parens.ksh     0.207 [0.203-0.213]     0.124 [0.121-0.127]     0.165 [0.162-0.169]     0.173 [0.171-0.176]     0.082 [0.080-0.085]
----------------------------------------------------------------------------------------------------------------------------------

In ksh93v- a sh.pwdfd variable was introduced, ostensibly for
implementing 'cd -f', 'pwd -f' and '${.sh.pwdfd}' as new features. I
might backport these in the future, but those aren't the focus of this
commit. Rather, what caught my interest was the effect of this change on
subshell performance (results from shbench with 20 iterations, compiled
with CCFLAGS='-O2 -fPIC'):

----------------------------------------------------------------------------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93u+m-dev       /tmp/ksh93v-2012-08-24  /tmp/ksh93v-2014-12-24  /tmp/ksh93u+redhat
----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.230 [0.227-0.235]     0.216 [0.211-0.222]     0.269 [0.266-0.273]     0.247 [0.242-0.251]     0.264 [0.260-0.269]
parens.ksh     0.208 [0.204-0.212]     0.173 [0.170-0.182]     0.147 [0.145-0.149]     0.125 [0.121-0.127]     0.140 [0.137-0.144]
----------------------------------------------------------------------------------------------------------------------------------

As can be seen above, 93u+ and 93u+m perform worse in the parens
benchmark than the ksh93v- release which implements the change, as well
as a copy of ksh93u+ compiled with Red Hat's patches for cd (fdstatus,
rmdirfix, cdfix and cdfix2 applied in that order). However, 93u+ and
93u+m perform better in the fibonacci benchmark.

93v- is faster in the parens benchmark because it obtains the file
descriptor for the current working directory via openat(2) in
sh_diropenat, then caches it in sh.pwdfd. This fd is then reused in
sh_subshell(), which allows 93v- to avoid taxing open(2) and fcntl(2)
syscalls.

However, the 93v- implementation has a flaw. Unlike in 93u+, it always
duplicates sh.pwdfd with fcntl, even when it's a superfluous operation.
This is the cause of the performance regression in the fibonacci
benchmark.

To increase ksh93u+m's performance beyond both 93u+ and 93v-, I have
backported 93v-'s code, but with a crucial change:
- sh_subshell() no longer duplicates sh.pwdfd with fcntl during
  virtual subshell scope initialization. Rather, a sh_pwdupdate
  function is used in tandem with sh_diropenat in b_cd() to
  do all of this when the directory is being changed.
  This fixes the regression in the fibonacci benchmark.
  sh_pwdupdate will be called whenever sh.pwdfd changes to
  prevent file descriptor leaks while also avoiding erroneous close
  operations on the parent shell's file descriptor for the PWD.

Other changes applied include:
- The removal of support for systems which lack fchdir. ksh93v- did this
  twelve years ago in the oldest archived alpha release, which I think
  is a more than long enough deprecation window.
- If sh.pwdfd is -1 (i.e., the shell was initialized in a nonexistent
  directory), don't fork preemptively. It's more efficient to let cd
  check sp->pwdfd via a short function before attempting to change the
  directory. This allows subshells run in nonexistent directories to
  avoid forking, provided cd is never invoked.
- Removed <=1995 hackery from nv_open and env_init. During SH_INIT
  we should simply always invoke path_pwd(); giving $PWD NV_TAGGED is
  not just unnecessary, but possibly harmful when one attempts to use
  'typeset -t' with the PWD variable.
- With the removal of that code, sh.pwd is guaranteed to get set during
  shell initialization (if it wasn't already). As such, the code no
  longer needs to cope with a NULL pwd and many such checks have
  been removed.
- sh.pwd no longer has a pointless and misleading const modifier. This
  allows for the elimination of many casts involving sh.pwd.
- sh.pwdfd is reset every time the pwd is (re)initialized in
  path_pwd().
- libast/features/fcntl.c: Account for systems which don't implement
  O_SEARCH and/or O_DIRECTORY. (I avoided backporting the ksh93v-
  feature test code because it's ginormous and filled with far too
  much compatibility hackery. If anyone wants to backport the mess
  that are the libast syscall intercepts, be my guest. In any case
  ksh doesn't need any of that to function correctly.)
- Goat sacrifices are hereby banned from 93u+m (for the curious,
  please see att#567).

With these changes ksh93u+m now handily beats the previous ksh releases
in the fibonacci and parens benchmarks (ksh2020 included for comparison):

----------------------------------------------------------------------------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93v-2014-12-24  /tmp/ksh2020            /tmp/ksh93u+m-dev       /tmp/ksh93u+m-patch
----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.230 [0.224-0.236]     0.247 [0.244-0.252]     0.369 [0.364-0.375]     0.217 [0.213-0.222]     0.202 [0.198-0.206]
parens.ksh     0.207 [0.203-0.213]     0.124 [0.121-0.127]     0.165 [0.162-0.169]     0.173 [0.171-0.176]     0.082 [0.080-0.085]
----------------------------------------------------------------------------------------------------------------------------------
JohnoKing added a commit to JohnoKing/ksh that referenced this pull request Dec 26, 2024
This commit refactors the local builtin to take advantage of some
changes I intend to submit later as separate pull requests. These are:

- np->nvflags has been expanded from 16-bit to 32-bit. This fixes
  the longstanding limitation detailed in a previous ksh2020 bug:
  att#1038.
  The fix consists of masking out nv_open-centric flags with
  explicit uses of the new NV_OPENMASK, rather than implicitly and
  obtusely abusing the limitations of unsigned short types.
  Here, I take advantage of it for storing NV_DYNAMIC in np->nvflags
  and using it from there, which is a safer and more simple choice
  long term.

- A bugfix from ksh93v- 2012-08-24 has been backported to fix
  a potential segfault that is triggered under ASan when changing
  the sizes of np->nvsize and np->nvflags to uint32_t. This cannot
  be triggered on the dev branch, but I will submit this individually
  soon enough. (I would have already submitted it had I not been
  forced to deal with a broken PSU shortly after submitting ksh93#805;
  0 out of 10 experience don't recommend).

- Updated comment regarding pitfalls in the ksh93v- and current
  implementation of local.

In any case I intend to separate out various fixes from this branch and
submit them as separate pull requests, if only to eventually reduce the
scope of this one to just the local builtin and nothing else (the
sh_funscope refactor for instance fixes an out of bounds bug that has
been on the dev branch for far too long). Unfortunately my life has
been quite hellish lately, so there is no ETA yet.
@McDutchie
Copy link

McDutchie commented Jan 3, 2025

Thanks for this PR, this is great. On macOS (12.7.6 M1/arm64), the performance improvement on the parens.ksh benchmark is quite spectacular. Columns are: 93u+m 2012-08-01; ksh2020; dev branch without this PR; dev branch with this PR.

---------------------------------------------------------------------------------------------------------------------------------------------
name           /bin/ksh             /usr/local/bin/ksh-2020.0.1  ../ksh/arch/darwin.arm64-64/bin/ksh.old  ../ksh/arch/darwin.arm64-64/bin/ksh
---------------------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.192 [0.191-0.193]  0.304 [0.300-0.306]          0.186 [0.184-0.187]                      0.170 [0.169-0.171]
parens.ksh     0.353 [0.351-0.355]  0.145 [0.144-0.147]          0.349 [0.342-0.386]                      0.081 [0.080-0.086]
---------------------------------------------------------------------------------------------------------------------------------------------

@McDutchie
Copy link

The PR looks good to me. I don't know if it's all going to work on the more obscure systems like QNX or AIX, but that is more easily tested after merging it.

@McDutchie McDutchie merged commit 3d7b7e1 into ksh93:dev Jan 3, 2025
McDutchie pushed a commit that referenced this pull request Jan 3, 2025
…#805)

In ksh93v- a sh.pwdfd variable was introduced, ostensibly for
implementing 'cd -f', 'pwd -f' and '${.sh.pwdfd}' as new features.
I might backport these in the future, but those aren't the focus of
this commit. Rather, what caught my interest was the effect of this
change on subshell performance (results from shbench[*] with 20
iterations, compiled with CCFLAGS='-O2 -fPIC'):

-------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93u+m-dev
-------------------------------------------------------------
fibonacci.ksh  0.230 [0.227-0.235]     0.216 [0.21f1-0.222]
parens.ksh     0.208 [0.204-0.212]     0.173 [0.170-0.182]
-------------------------------------------------------------
name           /tmp/ksh93v-2012-08-24  /tmp/ksh93v-2014-12-24
-------------------------------------------------------------
fibonacci.ksh  0.269 [0.266-0.273]     0.247 [0.242-0.251]
parens.ksh     0.147 [0.145-0.149]     0.125 [0.121-0.127]
-------------------------------------------------------------
name           /tmp/ksh93u+redhat
-------------------------------------------------------------
fibonacci.ksh  0.264 [0.260-0.269]
parens.ksh     0.140 [0.137-0.144]
-------------------------------------------------------------

As can be seen above, 93u+ and 93u+m perform worse in the parens
benchmark than the ksh93v- release which implements the change, as
well as a copy of ksh93u+ compiled with Red Hat's patches for cd
(fdstatus, rmdirfix, cdfix and cdfix2 applied in that order).
However, 93u+ and 93u+m perform better in the fibonacci benchmark.

93v- is faster in the parens benchmark because it obtains the file
descriptor for the current working directory via openat(2) in
sh_diropenat, then caches it in sh.pwdfd. This fd is then reused in
sh_subshell(), which allows 93v- to avoid taxing open(2) and
fcntl(2) syscalls.

However, the 93v- implementation has a flaw. Unlike in 93u+, it
always duplicates sh.pwdfd with fcntl, even when it's a superfluous
operation. This is the cause of the performance regression in the
fibonacci benchmark.

To increase ksh93u+m's performance beyond both 93u+ and 93v-, I
have backported 93v-'s code, but with a crucial change:
- sh_subshell() no longer duplicates sh.pwdfd with fcntl during
  virtual subshell scope initialization. Rather, a sh_pwdupdate
  function is used in tandem with sh_diropenat in b_cd() to do all
  of this when the directory is being changed. This fixes the
  regression in the fibonacci benchmark. sh_pwdupdate will be
  called whenever sh.pwdfd changes to prevent file descriptor leaks
  while also avoiding erroneous close operations on the parent
  shell's file descriptor for the PWD.

Other changes applied include:
- The removal of support for systems which lack fchdir. ksh93v- did
  this twelve years ago in the oldest archived alpha release, which
  I think is a more than long enough deprecation window.
- If sh.pwdfd is -1 (i.e., the shell was initialized in a
  nonexistent directory), don't fork preemptively. It's more
  efficient to let cd check sp->pwdfd via a short function before
  attempting to change the directory. This allows subshells run in
  nonexistent directories to avoid forking, provided cd is never
  invoked.
- Removed <=1995 hackery from nv_open and env_init. During SH_INIT
  we should simply always invoke path_pwd(); giving $PWD NV_TAGGED
  is not just unnecessary, but possibly harmful when one attempts
  to use 'typeset -t' with the PWD variable.
- With the removal of that code, sh.pwd is guaranteed to get set
  during shell initialization (if it wasn't already). As such, the
  code no longer needs to cope with a NULL pwd and many such checks
  have been removed.
- sh.pwd no longer has a pointless and misleading const modifier.
  This allows for the elimination of many casts involving sh.pwd.
- sh.pwdfd is reset every time the pwd is (re)initialized in
  path_pwd().
- libast/features/fcntl.c: Account for systems which don't
  implement O_SEARCH and/or O_DIRECTORY. (I avoided backporting the
  ksh93v- feature test code because it's ginormous and filled with
  far too much compatibility hackery. If anyone wants to backport
  the mess that are the libast syscall intercepts, be my guest. In
  any case ksh doesn't need any of that to function correctly.)
- Goat sacrifices are hereby banned from 93u+m (for the curious,
  please see att#567).

With these changes ksh93u+m now handily beats the previous ksh
releases in the fibonacci and parens benchmarks (ksh2020 included
for comparison):

-------------------------------------------------------------
name           /tmp/ksh93u+2012-08-01  /tmp/ksh93v-2014-12-24
-------------------------------------------------------------
fibonacci.ksh  0.230 [0.224-0.236]     0.247 [0.244-0.252]
parens.ksh     0.207 [0.203-0.213]     0.124 [0.121-0.127]
-------------------------------------------------------------
name           /tmp/ksh2020            /tmp/ksh93u+m-dev
-------------------------------------------------------------
fibonacci.ksh  0.369 [0.364-0.375]     0.217 [0.213-0.222]
parens.ksh     0.165 [0.162-0.169]     0.173 [0.171-0.176]
-------------------------------------------------------------
name           /tmp/ksh93u+m-patch
-------------------------------------------------------------
fibonacci.ksh  0.202 [0.198-0.206]
parens.ksh     0.082 [0.080-0.085]
-------------------------------------------------------------

[*] shbench: http://fossil.0branch.com/csb
@JohnoKing JohnoKing deleted the faster-pwd-restore branch January 4, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants