poll/close error storm after adding just one more to fanout (for me 496) #123
Comments
Nice debugging! No, it isn't wrong to have a large fanout AFAIK, and I have no idea what could be going on here at the moment. Can you include output of |
it's just master from last night on a mostly updated debian unstable |
libc is from libc6_2.31-0experimental0_amd64.deb. Interesting, it does not reproduce on an updated ubuntu16 |
Here's what I suspect is happening. In
I wonder if, after creating a certain number of threads, the OS returns a huge number here and the children spend all this time closing fds. Try this patch as an experiment and see if it helps:
diff --git a/src/common/pipecmd.c b/src/common/pipecmd.c
index 904725c..1ea7f1b 100644
--- a/src/common/pipecmd.c
+++ b/src/common/pipecmd.c
@@ -252,7 +252,7 @@ const char * pipecmd_target (pipecmd_t p)
 static void closeall (int fd)
 {
-    int fdlimit = sysconf (_SC_OPEN_MAX);
+    int fdlimit = 1024;
     while (fd < fdlimit)
         close (fd++);
(Be sure to do |
I had already suspected that, and confirmed the value is 1024 already on my system. and also tried hardcoding. it had no effect unfortunately, I don't think it is the issue. |
Are you sure that I added a print of the current value for
grondo@asp:~/git/pdsh$ src/pdsh/pdsh -R exec -f 495 -w foo[0-500] true 2>&1 | tail -1
foo492: sysconf (_SC_OPEN_MAX) = 1024
grondo@asp:~/git/pdsh$ src/pdsh/pdsh -R exec -f 496 -w foo[0-500] true 2>&1 | tail -1
foo336: sysconf (_SC_OPEN_MAX) = 1048576 |
interesting, it does happen on ubuntu18, just another data point. I thought I had done a distclean but will reconfirm |
you are right, I had relied on the dependencies, without a distclean. a forced rebuild and it's fixed. so I found that last night and chased it a while after concluding that was not it, oh well ;-) FWIW, I had looked into it and thought best way is just ifdef linux, enumerate FDs in /proc/self/fd/ and close them that way, seems ugly but probably fine, other platforms likely have sane limit for that config. I don't get why the value is high though, since in a test program it's 1024 on my system, as well as using |
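The /proc/self/fd idea in the comment above could look roughly like this. It is a minimal sketch, not code from pdsh: `close_fds_from` is a hypothetical name, and the sketch assumes a Linux system with procfs mounted. It uses opendir/readdir for brevity.

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Close every open descriptor >= lowfd by listing /proc/self/fd.
 * Returns the number of descriptors closed, or -1 if the directory
 * cannot be opened (for example, no procfs), so the caller can fall
 * back to a plain loop.
 * Hypothetical helper for illustration, not part of pdsh.
 */
int close_fds_from(int lowfd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir);      /* descriptor backing the directory stream */
    int closed = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue;           /* not a number: skips "." and ".." */
        if (fd < lowfd || fd == self)
            continue;           /* keep low fds and the stream's own fd */
        if (close((int) fd) == 0)
            closed++;
    }
    closedir(dir);
    return closed;
}
```

The work is proportional to the number of descriptors actually open rather than to the limit. One caveat: opendir allocates memory, so calling this between fork and exec in a multithreaded program may not be safe.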
I was able to reproduce on ubuntu 18.04 VM. Setting an upper limit on |
so this is the same system that exhibits the issue, it has 1024, I don't get it
Pdsh creates a socketpair for each subcommand it runs. With fanout large enough, more than 1024 fds will be opened at a time and the soft limit will get bumped to the hard limit ( |
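A small sketch of why the reported value jumps. As far as I know, on Linux sysconf (_SC_OPEN_MAX) reports the current soft RLIMIT_NOFILE, so once the soft limit is raised to the hard limit, any loop bounded by sysconf grows with it. The function names here are hypothetical and this is not pdsh code.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Raise the soft RLIMIT_NOFILE to the hard limit; 0 on success, -1 on error. */
int raise_nofile_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/*
 * Number of close() calls that a loop of the form
 *     while (fd < sysconf (_SC_OPEN_MAX)) close (fd++);
 * would make when started at fd.
 */
long closeall_iterations(int fd)
{
    long limit = sysconf(_SC_OPEN_MAX);

    return (limit > fd) ? limit - fd : 0;
}
```

Calling closeall_iterations (3) before and after raise_nofile_to_hard () shows the difference. With the values printed earlier in this thread (1024 versus 1048576), each child would go from about a thousand close() calls to about a million.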
Yes, this could work, thanks. If you are able, feel free to submit a PR. If not I will get to it later in the week or perhaps early next week. BTW, thanks for the great find! (and you would have had the solution too, if not for the broken pdsh build system!) |
ok, see now, ok that makes sense. great! sorry didn't catch the rebuild would have saved us both time! well I think enumerate in /proc/self/fd/ on linux is best but that's a pain to be doing i/o just to get fds seems silly. probably use min(1024, foo) on it I guess, don't see if it could possibly get this large anyways in practice |
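The min(1024, foo) idea could be sketched like this. The names `capped_fd_limit` and `closeall_capped` are hypothetical, and the cap is passed in by the caller rather than taken from anything in pdsh.

```c
#include <unistd.h>

/* Upper bound for the close loop: sysconf's value, but never above cap. */
long capped_fd_limit(long cap)
{
    long limit = sysconf(_SC_OPEN_MAX);

    if (limit < 0 || limit > cap)
        limit = cap;            /* also covers sysconf failing (-1) */
    return limit;
}

/* Close descriptors from fd up to the capped limit. */
void closeall_capped(int fd, long cap)
{
    long limit = capped_fd_limit(cap);

    while (fd < limit)
        close(fd++);
}
```

The trade-off is that any descriptor numbered at or above the cap stays open in the child, which could matter when a large fanout really does open that many descriptors.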
btw this means it's preallocating according to fanout, even if actual target list is small. that means setting |
Ah, yes, you are right. The nofile limit is increased automatically in
int nfds = (2 * opt->fanout) + 32;
This should probably be:
int nfds = (2 * MIN(opt->fanout, hostlist_count (opt->wcoll))) + 32; |
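To make the arithmetic concrete, here is the proposed formula as a standalone function. `nfds_needed` is a hypothetical name, and plain integers stand in for opt->fanout and hostlist_count (opt->wcoll).

```c
#include <assert.h>

/*
 * Descriptors to reserve: two per concurrently running command plus
 * 32 spare, where concurrency is bounded by both the fanout and the
 * number of target hosts. Hypothetical helper for illustration.
 */
int nfds_needed(int fanout, int nhosts)
{
    assert(fanout >= 0 && nhosts >= 0);

    int active = (fanout < nhosts) ? fanout : nhosts;

    return (2 * active) + 32;
}
```

With fanout 9999 and 57 hosts this gives 146 instead of 20030. With 501 hosts, fanout 495 gives 1022 and fanout 496 gives exactly 1024, which lines up with the transition point reported at 496 if the default soft limit is 1024.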
BTW, in case you take a look at a fix, in another project we used https://github.com/flux-framework/flux-core/blob/master/src/common/libutil/fdwalk.c I think the license would be compatible to bring it in to pdsh as well. (It already has a fallback for non-linux systems) |
is closing fds needed at all if internally opened fds are set close on exec? (there's a common fd function to set it on already opened ones, although I can't find anywhere it's used). naively, could the program maybe just set close-on-exec and re-execute itself on start before even doing anything? |
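For reference, setting close-on-exec on an already open descriptor is a short fcntl sequence. This is a generic sketch with a hypothetical name (`set_cloexec`), not the common fd function mentioned in the comment above.

```c
#include <fcntl.h>

/*
 * Mark fd close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error (for example, fd is not open).
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) ? -1 : 0;
}
```

A descriptor marked this way is closed by the kernel at exec, so the child needs no loop for it. Descriptors opened elsewhere without the flag (for example by a library) would still be inherited, which may be one reason to keep an explicit close pass.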
the fdwalk solution seems perfect, not sure I like using SYS_getdents() directly but it's already ifdef linux so same difference. I can try to create a PR at some point, workaround (lower fanout) is sane for the time being. thanks a lot for your help, been using pdsh for many years. still the best for a lot of use cases! |
some time ago, pdsh became very slow for me when connecting to even just 50-100 machines, running something simple like /bin/true. years ago this would take much less than a second (with persistent ssh connections). now it's taking quite a while, more than ten seconds to do this.
even using -R exec running echo, the following takes 12 seconds to run on just 57 hosts, it's not even logging in anywhere so not sure what it could be doing:
on a dual-core Intel 8th Gen 16GB RAM:
built using --without-rsh --without-ssh --with-exec --with-genders. would normally want to use ssh with this, just used exec for testing, but would think that should be instant. this used to be very much faster. trying to run this under strace, it takes about 10 minutes:
dang, seems to want to close something an awful lot. the fd numbers it's trying to close get out of hand rapidly:
thinking this would need backtrace, wanted to get one thread running, so tried FANOUT=1 ... but then PROBLEM GONE!!! found the transition point:
ok so is it wrong to have high fanout? what transition occurs at 496? isn't this number just a limit? shouldn't the fanout be able to work to arbitrary extent if the machine has enough resources to handle it? maybe we have 10k machines in the cluster and we got a big controller to handle them in one batch (our cluster's not that large, but maybe next year ;-)
originally had tried to find a FANOUT=0 option to mean "as many threads as requested for all nodes to be done in one batch," but had just used FANOUT=9999 since that doesn't seem possible to specify. some time ago this was set innocuously, apparently this causes a problem making pdsh slow, which has been bugging me for some time but didn't see the link between the two.