
RFE: Support different connection options for GWs than for end points #466

damianam opened this issue Dec 27, 2021 · 11 comments

@damianam

It could be useful to support different connection options for GWs than for end points.

Example: large clusters with GWs, where the hierarchical setup is beneficial, but where one does not want to store private SSH keys on the GWs, or enable agent forwarding all the way to the compute nodes (just to the GWs).
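For context, the hierarchical setup I am talking about is the clush tree mode, declared in topology.conf roughly like this (hostnames are made up, and the exact section name may differ between versions, so check the documentation for your release):

    # /etc/clustershell/topology.conf -- example tree: the admin/master node
    # reaches the compute nodes through a layer of gateways
    [Main]
    master: gw[1-8]
    gw[1-8]: node[001-960]

clush on the master then only opens connections to the gateways, and the gateways fan out to the computes; the question is which ssh options each of these two hops should use.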

Currently you have to relax your security by enabling agent forwarding everywhere, or keep private keys on the GWs unnecessarily.

I am aware of the possibility of disabling agent forwarding in sshd_config, but this is an all-or-nothing switch that would affect all users with access to the nodes. Since in most cases there is no practical use for agent forwarding on the computes when running clush commands, I think separating the options used for the GWs from the options used for the computes is a reasonable use case.

@martinetd
Collaborator

You can just use different configs in your client's ssh_config using Host blocks? It's probably more practical than whatever configuration scheme we could come up with.
It would however apply to non-clush uses of the hostnames as well unless you use different node names to access them through clush.
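Something along these lines (host patterns are just placeholders):

    # ~/.ssh/config on the clush client
    Host gw*
        ForwardAgent yes        # forward the agent only to the gateways

    Host node*
        ForwardAgent no         # never forward it to the compute nodes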

(Note that for small clusters, if you don't want to use an ssh agent or put private keys on gateways, you could just not use the gateway code and go through the gateways with the ProxyJump option -- that will use the initial client's agent/keys for both the gateway and the final connection, at the cost of opening a distinct connection to the GW for each node, so it's not practical for large clusters, only for hard-to-access nodes. Enabling session multiplexing might make that slightly more resource efficient, but it can hang the first ssh through that node if later ones take time to finish, and I wouldn't recommend it... Unfortunately, to benefit from GW connection multiplexing there really is no other choice than giving GWs a private key or forwarding the agent to the GW - it might be possible to implement something similar to ProxyJump by having a multiplexing ssh or port forwarding run as a service in the background, but I feel it's out of scope for clush.)
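A minimal sketch of that ProxyJump approach (hostnames are placeholders):

    # ~/.ssh/config -- bypass the clush tree mode entirely: each connection
    # to a node is tunnelled through gw1 using the local agent/keys, at the
    # cost of one extra connection to gw1 per node.
    Host node*
        ProxyJump gw1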

@damianam
Author

damianam commented Jan 3, 2022

The problems with changing the ssh_config as a workaround are, in my view:

  • It affects individual ssh connections even when not using clush, in a use case where you don't really need that.
  • It still ends up with a few thousand connections from the master node. You are just jumping through an extra host, but it doesn't behave hierarchically, which is the main thing we are after.

SSH session multiplexing looks interesting (I did not know about that feature), but it does indeed seem that it is not a silver bullet.

The way I imagined this feature request is simply having an extra section in the configuration, or a new option to pass options just to GW nodes. So we have -o for "general" SSH options, but we could have something like -go (or whatever) for SSH options for GW nodes. Similarly, we have [Main] in clush.conf, but we could also have a [Gateways] section and tweak the ssh_options there. Naïvely, it looks feasible. But of course the devil is in the details.
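Purely to illustrate the idea (neither the -go flag nor the [Gateways] section exists in clush today, the names are only suggestions):

    # Hypothetical invocation: -o still applies to all ssh connections,
    # while a gateway-only flag would apply only to the GW hop.
    clush -w node[001-960] -o '-oForwardAgent=no' -go '-oForwardAgent=yes' uptime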

@martinetd
Collaborator

Well, if the point is just enabling the ssh agent on GWs, having an appropriate Host section that only enables it for GWs seems good enough to me (yes, it'll be enabled even when you ssh there without clush, but they're GWs so that looks like valid usage to me).
If you really care about that, you can also have a different ssh config file just for clush with -o '-F ~/.clush_ssh_config' (note that in this case /etc/ssh/ssh_config is also ignored).
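For instance, with a clush-only config along these lines (file name and host patterns are placeholders; as the later comments point out, an option hardcoded on the ssh command line can still override what the file says):

    # ~/.clush_ssh_config -- only read when clush passes it via -F
    Host gw*
        ForwardAgent yes
    Host *
        ForwardAgent no

    # invocation
    clush -o '-F ~/.clush_ssh_config' -w node[001-960] uptime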

I just don't think it's worth adding the complexity of only applying some options to gw in this case, but other developers might be more open than me here - I'll defer to them :)

I agree multiplexing isn't good enough for anything bigger than small clusters: there's a limit to how many multiplexed connections ssh will allow, and it's just not a good 1-to-1 mapping to a clush GW (there's a reason we implemented this in the first place!). I just brought it up because there really is no other way of skipping agent forwarding or similar (kerberos credentials forwarding?), as that was what you seemed to care about.

@damianam
Author

I am not sure you can apply the suggested workaround (enabling forwarding for the GWs in /etc/ssh/ssh_config). In fact, I started doing it, only to realize that clush is actively disabling agent forwarding here: https://github.com/cea-hpc/clustershell/blob/master/lib/ClusterShell/Worker/Ssh.py#L61 Because clush adds that option to the ssh command line, whatever you specify in your ssh_config is not respected, since the ssh CLI has precedence.
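To illustrate the precedence problem (the exact hardcoded option is visible at the Ssh.py line linked above; hostnames are made up): ssh takes command-line options first, then the user's config, then the system-wide config, and the first value obtained wins, so a -o given on the command line cannot be beaten by a per-host block:

    # /etc/ssh/ssh_config (or ~/.ssh/config)
    Host gw*
        ForwardAgent yes

    # roughly what clush ends up running -- the command-line option wins,
    # so agent forwarding stays off even for the gateway:
    ssh -oForwardAgent=no gw1 <command>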

You can override the default that clush sets with ssh_options, but if you do override it, then it is applied to all the connections, not just to the connections to the GWs. In other words, the agent will be forwarded all the way to the compute nodes.

So in the end you can't selectively enable agent forwarding. You either forward it for all the connections, or you disable it (the default) for all the connections.

I suppose the security implications of enabling it blindly are the reason why clush disables it by default (which is good). But this situation forces admins to choose between not having hierarchical support, creating private keys on the system, or enabling forwarding everywhere. I wouldn't like to relax security to get hierarchical support. And hierarchical clush is an awesome feature, so missing it is a real pity.

@damianam
Author

damianam commented Feb 4, 2022

So this issue kind of stalled. Is there something on our side we can do to push it forward? Would a PR be appreciated or is that something that the current team would not like?

@martinetd
Collaborator

Sorry I had missed the previous reply.

  • re: actively disabling agent forwarding, I'm not sure why we do that; probably performance, back when we weren't expecting the remote to make new connections before the gw code existed. I think that should just be dropped at least for connections to GWs, which could use it, or made an option.
    Honestly this shouldn't be hardcoded like this; I'd understand if it were the default for ssh_options, but this is just annoying. It can definitely be changed.
  • re: PR (note I cannot merge PRs), I personally just don't see how to make such an option easily discoverable/usable, which is why I pushed it back to ssh with my alias suggestion, but if you have a solid idea in mind it's probably OK. I'd suggest just getting the syntax/doc or an example down first and grabbing @Sthiell's attention when you have something concrete to iterate on.

@damianam
Author

damianam commented Feb 8, 2022

Thanks for the answer. I think that if we can get rid of that hardcoded option, then we could implement the workaround with ssh_config that you suggested. But removing that hardcoded flag is probably not as easy as simply deleting the line, since that would imply a different behaviour in the new version. Having input from other developers here would be interesting. Probably the cases where this change is relevant are negligible or not worth considering.

On the other hand, ssh_config tweaking is IMO a workaround for a feature that is interesting to have in its own right as part of clush. I see two possible ways to implement it:

  • Override based. We keep -o for the CLI and all the options in the [Main] section of clush.conf, but we add a [Gateway] section and a -go (gateway options) flag for the CLI that override the general options when present. If they are not specified, the general ones are used. This is backwards compatible. It has the drawback that some options might be specified in the [Gateway] section that don't make sense there (like color or verbosity, for instance). The CLI behaves more predictably, since -o is just for ssh options. (See the sketch after this list.)
  • Separate options. We have -o and -go as in the other case, but connections to the gateways do not take -o into consideration at all. For clush.conf we would have [Main] for most options, and [Gateway] and [Clients] (or something like that) just for ssh options. Both [Clients] and [Gateway] would be mandatory, since the ssh options would move into these sections. I think this option is more explicit, but it is not backwards compatible, so probably the first one makes more sense?
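To make the first (override-based) proposal concrete, a hypothetical clush.conf might look like this (the [Gateway] section and its semantics are only a suggestion, nothing of this exists yet):

    [Main]
    # Options applied to every ssh connection (as today).
    ssh_options: -oStrictHostKeyChecking=no

    [Gateway]
    # Hypothetical: if present, these override the [Main] ssh_options, but
    # only for connections made to the gateways; leaf nodes keep the [Main] ones.
    ssh_options: -oForwardAgent=yes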

Regardless of that, how to deal with the hardcoded option remains open. One could think of moving it to clush.conf as a default, so it is less sneaky and people can tweak it as necessary.
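For example (assuming the hardcoded option is the agent-forwarding one referenced above), the shipped default clush.conf could carry it explicitly instead of the code:

    [Main]
    # Default shipped configuration: keep agent forwarding disabled, but make
    # it visible and overridable here rather than hardcoded in Worker/Ssh.py.
    ssh_options: -oForwardAgent=no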

@damianam
Author

damianam commented Mar 1, 2022

Any feedback on what this could look like?

@degremont
Collaborator

I agree we need a way for users to have different options for their gateways than the options used to connect to leaf nodes (clients).

Either they could do it with a custom ssh_config, but this does not work for ForwardAgent as it is hardcoded on the CLI.
Or they could override it with clush.conf, but in that case it is not possible to have different options for GWs and clients.

I don't like the second proposal, as this specific case is not needed by most of our customers and I don't think adding a specific section will be useful.

  1. Either we remove the hardcoded value from the code (I'm not against that) and we let customers use ssh_config.
  2. Or we add new options to clush.conf to specify ssh options only for gateways. That probably means we will have to support every ssh_* option, but also options for other Workers.

@damianam
Author

damianam commented Mar 4, 2022

IMO the 2nd point is more elegant, but the 1st is more practical (less effort, and less risk of it stalling over a long period of time, since the implementation effort of the 2nd is larger).

@damianam
Author

I'd like to make progress here, since it is likely to take some time until distros pick it up, etc. Would a PR be welcome? If so, we could make one following the 1st approach, but make it opt-in via a config option (so the new release behaves 100% like the previous ones). Does this sound reasonable?
