topology: *!compute as root nodes #553

Closed
skwde opened this issue Jan 31, 2024 · 10 comments

@skwde

skwde commented Jan 31, 2024

To access our compute nodes (group compute), we can go through any of several hosts (group head), which are accessible from all hosts in our network except the compute nodes.

I tried to model this in topology.conf, but none of my attempts worked.

Initial config

Having only

head:compute
# @head:@compute

leads to an error if I don't run the clush command from a machine in head.
That is, I get:

clush: TREE MODE: "<local machine>" is not a valid root node!

If I specify the node I am currently running the command on as the root node:

<local machine>:head
head:compute

This works, but it is inconvenient: as soon as I am on a machine other than <local machine>, clush fails with the error above.

Instead of <local machine>, I want to specify a group of machines.

Extending root nodes to all nodes

In all my attempts, I see the following problems:

  1. compute is not excluded from the root nodes (as seen via clush -d)
  2. I get the error clush: TREE MODE: Invalid root or gateway node: *,<other nodes except compute>

I also tried defining a group in a *.yaml file and using that, also without success.

All of the problems above might stem from my lack of understanding of where and how groups / group sources can be used in the various config files (topology.conf, groups.d/*.yaml).
Unfortunately, the docs do not make this clear to me.

Can you please clarify:

  • how / where are we supposed to use groups (with extended string patterns) in config files?
    • I have the impression that @ is not always required to reference a group, and that it should even be dropped in some cases?!
    • Moreover, I saw extended string patterns using \& instead of just &. Is the escaping only required in the shell when not quoting properly?
  • how to use all!compute as the valid root machines in topology.conf?
@degremont
Collaborator

degremont commented Jan 31, 2024 via email

@skwde
Author

skwde commented Feb 1, 2024

Yes, you can see it like that.

Does it make a difference if I specify a group source instead of a group?
I.e., whenever I specify a group, I have to stick to a pattern appearing in the first column of nodeset -LL, right?
@* is derived from the default group source; is there a way to address all machines clustershell knows about?

Unfortunately, it is still not working.

Here is my topology.conf:

[routes]
@*!@cluster:gateway!@cluster:compute: @cluster:gateway
@cluster:gateway: @cluster:compute

and this is what my groups.d/cluster.yaml looks like:

default:
    node: '@cluster:compute'
    head: '@cluster:head'
    login: '@cluster:login'
    other: '<many other nodes>'

login:
    login: 'login[01,02]'
    test: login-test

head:
    head: 'head'

compute:
    cpu: 'node[001-010]'

cluster:
    gateway: '@login:*,@head:*'
    login: '@login:*'
    head: '@head:*'
    compute: '@compute:*'

If I now run:

clush -d -v -w node010 -b 'hostname'

I see:

---------------
head02,login[01-02],login-test,node[001-056,301],<other nodes>
`- gateway
---------------

I suspect the problem is the : in topology.conf, because nodeset -f '<topology group>' shows me the correct hosts. I also tried quoting the groups in topology.conf, but then I get clush: TREE MODE: "<local machine>" is not a valid root node! again.

EDIT:
For completeness, there is also this related issue: #420

@thiell
Collaborator

thiell commented Feb 1, 2024

@skwde You're right, the groupsource:groupname colon (:) separator doesn't seem to be supported in topology.conf because the parser is using Python's ConfigParser ( https://stackoverflow.com/questions/17947319/python-configparser-with-colon-in-the-key ). Even escaping : does not work. Also, unfortunately the actual parsing error is not properly reported by clush in that case. We will need to find a solution for this limitation (and improve error reporting). Perhaps it is time to use a yaml config file for the topology file too.
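To see why the colon breaks things, here is a small sketch using only Python's standard configparser module. It is not ClusterShell's topology loader, only the stock ConfigParser behavior referred to above: with the default delimiters ('=' and ':'), the first colon on a line ends the key, so @cluster:gateway gets cut in two. The second parse, with delimiters=('=',) and the routes rewritten with '=', is a hypothetical illustration of how a parser could keep colons in keys; it is not a ClusterShell option.

```python
import configparser

# Routes from the topology.conf above: ':' appears both inside group
# names (source:group) and as the key/value separator.
TOPOLOGY = """\
[routes]
@*!@cluster:gateway!@cluster:compute: @cluster:gateway
@cluster:gateway: @cluster:compute
"""


def parse_routes(text, delimiters=("=", ":")):
    """Return the [routes] section as a dict, split on the given delimiters."""
    parser = configparser.ConfigParser(delimiters=delimiters)
    parser.read_string(text)
    return dict(parser.items("routes"))


# Default delimiters: each key stops at the FIRST ':' on its line.
default_routes = parse_routes(TOPOLOGY)
for key, value in default_routes.items():
    print(repr(key), "->", repr(value))

# Hypothetical alternative: only '=' separates key and value, so the
# source:group names survive intact (routes must then be written with '=').
eq_routes = parse_routes(TOPOLOGY.replace(": @", " = @"), delimiters=("=",))
for key, value in eq_routes.items():
    print(repr(key), "->", repr(value))
```

With the default delimiters the keys come out as @*!@cluster and @cluster, which no longer name the intended groups.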

A possible workaround is to use groups in topology.conf with no explicit group source, that is, groups defined in the default group source. Note that they can potentially be based on groups from another source (and it looks like you were going to try that), for example:

groups.conf:

default: default

cluster.yaml:

default:
    node: '@cluster:compute'
    head: '@cluster:head'
    login: '@cluster:login'
    ...

topology.conf:

[routes]
@*!@head!@login!@node = @head,@login
@head,@login = @node

clush -d -v -w node010 -b 'hostname'
---------------
mynode
`- head,login[01-02],login-test
   `- node[001-010]

This example is for illustration only; I'm not sure it is exactly what you want.
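Read as a node set expression, the first route key takes every node of the default source and removes the head, login and compute (node) groups; what remains is where clush may run as the root. The sketch below evaluates both sides of that route with plain Python sets over hypothetical group contents. It is not ClusterShell's NodeSet (which also handles ranges such as node[001-010]), and the strict left-to-right evaluation without operator precedence is an assumption made for this sketch.

```python
# Hypothetical contents of the default group source (cf. cluster.yaml above).
GROUPS = {
    "head": {"head"},
    "login": {"login01", "login02", "login-test"},
    "node": {f"node{i:03d}" for i in range(1, 11)},
    "other": {"ws01", "ws02"},  # hypothetical machines outside the cluster
}
ALL = set().union(*GROUPS.values())  # stands in for "@*" in this sketch

# Extended pattern operators mapped onto Python set operations.
OPS = {
    ",": set.union,
    "!": set.difference,
    "&": set.intersection,
    "^": set.symmetric_difference,
}


def resolve(term):
    """Resolve one operand: "@*", "@group", or a literal node name."""
    if term == "@*":
        return set(ALL)
    if term.startswith("@"):
        return set(GROUPS[term[1:]])
    return {term}


def evaluate(pattern):
    """Apply the operators strictly from left to right (no precedence)."""
    result, op, term = None, ",", ""
    for ch in pattern + ",":  # a trailing ',' flushes the last operand
        if ch in OPS:
            operand = resolve(term)
            result = operand if result is None else OPS[op](result, operand)
            op, term = ch, ""
        else:
            term += ch
    return result


roots = evaluate("@*!@head!@login!@node")  # left-hand side of the first route
gateways = evaluate("@head,@login")        # its right-hand side
print("root candidates:", sorted(roots))
print("gateways:", sorted(gateways))
```

With this made-up data, only ws01 and ws02 qualify as roots, and the head and login machines are the gateways that the second route (@head,@login = @node) connects to the compute nodes.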

@skwde
Author

skwde commented Feb 1, 2024

Perfect, it is working now. Thanks!

For completeness: I also had to set

export CLUSTERSHELL_GW_PYTHON_EXECUTABLE=<path to python executable>

I still have some conceptual questions:

  • How does clustershell choose which gateway to use in the above setup?
  • Why is it not possible to set CLUSTERSHELL_GW_PYTHON_EXECUTABLE in clush.conf?
  • In principle nodeset -LL mentions the groups I am allowed to use?
  • Is there a way to have groupsources interpreted as groups, i.e. a shorthand for @groupsource:*?
  • Is there a way to list all hosts clustershell knows about, possibly also from a single config file (@* refers to the default all)?

@degremont
Collaborator

degremont commented Feb 1, 2024

In principle nodeset -LL mentions the groups I am allowed to use?

Yes.

nodeset --groupsources
lists all available sources and shows which one is the default.

  • by default, a group name without an explicit group source name uses the default source, e.g. @mygroup
  • you can always specify the group source explicitly, e.g. @local:mygroup
  • the -s option changes the default group source for the current command
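The resolution rules in the list above can be sketched in a few lines of Python. Everything here is hypothetical: the source names, the group contents, and the simplification that @source:* is the union of that source's groups (a real source may define its own "all" content). The default parameter plays the role of the default group source, so passing a different value mimics -s.

```python
# Hypothetical group sources: source name -> {group name: nodes}.
SOURCES = {
    "cluster": {
        "gateway": ["head", "login01", "login02"],
        "compute": ["node001", "node002"],
    },
    "local": {"mygroup": ["ws01", "ws02"]},
}


def resolve_group(ref, default="local"):
    """Resolve "@group", "@source:group", "@*" or "@source:*" to sorted nodes."""
    if not ref.startswith("@"):
        raise ValueError(f"not a group reference: {ref!r}")
    source, sep, group = ref[1:].partition(":")
    if not sep:  # no explicit source: fall back to the default one
        source, group = default, source
    groups = SOURCES[source]
    if group == "*":  # all nodes of that source (union of its groups here)
        return sorted({node for nodes in groups.values() for node in nodes})
    return sorted(groups[group])


print(resolve_group("@mygroup"))                     # default source
print(resolve_group("@cluster:compute"))             # explicit source
print(resolve_group("@gateway", default="cluster"))  # like -s cluster
print(resolve_group("@cluster:*"))
```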

Is there a way to have groupsources interpreted as groups, i.e. a shorthand for @groupsource:*?

There are ways to interpret the list of groups from a group source as a single group:

ClusterShell 1.9 introduces a new operator @@ optionally followed by a source name (e.g. @@source) to access the list of raw group names of the source (without the @ prefix). If no source is specified (as in just @@), the default group source is used (see groups.conf). The @@ operator may be used in any node set expression to manipulate group names as a node set.

See https://clustershell.readthedocs.io/en/latest/tools/nodeset.html#listing-group-names-in-expressions

This does not give you the list of sources themselves, though, if that is what you want. You can still build your own sources on top of it.

/etc/clustershell/groups.conf.d/groupsources.conf

[groupsources]
map: nodeset -f @@$GROUP
list: nodeset --groupsources | awk '$1 !~ "groupsources" {print $1}'

(I filtered out "groupsources" but you can keep it if you think it makes sense)
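The groupsources meta-source turns source names into group names: list returns the known sources, and map returns, for a given source, its raw group names (what @@$GROUP expands to). Below is a rough Python emulation over hypothetical sources; the awk regex filter is approximated by an exact name comparison, and a sorted comma join stands in for the folding nodeset -f would do.

```python
# Hypothetical group sources: source name -> {group name: node set string}.
SOURCES = {
    "cluster": {"gateway": "head,login[01-02]", "compute": "node[001-010]"},
    "login": {"login": "login[01-02]", "test": "login-test"},
    "groupsources": {},  # the meta-source itself, filtered out by `list`
}


def upcall_list():
    """Emulates `list`: every source name except "groupsources"."""
    return [name for name in SOURCES if name != "groupsources"]


def upcall_map(group):
    """Emulates `map` (nodeset -f @@$GROUP): raw group names of source $GROUP."""
    return ",".join(sorted(SOURCES[group]))


for source in upcall_list():
    print(f"@groupsources:{source} -> {upcall_map(source)}")
```

So @groupsources:cluster would stand for the group names of the cluster source, not for its nodes.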

I'll let you play with that, depending on what you want to achieve.

Is there a way to list all hosts clustershell knows about, possibly also from a single config file

Isn't nodeset -LL giving you that already?

(@* refers to the default all)?

Yes. @* refers to the default source; use @othersource:* for another source.

@skwde
Author

skwde commented Feb 2, 2024

Thanks a lot for your elaborate answers! They are indeed helpful!

Is there a way to list all hosts clustershell knows about, possibly also from a single config file

Isn't nodeset -LL giving you that already?

Yes and no.
Say I want to run clush on all machines known to clustershell; I have to use something like:

clush -w $(nodeset -LL | awk '{print $1}' | nodeset -f) 'cmd'

It would be good to have an operator for that.

Besides, to my knowledge, it is not possible to do the above for just a single (YAML) file defining group sources.

@degremont
Collaborator

This is a rather specific use case. People usually use different group sources either to manage different kinds of hardware (compute, switches, racks, etc.), on which they rarely want to run the same command,
or to provide different "views" of the same nodes (roles, Slurm jobs, states, ...), where there is little point in querying all groups since they all cover the same node list.

clush -w $(nodeset -LL | awk '{print $1}' | nodeset -f) 'cmd'

That group query could be simplified:

  • nodeset -LL gives you the group names and their content
  • awk then extracts only the group names

nodeset -L already gives you only the group list, so there is no need for awk.

I would recommend crafting your own specific group sources. One of the nice features of ClusterShell is the possibility to easily declare your own source, tailored to your exact needs, using shell commands.
What would be the purpose of that source? Only a way to get all nodes from all sources? Is there another reason to query all sources at the same time?

Below is an example based on the constraints I understood from your explanation.
See also other examples: https://clustershell.readthedocs.io/en/latest/config.html#group-external-sources

/etc/clustershell/groups.conf.d/all.conf

[all]
map: echo "@"$GROUP
list: nodeset --groupsources | awk '$1 !~ "all" {print $1}' | xargs -i -n1 nodeset -s {} -l | sed 's/@//'
all: nodeset --groupsources | awk '$1 !~ "all" {print $1}' | xargs -i -n1 echo "@{}:*"

Then

nodeset -f '@all:*'
clush -a -s all ...
clush -w '@all:*' ...
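The three upcalls of the [all] source can be emulated in Python to see what each one produces. This is a sketch over hypothetical sources: list yields source:group names (the shell version gets @source:group from nodeset -s SRC -l and strips the @ with sed), map turns such a name back into the group reference @source:group (expanded directly here), and all is the union of @SRC:* over every other source.

```python
# Hypothetical sources other than "all": source -> {group: set of nodes}.
SOURCES = {
    "cluster": {
        "gateway": {"head", "login01"},
        "compute": {"node001", "node002"},
    },
    "local": {"ws": {"ws01", "ws02"}},
}


def all_list():
    """`list` upcall: "source:group" for every group of every other source."""
    return [f"{src}:{grp}" for src, groups in SOURCES.items() for grp in groups]


def all_map(group):
    """`map` upcall: "@" + GROUP, expanded directly to its nodes here."""
    src, grp = group.split(":", 1)
    return SOURCES[src][grp]


def all_all():
    """`all` upcall: the union of @SRC:* over all other sources."""
    return set().union(*(nodes for groups in SOURCES.values() for nodes in groups.values()))


print(all_list())
print(sorted(all_map("cluster:compute")))  # what @all:cluster:compute covers
print(sorted(all_all()))                   # what @all:* covers
```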

@skwde
Author

skwde commented Feb 5, 2024

Thanks a lot for your explanation and example.

Is there a way to get all nodes defined in a single source (YAML) without defining a specific group source in the file?

@degremont
Collaborator

Is there a way to get all nodes defined in a single source (YAML) without defining a specific group source in the file?

(I would not have made such a complex example if something like that already existed ;) )

A YAML file is not a single source but a way to declare one or more sources. The CLI does not know in which config file a source was declared.

If a behavior you want does not exist, do not hesitate to create your own source!
Here is a source that extracts groups from a specific YAML file only:

[all]
map: yq '.[].$GROUP // ""' < /etc/clustershell/groups.d/<MYCONFIG.YAML>
list: yq '.[][] | key' < /etc/clustershell/groups.d/<MYCONFIG.YAML>
all: yq '.[][]' < /etc/clustershell/groups.d/<MYCONFIG.YAML>
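Conceptually, the three yq expressions walk the two-level mapping of a single YAML file: top-level keys are source names, second-level keys are group names. The Python sketch below does the same walk on the cluster.yaml from earlier in this thread, written out by hand as the dict a YAML loader would return (a stand-in so no YAML library is needed). Values such as @cluster:login are themselves group references that nodeset expands further, and a group name defined in several sources (login here) yields several matches; the sketch simply collects them all, and for an unknown group it returns an empty list where yq would print "".

```python
# cluster.yaml from this thread (minus the elided "other" group), as a dict.
CONFIG = {
    "default": {"node": "@cluster:compute", "head": "@cluster:head", "login": "@cluster:login"},
    "login": {"login": "login[01,02]", "test": "login-test"},
    "head": {"head": "head"},
    "compute": {"cpu": "node[001-010]"},
    "cluster": {
        "gateway": "@login:*,@head:*",
        "login": "@login:*",
        "head": "@head:*",
        "compute": "@compute:*",
    },
}


def yaml_map(group):
    """Like `.[].$GROUP // ""`: the group's value in every source defining it."""
    return [groups[group] for groups in CONFIG.values() if group in groups]


def yaml_list():
    """Like `.[][] | key`: every group name, across all sources in the file."""
    return [name for groups in CONFIG.values() for name in groups]


def yaml_all():
    """Like `.[][]`: every group's node set string."""
    return [value for groups in CONFIG.values() for value in groups.values()]


print(yaml_map("cpu"))
print(yaml_map("login"))  # three matches, one per source defining "login"
print(yaml_list())
```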

@skwde
Author

skwde commented Feb 7, 2024

Alright, thanks again.

I am closing this now, since we have drifted quite a bit from what I originally asked anyway!

Keep up the great work!

@skwde skwde closed this as completed Feb 7, 2024

3 participants