topology: `*!compute` as root nodes #553

skwde · 2024-01-31T06:58:16Z

To access our compute nodes (compute), we can go through a number of hosts (head) accessible from all hosts (except compute) in our network.

I tried to model that in topology.conf, but none of my attempts worked.

Initial config

Having only

head:compute
# @head:@compute

leads to an error if I don't run the clush command from a machine in head.
I.e. I get

clush: TREE MODE: "<local machine>" is not a valid root node!

If I specify the root node as the node I am currently running the command on

<local machine>:head
head:compute

This works but is inconvenient: if I am on a machine other than <local machine>, clush stops working with the above error.

I want to specify a group of machines for <local machine>.

Extending root nodes to all nodes

In all my attempts I see the following problems:

- compute is not excluded from the root nodes (seen via clush -d)
- I get the error clush: TREE MODE: Invalid root or gateway node: *,<other nodes except compute>

I also tried to define a group in *.yaml and use that but also with no success.

All of my problems above might be related to my lack of understanding about where and how groups / group sources can be used in the various config files (topology.conf, groups.d/*.yaml).
Unfortunately this is not clear to me from the docs.

Can you please clarify

- how / where we are supposed to use groups (with extended string patterns) in config files?
  - I have the impression that @ is not always required to access a group, and should even be dropped in some cases?!
  - Moreover, I saw extended string patterns using \& instead of just &. Is the escaping only required in the shell when not quoting properly?
- how to use all!compute as valid root machines in topology.conf?

degremont · 2024-01-31T14:05:42Z

If I understand you correctly, you have the following setup:

(most nodes: login, services, mgmt, ...) -> gateways (head nodes) -> compute

In that case, the head nodes are what ClusterShell calls gateways, and all other nodes are potential root nodes. Your config was almost right.

Regarding group usage: there are very few places where ClusterShell expects a group name and nothing else. This is mostly the case for group definitions, where you must specify a group name followed by the group content, which is a node pattern. Everywhere you need to give a list of nodes, you can use a node pattern. topology.conf expects node lists. The right way to do it in your case is:

@*!@heads!@compute: @heads
@heads: @compute

Or you can list the root nodes explicitly, if that is simpler. Escaping is not needed, as this is not parsed by a shell.
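
To make the root-node expression concrete: in a node set pattern, `!` is set difference, so `@*!@heads!@compute` reads as "every node of the default source, minus the head nodes, minus the compute nodes". The snippet below is only a plain-Python sketch of that set arithmetic. It does not use ClusterShell, and all host names are hypothetical.

```python
# Plain-Python illustration of the set arithmetic behind "@*!@heads!@compute".
# This does not use ClusterShell; all host names below are made up.
all_nodes = {"login01", "login02", "mgmt01", "head01", "head02", "node001", "node002"}
heads = {"head01", "head02"}
compute = {"node001", "node002"}

# "!" in a node set pattern is set difference, applied left to right.
root_nodes = all_nodes - heads - compute

# Each route maps a set of source nodes to the nodes reachable from them.
routes = {
    frozenset(root_nodes): heads,  # @*!@heads!@compute: @heads
    frozenset(heads): compute,     # @heads: @compute
}

print(sorted(root_nodes))
```

With these made-up hosts, the root nodes are login01, login02 and mgmt01. The two route lines then say: any of those roots reaches the heads, and the heads reach compute.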

skwde · 2024-02-01T04:48:00Z

Yes, you can see it like that.

Does it make a difference if I specify a group source instead of a group?
I.e. Whenever I specify a group, I have to stick to a pattern appearing in the first column of nodeset -LL, right?
@* is derived from the default group source, is there a way to address all machines clustershell knows about?

Unfortunately it is still not working.

Here is my topology.conf

[routes]
@*!@cluster:gateway!@cluster:compute: @cluster:gateway
@cluster:gateway: @cluster:compute

and this is what my groups.d/cluster.yaml looks like

default:
    node: '@cluster:compute'
    head: '@cluster:head'
    login: '@cluster:login'
    other: '<many other nodes>'

login:
    login: 'login[01,02]'
    test: login-test

head:
    head: 'head'

compute:
    cpu: 'node[001-010]'

cluster:
    gateway: '@login:*,@head:*'
    login: '@login:*'
    head: '@head:*'
    compute: '@compute:*'

If I now run

clush -d -v -w node010 -b 'hostname'

I see

---------------
head02,login[01-02],login-test,node[001-056,301],<other nodes>
`- gateway
---------------

I guess the problem is caused by : in topology.conf, because when I use nodeset -f '<topology group>' it shows me the correct hosts. I also tried quoting the groups in topology.conf but then I get again clush: TREE MODE: "<local machine>" is not a valid root node!

EDIT:
For completeness, there is also issue #420.

thiell · 2024-02-01T06:58:32Z

@skwde You're right, the groupsource:groupname colon (:) separator doesn't seem to be supported in topology.conf because the parser is using Python's ConfigParser ( https://stackoverflow.com/questions/17947319/python-configparser-with-colon-in-the-key ). Even escaping : does not work. Also, unfortunately the actual parsing error is not properly reported by clush in that case. We will need to find a solution for this limitation (and improve error reporting). Perhaps it is time to use a yaml config file for the topology file too.
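
The effect of the colon can be reproduced with Python's standard configparser module alone. This is a minimal sketch assuming default parser settings; ClusterShell's own topology parser may configure things differently, and the helper name parse_routes is made up for illustration.

```python
import configparser

# Routes as written in the report, using "source:group" names.
WITH_SOURCES = """
[routes]
@*!@cluster:gateway!@cluster:compute: @cluster:gateway
@cluster:gateway: @cluster:compute
"""

# Same idea, but with groups from the default source and "=" as delimiter.
WITHOUT_SOURCES = """
[routes]
@*!@head!@login!@node = @head,@login
@head,@login = @node
"""


def parse_routes(text):
    """Return the [routes] section as a dict, using default ConfigParser rules."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser.items("routes"))


broken = parse_routes(WITH_SOURCES)
fixed = parse_routes(WITHOUT_SOURCES)

for label, routes in (("with sources", broken), ("without sources", fixed)):
    print(label)
    for key, value in routes.items():
        print("  %r -> %r" % (key, value))
```

By default, ConfigParser accepts both = and : as delimiters and splits each line at the first one it finds. The first config therefore yields the keys '@*!@cluster' and '@cluster', with the rest of each line ending up in the value. The second config has no colon, so the keys come out intact.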

A possible workaround is to use groups in topology.conf with no explicit group sources, that is, that are defined in the default group source. Note that they can potentially be based on groups from another source (and it looks like you were going to try that), for example:

groups.conf:

default: default

cluster.yaml:

default:
    node: '@cluster:compute'
    head: '@cluster:head'
    login: '@cluster:login'
    ...

topology.conf:

[routes]
@*!@head!@login!@node = @head,@login
@head,@login = @node

clush -d -v -w node010 -d 'hostname'
---------------
mynode
`- head,login[01-02],login-test
   `- node[001-010]

This is an example to illustrate only, not sure this is exactly what you want.

skwde · 2024-02-01T12:26:41Z

Perfect, it is working now. Thanks!

For completeness I had to set

export CLUSTERSHELL_GW_PYTHON_EXECUTABLE=<path to python executable>

I still have some conceptual questions:

How does clustershell choose which gateway to use in the above setup?
Why is it not possible to set CLUSTERSHELL_GW_PYTHON_EXECUTABLE in clush.conf?
In principle nodeset -LL mentions the groups I am allowed to use?
Is there a way to have groupsources interpreted as groups, i.e. a shorthand for @groupsource:*?
Is there a way to list all hosts clustershell knows about, possibly also from a single config file (@* refers to the default all)?

degremont · 2024-02-01T12:49:59Z

In principle nodeset -LL mentions the groups I am allowed to use?

Yes.

nodeset --groupsources lists all available sources and shows which one is the default.

- By default, a group name without an explicit group source uses the default source, e.g. @mygroup.
- You can always specify a group source explicitly, e.g. @local:mygroup.
- The -s option changes the current default group source.

Is there a way to have groupsources interpreted as groups, i.e. a shorthand for @groupsource:*?

There are ways to treat the list of groups from a group source as a single group:

ClusterShell 1.9 introduces a new operator @@ optionally followed by a source name (e.g. @@source) to access the list of raw group names of the source (without the @ prefix). If no source is specified (as in just @@), the default group source is used (see groups.conf). The @@ operator may be used in any node set expression to manipulate group names as a node set.

Review https://clustershell.readthedocs.io/en/latest/tools/nodeset.html#listing-group-names-in-expressions

But this does not cover the list of sources themselves, if that is what you want. You can still build your own source based on that.

/etc/clustershell/groups.conf.d/groupsources.conf

[groupsources]
map: nodeset -f @@$GROUP
list: nodeset --groupsources | awk '$1 !~ "groupsources" {print $1}'

(I filtered out "groupsources" but you can keep it if you think it makes sense)

I'll let you play with that, depending on what you want to achieve

Is there a way to list all hosts clustershell knows about, possibly also from a single config file

Isn't nodeset -LL giving you that already?

(@* refers to the default all)?

Yes. This is from the default source. @othersource:* for another source.

skwde · 2024-02-02T05:22:49Z

Thanks a lot for your elaborate answers! They are indeed helpful!

Is there a way to list all hosts clustershell knows about, possibly also from a single config file

Isn't nodeset -LL giving you that already?

Yes and No.
Say I want to run clush on all machines known by clustershell, I have to use something like

clush -w $(nodeset -LL | awk '{print $1}' | nodeset -f) 'cmd'

It would be good to have an operator for that.

Besides, to my knowledge it is not possible to do above for just a single (YAML) file defining group sources.

degremont · 2024-02-02T08:47:27Z

This is a more specific use case. Usually people use different group sources to manage different kinds of hardware (compute, switches, racks, etc.), and they don't really want to run the same command on all of them.
Or they use sources for different "views" of the same nodes (roles, slurm jobs, states, ...), where there is little interest in querying all groups, as they all cover the same node list.

clush -w $(nodeset -LL | awk '{print $1}' | nodeset -f) 'cmd'

That group query could be simplified:

nodeset -LL: is giving you the group name, and group content
awk: then you are parsing to extract only the group name

nodeset -L is already giving you only the group list, no need for an awk.

I would recommend crafting your own specific group source. One of the nice features of ClusterShell is the possibility to easily declare your own source based on your exact needs, using shell commands.
What would be the purpose of that source? Only a way to get all nodes from all sources? Is there another reason to query all sources at the same time?

I put an example below based on the constraints I understood from your explanation.
See also other examples: https://clustershell.readthedocs.io/en/latest/config.html#group-external-sources

/etc/clustershell/groups.conf.d/all.conf

[all]
map: echo "@"$GROUP
list: nodeset --groupsources | awk '$1 !~ "all" {print $1}' | xargs -i -n1 nodeset -s {} -l | sed 's/@//'
all: nodeset --groupsources | awk '$1 !~ "all" {print $1}' | xargs -i -n1 echo "@{}:*"

Then

nodeset -f '@all:*'
clush -a -s all ...
clush -w '@all:*' ...

skwde · 2024-02-05T11:07:22Z

Thanks a lot for your explanation and example.

Is there a way to get all nodes defined in a single source (YAML) without defining a specific group source in the file?

degremont · 2024-02-05T11:24:54Z

Is there a way to get all nodes defined in a single source (YAML) without defining a specific group source in the file?

(I would have not made this complex example if something like that would already exist ;) )

A YAML file is not a single source, but a way to declare one or more sources. The CLI does not know in which config file a given source was declared.

If there is a behavior that you want and that does not exist, do not hesitate to create your own source!
Here is a source that extracts groups from one specific YAML file only:

[all]
map: yq '.[].$GROUP // ""' < /etc/clustershell/groups.d/<MYCONFIG.YAML>
list: yq '.[][] | key' < /etc/clustershell/groups.d/<MYCONFIG.YAML>
all: yq '.[][]' < /etc/clustershell/groups.d/<MYCONFIG.YAML>

skwde · 2024-02-07T04:49:29Z

Alright thanks again.

I am closing this now because we anyway deviated quite a bit from what I originally asked!

Keep up the great work!

skwde closed this as completed Feb 7, 2024
