
TreeMode error, copy a file from local to remote nodes. #549

Open
luxiaoyong opened this issue Dec 16, 2023 · 4 comments

luxiaoyong commented Dec 16, 2023

ClusterShell is excellent! Thanks for sharing this great project.
When I use tree mode to copy a file to a remote node and the destination is an existing directory such as /tmp, I run into a problem.
My command:
clush -d -o-q -w compute -b -S --copy /home/tt.txt --dest /tmp
This command fails with an error. Adding a trailing slash to the destination avoids it:
clush -d -o-q -w compute -b -S --copy /home/tt.txt --dest /tmp/
In more serious cases, the directory on the remote node is replaced by the copied file, and no error is reported.

The debug log is below:

$ clush -d -o-q -w compute  -b -S --copy /home/tt.txt --dest /tmp
DEBUG:root:clush: STARTING DEBUG
Changing max open files soft limit from 65535 to 8192
User interaction: True
Create STDIN worker: False
clush: enabling tree topology (2 gateways)
clush: nodeset=compute fanout=15 [timeout conn=15.0 cmd=0.0] copy sources=['/home/tt.txt'] dest=/tmp

control
|- control1
|  `- compute
`- control2
   `- compute2

DEBUG:ClusterShell.Worker.Tree:stderr=True
DEBUG:ClusterShell.Worker.Tree:TreeWorker._launch on compute (fanout=15)
DEBUG:ClusterShell.Worker.Tree:copy source=/home/tt.txt, dest=/tmp
DEBUG:ClusterShell.Worker.Tree:copy arcname=tmp destdir=/
DEBUG:ClusterShell.Worker.Tree:next_hops=[('control1', 'compute')]
DEBUG:ClusterShell.Worker.Tree:trying gateway control1 to reach compute
DEBUG:ClusterShell.Worker.Tree:_copy_remote gateway=control1 source=/home/tt.txt dest=/ reverse=False
DEBUG:ClusterShell.Worker.Tree:_copy_remote: tar cmd: tar -xf - -C '/'
DEBUG:ClusterShell.Task:pchannel: creating new channel <ClusterShell.Propagation.PropagationChannel object at 0x7f6a0bd32760>
SSHCLIENT: ssh -q -oForwardAgent=no -oForwardX11=no -oConnectTimeout=15 -oBatchMode=yes control1 CLUSTERSHELL_GW_PYTHON_EXECUTABLE=/home/itool/inspector_agent/env/bin/python /home/itool/inspector_agent/env/bin/python -m ClusterShell.Gateway -Bu
DEBUG:ClusterShell.Engine.Engine:set_events: client <ClusterShell.Engine.EPoll.EngineEPoll object at 0x7f6a0c857ee0> not registered
DEBUG:ClusterShell.Engine.Engine:set_events: client <ClusterShell.Engine.EPoll.EngineEPoll object at 0x7f6a0c857ee0> not registered
DEBUG:ClusterShell.Engine.Engine:set_events: client <ClusterShell.Engine.EPoll.EngineEPoll object at 0x7f6a0c857ee0> not registered
DEBUG:ClusterShell.Engine.Engine:set_events: client <ClusterShell.Engine.EPoll.EngineEPoll object at 0x7f6a0c857ee0> not registered
DEBUG:ClusterShell.Engine.Engine:set_events: client <ClusterShell.Engine.EPoll.EngineEPoll object at 0x7f6a0c857ee0> not registered
DEBUG:ClusterShell.Propagation:shell nodes=compute timeout=-1 worker=140093441719552 remote=True
DEBUG:ClusterShell.Propagation:send_queued: 0
DEBUG:ClusterShell.Propagation:write buflen=10240
DEBUG:ClusterShell.Propagation:send_queued: 1
DEBUG:ClusterShell.Worker.Tree:TreeWorker: _check_ini (0, 0)
control1: b'<?xml version="1.0" encoding="utf-8"?>'
control1: b'<channel version="1.9.1"><message type="ACK" msgid="2" ack="0"></message>'
DEBUG:ClusterShell.Propagation:recv: Message CHA (type: CHA, msgid: 3)
DEBUG:ClusterShell.Propagation:channel started (version 1.9.1 on remote gateway)
DEBUG:ClusterShell.Propagation:recv: Message ACK (type: ACK, msgid: 2, ack: 0)
DEBUG:ClusterShell.Propagation:recv_cfg
DEBUG:ClusterShell.Propagation:CTL - connection with gateway fully established
DEBUG:ClusterShell.Propagation:dequeuing sendq: Message CTL (type: CTL, msgid: 1, srcid: 140093441719552, action: shell, target: compute)
control1: b'<message type="ACK" msgid="4" ack="1"></message>'
DEBUG:ClusterShell.Propagation:recv: Message ACK (type: ACK, msgid: 4, ack: 1)
DEBUG:ClusterShell.Propagation:got ack (ACK)
DEBUG:ClusterShell.Propagation:dequeuing sendq: Message CTL (type: CTL, msgid: 2, srcid: 140093441719552, action: write, target: compute)
control1: b'<message type="ACK" msgid="6" ack="2"></message>'
DEBUG:ClusterShell.Propagation:recv: Message ACK (type: ACK, msgid: 6, ack: 2)
DEBUG:ClusterShell.Propagation:got ack (ACK)
control1: b'<message type="SER" msgid="7" srcid="140093441719552" nodes="compute">gASVXgAAAAAAAABDWnRhcjogdG1wOiBDYW5ub3Qgb3BlbjogRmlsZSBleGlzdHMKdGFyOiBFeGl0aW5nIHdpdGggZmFpbHVyZSBzdGF0dXMgZHVlIHRvIHByZXZpb3VzIGVycm9yc5Qu</message>'
DEBUG:ClusterShell.Propagation:recv: Message SER (type: SER, msgid: 7, srcid: 140093441719552, nodes: compute)
control1: b'<message type="RET" msgid="8" srcid="140093441719552" retcode="2" nodes="compute"></message>'
compute: tar: tmp: Cannot open: File exists
compute: tar: Exiting with failure status due to previous errors
DEBUG:ClusterShell.Propagation:recv: Message RET (type: RET, msgid: 8, srcid: 140093441719552, retcode: 2, nodes: compute)
clush: compute: exited with exit code 2
DEBUG:ClusterShell.Worker.Tree:_on_remote_node_close compute 0 via gw control1
DEBUG:ClusterShell.Worker.Tree:check_fini 1 1
DEBUG:ClusterShell.Worker.Tree:TreeWorker._check_fini <ClusterShell.Worker.Tree.TreeWorker object at 0x7f6a0bd43d00> call pchannel_release for gw control1
DEBUG:ClusterShell.Task:pchannel_release control1 <ClusterShell.Worker.Tree.TreeWorker object at 0x7f6a0bd43d00>
DEBUG:ClusterShell.Task:pchannel_release: destroying channel <ClusterShell.Propagation.PropagationChannel object at 0x7f6a0bd32760>
DEBUG:ClusterShell.Propagation:ev_close gateway=control1 <ClusterShell.Propagation.PropagationChannel object at 0x7f6a0bd32760>
DEBUG:ClusterShell.Propagation:ev_close rc=None
DEBUG:ClusterShell.Propagation:error on gateway control1 (setup=True)
DEBUG:ClusterShell.Propagation:gateway control1 now set as unreachable
DEBUG:ClusterShell.Worker.EngineClient:<EnginePort at 0x140093453336000 (streams=(7, 8))>: dropped msg: (<function Task._abort at 0x7f6a0c844280>, (False,), {})
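For readers following the log: the body of the `SER` message above appears to be a base64-encoded pickle of the remote stderr bytes. That is an observation from decoding this one string, not a statement about the protocol in general. The sketch below decodes it with the standard library; `decode_ser_payload` is a hypothetical helper name, and `pickle.loads` should only ever be used on data you trust.

```python
import base64
import pickle

# Payload copied verbatim from the SER message in the debug log above.
SER_PAYLOAD = (
    "gASVXgAAAAAAAABDWnRhcjogdG1wOiBDYW5ub3Qgb3BlbjogRmlsZSBleGlzdHMK"
    "dGFyOiBFeGl0aW5nIHdpdGggZmFpbHVyZSBzdGF0dXMgZHVlIHRvIHByZXZpb3Vz"
    "IGVycm9yc5Qu"
)


def decode_ser_payload(payload: str) -> bytes:
    """Hypothetical helper: base64-decode, then unpickle, a SER message body.

    Assumes the payload is a pickled bytes object, as it is in this log.
    Never unpickle data from an untrusted source.
    """
    raw = base64.b64decode(payload)
    return pickle.loads(raw)


if __name__ == "__main__":
    # Prints the two tar error lines that clush reports for node "compute".
    print(decode_ser_payload(SER_PAYLOAD).decode("utf-8"))
```

The decoded text is the same pair of `tar` error lines that clush prints for `compute` a few lines further down in the log.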

@degremont
Collaborator

Thanks. Could you just confirm the version you are using?

@luxiaoyong
Author

I appreciate your attention to my problem. My version is 1.9.1, and we have temporarily worked around the problem by adding a trailing /.
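
A minimal sketch of that workaround for scripted use, assuming the destination is always meant to be a directory. `clush_copy_argv` is a hypothetical helper, not part of ClusterShell; it only uses the `-w`, `--copy` and `--dest` options already shown in this issue.

```python
def clush_copy_argv(nodes, source, dest_dir):
    """Hypothetical helper: build a clush copy command for a directory dest.

    Appends a trailing slash when it is missing, so the destination is
    unambiguously a directory.
    """
    if not dest_dir.endswith("/"):
        dest_dir += "/"
    return ["clush", "-w", nodes, "--copy", source, "--dest", dest_dir]


if __name__ == "__main__":
    # The resulting list can be passed to subprocess.run() on a host
    # where clush is installed.
    print(clush_copy_argv("compute", "/home/tt.txt", "/tmp"))
```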

@degremont
Collaborator

degremont commented Mar 15, 2024

This is a known limitation. The code says:

The only case that we don't support is when source is a
file and dest is a dir without a finishing / (in that case we
cannot determine remotely whether it is a file or a
directory).

The code cannot know, before initiating the transfer, whether /tmp is an existing directory on the remote node. There is no handshake in which this could be negotiated in two steps.

The recommendation is to add a trailing / when the destination is a remote directory.
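
To illustrate the limitation: the debug log earlier in this thread shows `copy arcname=tmp destdir=/` for `--dest /tmp`. The sketch below reproduces that split under a simple assumed rule (trailing slash means directory). `split_copy_dest` is a hypothetical function written for this illustration, not the actual ClusterShell code, and the result for the trailing-slash case is an assumption rather than something taken from the log.

```python
import posixpath
from typing import Tuple


def split_copy_dest(source: str, dest: str) -> Tuple[str, str]:
    """Hypothetical illustration: return (arcname, destdir) for a file copy.

    Without remote knowledge, only the spelling of dest can be used.
    """
    if dest.endswith("/"):
        # Trailing slash: dest is a directory, keep the source file name.
        return posixpath.basename(source), dest
    # No trailing slash: dest is taken as the final path of the file itself.
    return posixpath.basename(dest), posixpath.dirname(dest)


if __name__ == "__main__":
    print(split_copy_dest("/home/tt.txt", "/tmp"))   # ('tmp', '/')
    print(split_copy_dest("/home/tt.txt", "/tmp/"))  # ('tt.txt', '/tmp/')
```

In the first case the archive member is named `tmp` and is extracted with `tar -xf - -C '/'`, which collides with the existing /tmp directory and matches the `tar: tmp: Cannot open: File exists` error in the log.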

@luxiaoyong
Author

Thank you. I see.
