Bug fix:WorkTree Function Exception --rcopy #545
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I have a poblem with the tree execution mode, when i use rcopy params and copy a big file(more than 12M) from two romote host to local, problem like below:
command: clush -o -q -w host1,host2 -b -S --rcopy /home/collect.tar.gz --dest /home/tmp/
output:
Exception in thread Task-2:
Traceback (most recent call last):
File "env/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "env/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 390, in _thread_start
self.excepthook(*sys.exc_info())
File "env/lib/python3.8/site-packages/ClusterShell/CLI/Clush.py", line 822, in clush_excepthook
raise exp
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 388, in _thread_start
self._resume()
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 790, in _resume
self._run(self.timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 403, in _run
self._engine.run(timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
self.runloop(timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Engine/EPoll.py", line 157, in runloop
client._handle_read(sname)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 192, in _handle_read
node_msgline(key, msg, sname) # handle full msg line
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 166, in _on_nodeset_msgline
self.worker._on_node_msgline(nodes, msg, sname)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Worker.py", line 277, in _on_node_msgline
self.eh.ev_read(self, node, sname, msg)
File "env/lib/python3.8/site-packages/ClusterShell/Communication.py", line 258, in ev_read
self.recv(msg)
File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 270, in recv
self.recv_ctl(msg)
File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 376, in recv_ctl
metaworker._on_remote_node_close(node, rc, self.gateway)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Tree.py", line 459, in _on_remote_node_close
bnode, len(tmptar.getmembers()),
File "env/lib/python3.8/tarfile.py", line 1791, in getmembers
self._load() # all members, we first have to
File "env/lib/python3.8/tarfile.py", line 2379, in _load
tarinfo = self.next()
File "env/lib/python3.8/tarfile.py", line 2312, in next
raise ReadError("unexpected end of data")
tarfile.ReadError: unexpected end of data
When files are copied from multiple remote nodes to a local node and the size of the copied files is large (for example, 12 M), the transmission uses the fragment mode, and the transmission of multiple nodes will not be ended at the same time. When one of the nodes finishes, it will receive the RET message, and the _on_remote_node_close function will be triggered. In this case, the nodes that have not finished the transmission will also extract files, leading to the error.
I fixed this bug by modifying the this code, It is not clear whether these modifications will cause other abnormalities. I hope you can help review the code. Thank you very much.