Bug fix:WorkTree Function Exception --rcopy #545

Open
wants to merge 2 commits into
base: master
luxiaoyong

I have a problem with the tree execution mode: when I use the --rcopy option to copy a large file (more than 12 MB) from two remote hosts to the local host, I get the error below:

command: clush -o -q -w host1,host2 -b -S --rcopy /home/collect.tar.gz --dest /home/tmp/

output:
Exception in thread Task-2:
Traceback (most recent call last):
File "env/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "env/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 390, in _thread_start
self.excepthook(*sys.exc_info())
File "env/lib/python3.8/site-packages/ClusterShell/CLI/Clush.py", line 822, in clush_excepthook
raise exp
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 388, in _thread_start
self._resume()
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 790, in _resume
self._run(self.timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 403, in _run
self._engine.run(timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
self.runloop(timeout)
File "env/lib/python3.8/site-packages/ClusterShell/Engine/EPoll.py", line 157, in runloop
client._handle_read(sname)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 192, in _handle_read
node_msgline(key, msg, sname) # handle full msg line
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 166, in _on_nodeset_msgline
self.worker._on_node_msgline(nodes, msg, sname)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Worker.py", line 277, in _on_node_msgline
self.eh.ev_read(self, node, sname, msg)
File "env/lib/python3.8/site-packages/ClusterShell/Communication.py", line 258, in ev_read
self.recv(msg)
File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 270, in recv
self.recv_ctl(msg)
File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 376, in recv_ctl
metaworker._on_remote_node_close(node, rc, self.gateway)
File "env/lib/python3.8/site-packages/ClusterShell/Worker/Tree.py", line 459, in _on_remote_node_close
bnode, len(tmptar.getmembers()),
File "env/lib/python3.8/tarfile.py", line 1791, in getmembers
self._load() # all members, we first have to
File "env/lib/python3.8/tarfile.py", line 2379, in _load
tarinfo = self.next()
File "env/lib/python3.8/tarfile.py", line 2312, in next
raise ReadError("unexpected end of data")
tarfile.ReadError: unexpected end of data

When files are copied from multiple remote nodes to the local node and the files are large (for example, 12 MB), the data is transferred in fragments, and the nodes do not all finish transferring at the same time. When one node finishes, a RET message is received and the _on_remote_node_close function is triggered. At that point, files are also extracted for the nodes that have not finished transferring, and reading their incomplete tar data raises the error above.
I fixed this bug by modifying this code. It is not clear to me whether these modifications could cause other problems, so I would appreciate a review of the code. Thank you very much.
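
To illustrate the failure mode described above outside of ClusterShell, here is a minimal standalone sketch using only the standard `tarfile` module. It is not the code from this PR: `make_tar`, `count_members`, and `NodeBuffers` are hypothetical names, the in-memory uncompressed tar is an assumption about the data format, and the per-node buffering is only one possible way to avoid touching a node's data before that node has closed.

```python
import io
import tarfile


def make_tar(name, payload):
    """Build an uncompressed tar archive in memory with a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def count_members(data):
    """Open tar bytes and list every member (this is the call that fails)."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        return len(tar.getmembers())


class NodeBuffers:
    """Hypothetical per-node buffers: only a closed node's data is read."""

    def __init__(self, nodes):
        self._chunks = {node: [] for node in nodes}

    def feed(self, node, chunk):
        # Store a received fragment for this node only.
        self._chunks[node].append(chunk)

    def close(self, node):
        # Called when *this* node reports completion; other nodes' partial
        # buffers are left untouched until their own close arrives.
        return count_members(b"".join(self._chunks.pop(node)))


full = make_tar("collect.bin", b"x" * 4096)
partial = full[:512 + 1000]  # tar header plus part of the file data

# Reading an incomplete archive reproduces the error class in the traceback.
try:
    count_members(partial)
    truncated_failed = False
except tarfile.ReadError:
    truncated_failed = True

# host1 has finished, host2 is still mid-transfer.
buffers = NodeBuffers(["host1", "host2"])
buffers.feed("host1", full)
buffers.feed("host2", partial)
host1_members = buffers.close("host1")  # safe: host2's buffer is not opened
```

The point of the sketch is only that `getmembers()` must not be called on a node's tar data until that node's transfer is complete; how the actual fix tracks completion per node is defined by the commits in this PR, not by this example.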
