WorkerTree Function implementation problems --rcopy #543

Open
luxiaoyong opened this issue Oct 24, 2023 · 2 comments

@luxiaoyong

I have a problem with the tree execution mode: when I use the --rcopy option to copy a large file (more than 12 MB) from two remote hosts to the local host, I get the error below:

command: clush -o -q -w host1,host2 -b -S --rcopy /home/collect.tar.gz --dest /home/tmp/

output:
Exception in thread Task-2:
Traceback (most recent call last):
  File "env/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "env/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 390, in _thread_start
    self.excepthook(*sys.exc_info())
  File "env/lib/python3.8/site-packages/ClusterShell/CLI/Clush.py", line 822, in clush_excepthook
    raise exp
  File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 388, in _thread_start
    self._resume()
  File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 790, in _resume
    self._run(self.timeout)
  File "env/lib/python3.8/site-packages/ClusterShell/Task.py", line 403, in _run
    self._engine.run(timeout)
  File "env/lib/python3.8/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
    self.runloop(timeout)
  File "env/lib/python3.8/site-packages/ClusterShell/Engine/EPoll.py", line 157, in runloop
    client._handle_read(sname)
  File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 192, in _handle_read
    node_msgline(key, msg, sname) # handle full msg line
  File "env/lib/python3.8/site-packages/ClusterShell/Worker/Exec.py", line 166, in _on_nodeset_msgline
    self.worker._on_node_msgline(nodes, msg, sname)
  File "env/lib/python3.8/site-packages/ClusterShell/Worker/Worker.py", line 277, in _on_node_msgline
    self.eh.ev_read(self, node, sname, msg)
  File "env/lib/python3.8/site-packages/ClusterShell/Communication.py", line 258, in ev_read
    self.recv(msg)
  File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 270, in recv
    self.recv_ctl(msg)
  File "env/lib/python3.8/site-packages/ClusterShell/Propagation.py", line 376, in recv_ctl
    metaworker._on_remote_node_close(node, rc, self.gateway)
  File "env/lib/python3.8/site-packages/ClusterShell/Worker/Tree.py", line 459, in _on_remote_node_close
    bnode, len(tmptar.getmembers()),
  File "env/lib/python3.8/tarfile.py", line 1791, in getmembers
    self._load() # all members, we first have to
  File "env/lib/python3.8/tarfile.py", line 2379, in _load
    tarinfo = self.next()
  File "env/lib/python3.8/tarfile.py", line 2312, in next
    raise ReadError("unexpected end of data")
tarfile.ReadError: unexpected end of data
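
The last frames of the traceback are what the standard `tarfile` module does when it is handed an archive that stops partway through a member. A minimal sketch that reproduces that exception type without ClusterShell is below; the helper names `make_tar` and `count_members`, the member name, and the payload size are made up for illustration.

```python
import io
import tarfile


def make_tar(payload, name="collect.bin"):
    """Build an uncompressed tar archive in memory with a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def count_members(archive):
    """Open an archive from bytes and count its members, as the traceback does."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        return len(tar.getmembers())


full = make_tar(b"x" * 100_000)
partial = full[: len(full) // 2]  # simulate a transfer that is still in progress

print("complete archive members:", count_members(full))
try:
    count_members(partial)
except tarfile.ReadError as exc:
    print("partial archive:", type(exc).__name__, exc)
```

The first header of the partial archive is intact, so `tarfile.open()` succeeds; the failure only appears once `getmembers()` tries to move past the truncated member data.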

@luxiaoyong
Author

I fixed this bug by modifying the code below, but it is not clear to me whether these changes could cause other problems. I hope you can help review the code; I would appreciate it very much. The code is as follows:

file: Worker/Tree.py

def _on_remote_node_close(self, node, rc, gateway):
    """remote node closing with return code"""
    DistantWorker._on_node_close(self, node, rc)
    self.logger.debug("_on_remote_node_close %s %s via gw %s", node,
                      self._close_count, gateway)

    node_arr = []
    # finalize rcopy: extract tar data
    if self.source and self.reverse:
        for bnode, buf in self._rcopy_bufs.items():
            if bnode == node:
                node_arr.append(bnode)
                tarfileobj = self._rcopy_tars[bnode]
                if len(buf) > 0:
                    self.logger.debug("flushing node %s buf %d bytes", bnode,
                                      len(buf))
                    tarfileobj.write(buf)
                tarfileobj.flush()
                tarfileobj.seek(0)
                tmptar = tarfile.open(fileobj=tarfileobj)
                try:
                    self.logger.debug("%s extracting %d members in dest %s",
                                      bnode, len(tmptar.getmembers()),
                                      self.dest)
                    tmptar.extractall(path=self.dest)
                except IOError as ex:
                    self._on_remote_node_msgline(bnode, ex, 'stderr', gateway)
                finally:
                    tmptar.close()

        for item_node in node_arr:
            del self._rcopy_bufs[item_node]
            del self._rcopy_tars[item_node]
        # self._rcopy_bufs = {}
        # self._rcopy_tars = {}

    self.gwtargets[str(gateway)].remove(node)
    self._close_count += 1
    self._check_fini(gateway)
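
For review purposes, the per-node idea in the patch above can also be sketched outside ClusterShell. The following is only a rough stand-in: the class `RcopyState` and its `feed` and `finalize` methods are hypothetical names, not ClusterShell API. It looks up the closing node directly with `dict.pop` instead of looping over every buffer, and it catches `tarfile.TarError` as well, since `tarfile.ReadError` is not a subclass of `IOError` and would not be caught by `except IOError`.

```python
import io
import os
import tarfile
import tempfile


class RcopyState:
    """Hypothetical stand-in for the per-node rcopy bookkeeping."""

    def __init__(self, dest):
        self.dest = dest
        self.tars = {}  # node name -> temporary file accumulating tar data

    def feed(self, node, data):
        """Append one received fragment to the node's temporary file."""
        if node not in self.tars:
            self.tars[node] = tempfile.TemporaryFile()
        self.tars[node].write(data)

    def finalize(self, node):
        """Extract only the closing node's archive; return an error text or None."""
        tarobj = self.tars.pop(node, None)
        if tarobj is None:
            return None
        # Pass an extraction filter only where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        try:
            tarobj.seek(0)
            with tarfile.open(fileobj=tarobj) as tar:
                tar.extractall(path=self.dest, **kwargs)
        except (OSError, tarfile.TarError) as exc:
            return str(exc)
        finally:
            tarobj.close()
        return None


def make_tar(name, payload):
    """Build an uncompressed single-member tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


with tempfile.TemporaryDirectory() as dest:
    state = RcopyState(dest)
    state.feed("host1", make_tar("host1.bin", b"a" * 1000))
    # host2 has only delivered the start of its archive so far
    state.feed("host2", make_tar("host2.bin", b"b" * 100_000)[:4096])
    error = state.finalize("host1")       # host1 closes first
    extracted = sorted(os.listdir(dest))  # files written for host1
    pending = sorted(state.tars)          # nodes still transferring
    for leftover in state.tars.values():
        leftover.close()
```

In this sketch, closing `host1` leaves the partial data for `host2` untouched until its own close arrives.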

@thiell self-assigned this Oct 27, 2023
@luxiaoyong
Author

Hi @thiell, I am glad you are looking into my problem. When files are copied from multiple remote nodes to the local node and the files are large (for example, 12 MB), the data is transferred in fragments, and the transfers from the different nodes do not finish at the same time. When one of the nodes finishes, a RET message is received and the _on_remote_node_close function is triggered. At that point, the function also tries to extract the tar data of the nodes that have not finished transferring yet, which leads to the error.
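
The timing described above can be simulated with plain `tarfile` and in-memory buffers. This is a sketch under assumptions, not ClusterShell code: the host names, fragment size, payload sizes, and the helpers `make_tar`, `fragments`, and `members_or_error` are all invented for the example, and fragments are delivered round-robin until the smaller stream is complete.

```python
import io
import tarfile


def make_tar(name, payload):
    """Build an uncompressed single-member tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def fragments(data, size):
    """Split a byte string into fixed-size fragments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def members_or_error(buf):
    """Return the member count of a buffer, or the ReadError it triggers."""
    try:
        with tarfile.open(fileobj=io.BytesIO(buf.getvalue())) as tar:
            return len(tar.getmembers())
    except tarfile.ReadError as exc:
        return exc


# host1 sends a small archive, host2 a much larger one
streams = {
    "host1": fragments(make_tar("a.bin", b"a" * 20_000), 8192),
    "host2": fragments(make_tar("b.bin", b"b" * 200_000), 8192),
}
received = {node: io.BytesIO() for node in streams}

# Deliver fragments round-robin until host1 is done (its close arrives here).
while streams["host1"]:
    for node, frags in streams.items():
        if frags:
            received[node].write(frags.pop(0))

# Reported behaviour: the close of host1 processes every node's buffer.
extract_all = {node: members_or_error(buf) for node, buf in received.items()}
# Proposed behaviour: the close of host1 processes only host1's buffer.
extract_closing = {"host1": members_or_error(received["host1"])}

for node, result in extract_all.items():
    print(node, "->", repr(result))
```

Here `host2` still has fragments outstanding when `host1` closes, so reading its buffer at that moment fails, while restricting the work to the closing node does not touch it.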
