TarOutput: Close collected files after reading them to free buffers

With tar output, TarOutput.addRealFile() did not close file objects. So in tar mode, the file objects of all captures regular files were not closed. As the fixed line use `TarFile.addfile(tarinfo, open("fn"))` the opened file object could have never been passed anywhere else so it is safe to replace this pattern using a context manager. Summary: - The changes to xen-bugtool are 4 safe lines to use `with open` - All other changes are improvements for the unit tests to cover it. Description: This was found by PYTHONDEVMODE=yes to trace unclosed resources. I set this up in .pre-commit-config.yaml, so while developing tests, I saw the warning while testing with pre-commit run -v. Fixes in this PR: - Use context managers to fix the locations which are easy to fix. - Extend the unit tests (those in master now) to cover both locations. - While updating those tests, improve them to check the file list of the ZIP and tar archives generated in this particular test case. - Configure GitHub CI to create source annotations for PYTHONWARNINGS which are issued by such cases and are shown in code review of PRs. Notes: In the course of a year, my development machine accumulated 24000 files in /var/xapi/blobs (spread evenly over all subdirectories), so Python apparently does some garbage collection, but it's certainly not good to never close so many open file-like objects in Python. Closing them in time has the potential to reduce the memory usage of the xen-bugtool / xenserver-status-report process. There are more locations where Python 3.10 reports unclosed file objects, but it seems we may have to use enable tracemalloc as recommended by the issued PYTHONWARNINGS to find their source. Signed-off-by: Bernhard Kaindl <[email protected]>
xenserver · Jan 12, 2024 · 5fc5d42 · 5fc5d42 · github-actions · Jan 12, 2024
1 parent 1aae840
commit 5fc5d42
Show file tree

Hide file tree

Showing 8 changed files with 80 additions and 11 deletions.
diff --git a/.github/workflows/PYTHONWARNINGS-problemMatcher.json b/.github/workflows/PYTHONWARNINGS-problemMatcher.json
@@ -0,0 +1,17 @@
+{
+  "problemMatcher": [
+        {
+            "owner": "python-libs-PYTHONWARNINGS",
+            "severity": "warning",
+            "pattern": [
+                {
+                    "regexp": "^.*/python-libs/(.+):([0-9]*):(.*):(.*)$",
+                    "file": 1,
+                    "line": 2,
+                    "code": 3,
+                    "message": 4
+                }
+            ]
+        }
+  ]
+}
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -89,9 +89,17 @@ jobs:
           path: ~/.cache/pre-commit
           key: pre-commit|${{ env.pythonLocation }}|${{ hashFiles('.pre-commit-config.yaml') }}
 
+      # https://docs.python.org/3/library/devmode.html#resourcewarning-example
+      # If pytest runs with PYTHONDEVMODE=yes, it enables resource checks like
+      # unclosed file warnings. Configure GitHub to show all such warnings as
+      # annotations at the source location they occur in the PR code review:
+
+      - run: echo "::add-matcher::.github/workflows/PYTHONWARNINGS-problemMatcher.json"
+
       - uses: pre-commit/[email protected]
         name: Run pre-commit checks
         env:
+          PYTHONDEVMODE: yes  # Enable Python3 checks. See the comment above.
           # Skip the no-commit-to-branch check inside of GitHub CI (for CI on merge to master)
           SKIP: no-commit-to-branch
 

diff --git a/tests/integration/dom0-template/etc/group b/tests/integration/dom0-template/etc/group
@@ -0,0 +1 @@
+root:x:0:
diff --git a/tests/integration/dom0-template/etc/passwd b/tests/integration/dom0-template/etc/passwd
@@ -0,0 +1 @@
+root:x:0:0:root:/root:/bin/bash
diff --git a/tests/integration/dom0-template/etc/xensource/bugtool/mock/stuff.xml b/tests/integration/dom0-template/etc/xensource/bugtool/mock/stuff.xml
@@ -1,5 +1,7 @@
 <collect>
 <list>/etc</list>
+<files>/etc/passwd</files>
+<files>/etc/group</files>
 <files>/proc/self/status</files>
 <directory pattern=".*ax_user_watches">/proc/sys/fs/epoll</directory>
 <command label="proc_version">cat /proc/version</command>

diff --git a/tests/unit/test_load_plugins.py b/tests/unit/test_load_plugins.py
@@ -23,15 +23,23 @@ def test_load_plugins(bugtool, dom0_template):
         False,
         9,
     )
-    # Assert the size of the created mock capability:
-    assert bugtool.cap_sizes["mock"] == 0
+    # Assert the size of the files of the created mock inventory entry:
+    assert bugtool.cap_sizes["mock"] > 0  # /etc/passwd should have content
     # Assert the entries added to the bugtool.data dict:
     assert bugtool.data == {
         "ls -l /etc": {
             "cap": "mock",
             "cmd_args": ["ls", "-l", "/etc"],
             "filter": None,
         },
+        "/etc/passwd": {
+            "cap": "mock",
+            "filename": "/etc/passwd",
+        },
+        "/etc/group": {
+            "cap": "mock",
+            "filename": "/etc/group",
+        },
         "/proc/self/status": {
             "cap": "mock",
             "filename": "/proc/self/status",

diff --git a/tests/unit/test_output.py b/tests/unit/test_output.py
@@ -13,8 +13,24 @@ def assert_valid_inventory_schema(inventory_tree):
         XMLSchema(parse(xml_schema)).assertValid(inventory_tree)
 
 
-def assert_mock_bugtool_plugin_output(extracted):
+def assert_mock_bugtool_plugin_output(temporary_directory, subdir, names):
     """Assertion check of the output files from the mock bugtool plugin"""
+
+    # Assert the list of file names in the status report archive:
+    expected_names = [
+        subdir + "/etc/group",
+        subdir + "/etc/passwd.tar",
+        subdir + "/inventory.xml",
+        subdir + "/ls-l-%etc.out",
+        subdir + "/proc/self/status",
+        subdir + "/proc/sys/fs/epoll/max_user_watches",
+        subdir + "/proc_version.out",
+    ]
+    assert sorted(names) == expected_names
+
+    extracted = "%s/%s/" % (temporary_directory, subdir)
+
+    # Will be refactored to be more easy in a separate commit soon:
     assert_valid_inventory_schema(parse(extracted + "inventory.xml"))
     with open(extracted + "proc_version.out") as proc_version:
         assert proc_version.read()[:14] == "Linux version "
@@ -24,6 +40,17 @@ def assert_mock_bugtool_plugin_output(extracted):
         assert status.read()[:5] == "Name:"
     with open(extracted + "proc/sys/fs/epoll/max_user_watches") as max_user_watches:
         assert int(max_user_watches.read()) > 0
+    with open(extracted + "etc/group") as group:
+        assert group.readline() == "root:x:0:\n"
+
+    # Check the contents of the sub-archive "etc/passwd.tar":
+    with tarfile.TarFile(extracted + "etc/passwd.tar") as tar:
+        assert tar.getnames() == [subdir + "/etc/passwd"]
+        # TarFile.extractfile() does not support context managers on Python2:
+        passwd = tar.extractfile(subdir + "/etc/passwd")
+        assert passwd
+        assert passwd.readline() == b"root:x:0:0:root:/root:/bin/bash\n"
+        passwd.close()
 
 
 def minimal_bugtool(bugtool, dom0_template, archive, subdir):
@@ -32,7 +59,9 @@ def minimal_bugtool(bugtool, dom0_template, archive, subdir):
     # Load the mock plugin from dom0_template and process the plugin's caps:
     bugtool.PLUGIN_DIR = dom0_template + "/etc/xensource/bugtool"
     bugtool.entries = ["mock"]
+    archive.declare_subarchive("/etc/passwd", subdir + "/etc/passwd.tar")
     bugtool.load_plugins(just_capabilities=False)
+    # Mock the 2nd argument of the ls -l /etc to collect it using dom0_template:
     bugtool.data["ls -l /etc"]["cmd_args"][2] = dom0_template + "/etc"
     bugtool.collect_data(subdir, archive)
     bugtool.include_inventory(archive, subdir)
@@ -51,10 +80,9 @@ def test_tar_output(bugtool, tmp_path, dom0_template):
 
     # Check the TarFile contents
     tmp = tmp_path.as_posix()
-    tar_archive = tarfile.TarFile(tmp + "/tarball.tar")
-    tar_archive.extractall(tmp)
-    tar_archive.close()
-    assert_mock_bugtool_plugin_output(tmp + "/" + subdir + "/")
+    with tarfile.TarFile(tmp + "/tarball.tar") as tar:
+        tar.extractall(tmp)
+        assert_mock_bugtool_plugin_output(tmp, subdir, tar.getnames())
 
 
 def test_zip_output(bugtool, tmp_path, dom0_template):
@@ -69,5 +97,6 @@ def test_zip_output(bugtool, tmp_path, dom0_template):
 
     # Check the ZipFile contents
     tmp = tmp_path.as_posix()
-    zipfile.ZipFile(tmp + "/zipfile.zip").extractall(tmp)
-    assert_mock_bugtool_plugin_output(tmp + "/" + subdir + "/")
+    with zipfile.ZipFile(tmp + "/zipfile.zip") as zip:
+        zip.extractall(tmp)
+        assert_mock_bugtool_plugin_output(tmp, subdir, zip.namelist())
diff --git a/xen-bugtool b/xen-bugtool
@@ -1697,7 +1697,8 @@ class TarSubArchive(io.BytesIO):
         :param name: Recorded path of the the file in the tar archive for extraction
         :param filename: Real file name of the file to be added to the tar archive
         """
-        self.file.addfile(self.file.gettarinfo(filename, name), open(filename, "rb"))
+        with open(filename, "rb") as buffered_reader:
+            self.file.addfile(self.file.gettarinfo(filename, name), buffered_reader)
 
 class ArchiveWithTarSubarchives(object):
     """Base class for TarOutput and ZipOutput with support to create sub-archives"""
@@ -1762,7 +1763,9 @@ class TarOutput(ArchiveWithTarSubarchives):
         s = os.stat(filename)
         ti.mtime = s.st_mtime
         ti.size = s.st_size
-        self.tf.addfile(ti, open(filename, "rb"))
+        with open(filename, "rb") as buffered_reader:
+            self.tf.addfile(ti, buffered_reader)
+
 
     def add_path_with_data(self, name, data):
         ti = self._getTi(name)
File	Stmts	Miss	Cover	Missing
__init__.py	0	0	100%
conftest.py	30	0	100%
test_dir_list.py	11	0	100%
test_fs_funcs.py	10	0	100%
test_load_plugins.py	5	0	100%
test_output.py	47	0	100%
test_process_output.py	8	0	100%
test_xapidb_filter.py	17	0	100%
xen-bugtool	1523	686	54%	95, 100–101, 410, 490–491, 493–494, 511, 517, 540, 551–552, 555–556, 602, 607–608, 612, 614–618, 620, 632–634, 637, 656–657, 660, 662–666, 668–669, 672–673, 677–678, 680–681, 685–688, 690–693, 696–698, 700–701, 705–707, 710–711, 720, 722, 724–727, 729–731, 733–734, 738–739, 741, 743–749, 753–754, 756–760, 765–773, 775–777, 779–781, 783–784, 786, 801, 803–807, 809–810, 812, 814, 816–839, 842–846, 848, 850–871, 874–875, 877, 879, 882, 884–893, 895, 897–900, 902–912, 914–930, 932–936, 938–944, 947–976, 978–988, 990–993, 995–1015, 1017–1020, 1022, 1024, 1026–1027, 1029–1030, 1032, 1047–1048, 1050–1051, 1055–1056, 1058–1059, 1061, 1063, 1065–1066, 1068–1069, 1071–1080, 1083–1089, 1092–1093, 1095–1096, 1105–1106, 1108–1109, 1113, 1135–1138, 1140–1144, 1146–1153, 1155, 1157–1160, 1162, 1165–1167, 1169–1175, 1178–1180, 1184–1185, 1187–1188, 1190–1192, 1195–1196, 1199, 1201–1202, 1204, 1206, 1208–1209, 1213–1216, 1219, 1222, 1225–1227, 1230–1234, 1237–1238, 1241–1244, 1250–1256, 1259–1263, 1265–1281, 1284–1292, 1295–1296, 1298, 1300–1301, 1303–1311, 1314, 1316–1319, 1321, 1323–1324, 1358, 1386, 1401, 1409–1410, 1414–1416, 1418, 1421, 1423–1425, 1427–1431, 1433, 1437, 1440, 1443–1449, 1451, 1453–1457, 1463–1465, 1467–1472, 1475–1479, 1483–1484, 1486, 1503–1504, 1506, 1508–1509, 1511–1513, 1515–1522, 1524–1529, 1533–1534, 1536, 1538, 1540, 1556–1557, 1559, 1561–1563, 1566–1567, 1569, 1571–1573, 1576–1577, 1579, 1581–1584, 1587–1597, 1600–1604, 1614, 1626, 1636, 1640, 1651, 1675–1677, 1744, 1750, 1785, 1809, 1824, 1884, 1903, 1905, 1907, 1910–1918, 1921–1922, 1925–1932, 1934–1935, 1937, 1941–1942, 1946, 1950–1952, 1956–1957, 1960, 1964, 1969, 1975–1976, 1978, 1982, 1984–1987, 1990–2000, 2004, 2006, 2091–2093, 2096–2098, 2110, 2129–2131, 2171–2175, 2182, 2184–2187, 2191, 2205
TOTAL	1651	686	58%