Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestHistoryMetrics.testEvents graph changes structure in updates-testing #21237

Closed
cockpituous opened this issue Nov 9, 2024 · 7 comments · Fixed by #21322
Closed

TestHistoryMetrics.testEvents graph changes structure in updates-testing #21237

cockpituous opened this issue Nov 9, 2024 · 7 comments · Fixed by #21322
Assignees
Labels

Comments

@cockpituous
Copy link
Contributor

The job fedora-40/updates-testing failed on commit f474f0d.

Log: https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-f474f0da-20241109-013334-fedora-40-updates-testing/log.html

@martinpitt martinpitt changed the title Tests failed on f474f0da1c6f723b9f55b4b599b5468047fc8a59 TestHistoryMetrics.testEvents graph changes structure in updates-testing Nov 10, 2024
@martinpitt
Copy link
Member

Confirmed in #21241 this seems to be real.

@jelly
Copy link
Member

jelly commented Nov 13, 2024

Relevant package updates:

 pcp                   x86_64 6.3.2-1.fc40                updates-testing 1.5 M
 pcp-conf              x86_64 6.3.2-1.fc40                updates-testing  30 k
 pcp-libs              x86_64 6.3.2-1.fc40                updates-testing 651 k
 pcp-selinux           x86_64 6.3.2-1.fc40                updates-testing  31 k

Previous version:

pcp-selinux-6.3.1-1.fc40.x86_64
pcp-conf-6.3.1-1.fc40.x86_64
pcp-libs-6.3.1-1.fc40.x86_64
pcp-6.3.1-1.fc40.x86_64
python3-pcp-6.3.1-1.fc40.x86_64

Changelog: https://github.com/performancecopilot/pcp/blob/main/CHANGELOG

Test assertion failure:

  File "/home/jelle/projects/cockpit/main/./test/verify/check-metrics", line 306, in testEvents
    self.assertGreaterEqual(getCompressedMinuteValue(test=self, g_type="cpu", saturation=True, hour=1598950800000, minute=8), 0.3)
AssertionError: 0.12799999713897706 not greater than or equal to 0.3

The metrics are historical so from a known dump so the values should not really change.

@martinpitt martinpitt self-assigned this Nov 21, 2024
martinpitt added a commit to martinpitt/cockpit that referenced this issue Nov 21, 2024
That version changes the interpretation of existing data somewhat, and
thus events slightly move around. Make the assertions flexible enough to work
with old and new versions.

There was not much science in the original assertions anyway -- I simply
recorded some stuff and then pinned down the status quo.

We can make this stricter again once all supported OSes have PCP ≥ 6.3.2 (but
that'll be a few years).

Addresses the non-pixel parts of cockpit-project#21237
@martinpitt
Copy link
Member

#21295 should fix the textual bits. Making the pixel tests accomodate both versions is just too awful. https://bodhi.fedoraproject.org/updates/FEDORA-2024-22b21606d8 should hit updates in one or two days, then let's refresh the image and accept the new pixel refs. We can suffer through two more nightly failures. (There's also the Python crash, though)

@mvollmer
Copy link
Member

I could reproduce the differences with pmval:

With pcp-6.3.1-1.fc40.x86_64:

# pmval -t 60 -a /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta kernel.all.load

metric:    kernel.all.load
archive:   /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta
host:      localhost.localdomain
start:     Tue Sep  1 08:59:46 2020
end:       Tue Sep  1 09:08:16 2020
semantics: instantaneous value
units:     none
samples:   9
interval:  60.00 sec
08:59:46.594  No values available

                 1 minute      5 minute     15 minute 
09:00:46.594       0.6400        0.1300     4.000E-02 
09:01:46.594       0.4600        0.1700     6.000E-02 
09:02:46.594       0.2900        0.1800     7.000E-02 
09:03:46.594       0.3300        0.2100     9.000E-02 
09:04:46.594      32.61          7.640         2.540  
09:05:46.594      15.22          7.620         2.880  
09:06:46.594      50.38         17.47          6.470  
09:07:46.594      20.93         15.34          6.440  

With pcp-6.3.2-2.el10.x86_64:

# pmval -t 60 -a /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta kernel.all.load

metric:    kernel.all.load
archive:   /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta
host:      localhost.localdomain
start:     Tue Sep  1 04:59:46 2020
end:       Tue Sep  1 05:08:16 2020
semantics: instantaneous value
units:     none
samples:   9
interval:  60.00 sec
04:59:46.594  No values available

                 1 minute      5 minute     15 minute 
05:00:46.594       0.4600        0.1700     6.000E-02 
05:01:46.594       0.2900        0.1800     7.000E-02 
05:02:46.594       0.3300        0.2100     9.000E-02 
05:03:46.594      32.61          7.640         2.540  
05:04:46.594      15.22          7.620         2.880  
05:05:46.594      50.38         17.47          6.470  
05:06:46.594      20.93         15.34          6.440  
05:07:46.594       7.680        12.54          6.040  

Note how the values jump up one minute later in the newer version.

Both versions show the same raw data in the archives, of course:

# pmval -U /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta kernel.all.load

metric:    kernel.all.load
archive:   /var/log/pcp/pmlogger/localhost.localdomain/20200901.04.59.meta
host:      localhost.localdomain
start:     Tue Sep  1 04:59:46 2020
end:       Tue Sep  1 05:08:16 2020
semantics: instantaneous value
units:     none
samples:   all

                 1 minute      5 minute     15 minute 
04:59:46.609       0.6400        0.1300     4.000E-02 
05:00:46.617       0.4600        0.1700     6.000E-02 
05:01:46.621       0.2900        0.1800     7.000E-02 
05:02:46.630       0.3300        0.2100     9.000E-02 
05:03:46.620      32.61          7.640         2.540  
05:04:46.618      15.22          7.620         2.880  
05:05:46.623      50.38         17.47          6.470  
05:06:46.632      20.93         15.34          6.440  
05:07:46.627       7.680        12.54          6.040  

In fact, there isn't any interpolation of values going on. The sample time is rounded differently. In the old version 05:03:46.594 gets the raw data for 05:02:46.630, while in the new version, it gets the raw data for 05:03:46.620. The new version seems "more correct", the old version seems to round down, while the new version might round to nearest.

Anyway, no bug to report. We have to survive with both.

mvollmer pushed a commit that referenced this issue Nov 21, 2024
That version changes the interpretation of existing data somewhat, and
thus events slightly move around. Make the assertions flexible enough to work
with old and new versions.

There was not much science in the original assertions anyway -- I simply
recorded some stuff and then pinned down the status quo.

We can make this stricter again once all supported OSes have PCP ≥ 6.3.2 (but
that'll be a few years).

Addresses the non-pixel parts of #21237
@mvollmer
Copy link
Member

What about not doing pixel tests for updates-testing? We don't have a mechanism to cope with pixel alternatives and any failure in updates-testing will just be pain.

If we write code like

self.assert_pixels("dialog" if not updates_testing else "dialog-testing", ...)

then the machinery will yet at us that one of "dialog" or "dialog-testing" is unused... but we could invent something if we want to.

@martinpitt
Copy link
Member

What about not doing pixel tests for updates-testing?

That crossed my mind, but I feel that would diminish their value too much. They did help us to find this difference (which could have been a regression). I wouldn't mind skipping this specific pixel test on updates-testing if you prefer. But honestly I currently lean towards dragging our feet a bit and then forgetting about it.

@martinpitt
Copy link
Member

cockpit-project/bots#7126 refreshes the image, but that fails on a handful of other issues. We'll update the pixel tests as part of that.

@github-project-automation github-project-automation bot moved this from next to improvement in Pilot tasks Nov 26, 2024
@martinpitt martinpitt moved this from improvement to detriment in Pilot tasks Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants