
The PerformanceTests WordCountIT PythonVersions job is flaky #32144

Open

github-actions bot opened this issue Aug 10, 2024 · 3 comments

@github-actions bot commented Aug 10, 2024

The PerformanceTests WordCountIT PythonVersions job is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_PerformanceTests_WordCountIT_PythonVersions.yml?query=is%3Afailure+branch%3Amaster to see all failed workflow runs.
See also Grafana statistics: http://metrics.beam.apache.org/d/CTYdoxP4z/ga-post-commits-status?orgId=1&viewPanel=9&var-Workflow=PerformanceTests%20WordCountIT%20PythonVersions

@damccorm commented

Looks like this is back to green

github-actions bot added this to the 2.59.0 Release milestone Aug 14, 2024
github-actions bot reopened this Nov 7, 2024

github-actions bot commented Nov 7, 2024

Reopening since the workflow is still flaky

damondouglas self-assigned this Dec 5, 2024

damondouglas commented Dec 6, 2024

Unassigning myself but relaying my research on this ticket.

Situation

This workflow's test has failed roughly every 2 to 3 days over the past two weeks.

Background

This workflow is scheduled to run twice daily. Inspection of the latest failures shows a timeout (Failed: Timeout >1800.0s) even though the actual Dataflow Job for that execution succeeded. The stack traces differ across the past two weeks' failures. In each build scan's timeline we see that :sdks:python:test-suites:dataflow:py39:runPerformanceTest takes approximately 30m, cutting off at the configured timeout.

This timeout is set on the runPerformanceTest Gradle task via pytest-timeout (https://github.com/pytest-dev/pytest-timeout). The Dataflow Jobs for these failed tests take approximately 10 to 13m. Successful tests do not print any information about the Dataflow Job to compare against.
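
For reference, a minimal sketch of how a pytest-timeout limit like the 1800s one above is typically applied; the exact flag or marker used by the Beam Gradle task is an assumption here, not pulled from the build scripts:

```python
# Sketch only: pytest-timeout can enforce a per-test wall-clock limit either
# via the command line (pytest --timeout=1800) or via a marker on the test.
import time

import pytest


@pytest.mark.timeout(1800)  # fail the test if it runs longer than 1800 seconds
def test_wordcount_performance():
    # Placeholder for the real pipeline run; in the actual test the Dataflow
    # job plus cleanup/metrics publishing must all finish inside this limit.
    time.sleep(1)
```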

There are additional tasks performed by the _run_wordcount_it method, such as cleanup and publishing metrics to BigQuery. Further analysis shows that the cleanup and metrics publishing only require information about artifacts and metadata generated during the test, such as the Job ID, Google Cloud Storage files, etc. Notably, there is a read from InfluxDB followed by a write to BigQuery.
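
To make the coupling concrete, here is a loose, hypothetical sketch of the shape of such a test method; all helper names below are made up for illustration and are not the actual Beam implementation:

```python
# Hypothetical illustration of after-test work coupled to the test run.
def run_pipeline():
    # Stands in for the ~10-13m Dataflow Job itself.
    return {"job_id": "job-123", "output": "gs://bucket/output*"}


def cleanup_gcs_artifacts(result):
    print("deleting", result["output"])       # after-test work inside the test


def read_influxdb_baseline():
    return {"run_time_s": 750}                 # extra I/O unrelated to pass/fail


def publish_metrics_to_bigquery(metrics):
    print("publishing", metrics)               # more after-test work


def test_wordcount_it():
    result = run_pipeline()
    # Everything below still counts against the same pytest timeout,
    # even though the job itself has already succeeded.
    cleanup_gcs_artifacts(result)
    publish_metrics_to_bigquery(read_influxdb_baseline())
```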

Assessment

We can rule out failing Dataflow Jobs as the root cause of these failures. Moreover, there appears to be ~15m of extra work outside the Dataflow Job execution that is performed within the test code. There seems to be a lot of unnecessary coupling of after-test functions with the test run itself.

Recommendations

  • Remove the after-test cleanup and consider using a Google Cloud Storage wildcard approach to schedule deletion of test artifacts outside test execution (see the sketch after this list).
  • Remove the InfluxDB read and BigQuery write from the test. Perhaps use a scheduled batch or streaming pipeline to collect these results into BigQuery.
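
A minimal sketch of the first recommendation, assuming a separately scheduled job (cron, Cloud Scheduler, etc.) deletes stale artifacts by prefix; the bucket name, prefix, and retention window below are hypothetical:

```python
# Hypothetical scheduled cleanup, decoupled from the test run.
# Assumes google-cloud-storage is installed and ambient GCP credentials.
from datetime import datetime, timedelta, timezone

from google.cloud import storage

BUCKET = "temp-storage-for-perf-tests"   # assumed bucket name
PREFIX = "wordcount-it/"                 # assumed artifact prefix ("wildcard")
MAX_AGE = timedelta(days=1)              # assumed retention window


def delete_stale_artifacts():
    client = storage.Client()
    cutoff = datetime.now(timezone.utc) - MAX_AGE
    for blob in client.list_blobs(BUCKET, prefix=PREFIX):
        if blob.time_created < cutoff:
            blob.delete()
            print(f"deleted gs://{BUCKET}/{blob.name}")


if __name__ == "__main__":
    delete_stale_artifacts()
```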

damondouglas removed their assignment Dec 6, 2024