[Bug]: JavaUsingPython pipeline creates large tmp files quickly, causing CI to run out of disk space #29215

Closed
2 of 16 tasks
Abacn opened this issue Oct 31, 2023 · 0 comments · Fixed by #29219
Labels
awaiting triage, bug, done & done (issue has been reviewed after it was closed for verification, followups, etc.), java, P2, python

Abacn commented Oct 31, 2023

What happened?

Jenkins workers often run out of disk space even though the inventory job runs twice per day. Checking the filesystem shows many tmp files named /tmp/beam-artifact...., each hundreds of MB in size. See https://ci-beam.apache.org/view/Inventory/job/beam_Inventory_apache-beam-jenkins-7/

These files are created here:

https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java#L393

This happens at pipeline expansion time for external transforms.

This points to a real issue for users, because the tmp files left behind are large. We should be able to clean up these files after the pipeline run.
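One possible shape for such a cleanup is sketched below. This is only an illustration, not the change made in #29219 and not existing Beam code: `TempArtifactScope` and `newArtifactFile` are hypothetical names, and the sketch assumes the files are created with `Files.createTempFile` using the `beam-artifact` prefix seen on the workers. When it is safe to close the scope (after expansion, or only after the job has been submitted) is also an assumption that would need checking against how the artifacts are consumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Beam: tracks temp files and deletes them on close. */
public class TempArtifactScope implements AutoCloseable {
  private final List<Path> files = new ArrayList<>();

  /** Creates a temp file with the "beam-artifact" prefix and remembers it for cleanup. */
  public Path newArtifactFile() throws IOException {
    Path p = Files.createTempFile("beam-artifact", "");
    files.add(p);
    return p;
  }

  /** Deletes every tracked file; files that are already gone are ignored. */
  @Override
  public void close() throws IOException {
    for (Path p : files) {
      Files.deleteIfExists(p);
    }
    files.clear();
  }

  public static void main(String[] args) throws IOException {
    Path staged;
    try (TempArtifactScope scope = new TempArtifactScope()) {
      staged = scope.newArtifactFile();
      Files.write(staged, new byte[1024]); // stand-in for a resolved artifact
      // ... expand and run the pipeline here ...
    }
    System.out.println("exists after close: " + Files.exists(staged));
  }
}
```

Tying deletion to an explicit scope, rather than relying on `File.deleteOnExit`, would also cover long-lived JVMs such as a CI daemon that runs many pipelines without exiting.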

Interestingly, it seems that only Jenkins 7 (and perhaps a few other workers) is affected. The Jenkins scheduler is known to run the same postcommit job on the same machine, so the cause is likely this job:

https://ci-beam.apache.org/job/beam_PostCommit_XVR_JavaUsingPython_Dataflow/
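To check whether a given worker is affected, the space held by these files can be measured directly. The following is a minimal sketch, assuming the files sit in the JVM's default temp directory and share the `beam-artifact` name prefix described above; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative diagnostic, not part of Beam. */
public class TempArtifactUsage {
  /** Sums the sizes of regular files directly under dir whose names start with prefix. */
  static long totalBytes(Path dir, String prefix) throws IOException {
    long total = 0;
    // The glob is matched against each entry's file name.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          total += Files.size(p);
        }
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    long bytes = totalBytes(tmp, "beam-artifact");
    System.out.printf("beam-artifact* files use %d MB in %s%n", bytes / (1024 * 1024), tmp);
  }
}
```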

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
Abacn changed the title from "[Bug]: External.resolveArtifacts create large tmp files quickly causing CI drain disk space" to "[Bug]: JavaUsingPython pipeline create large tmp files quickly causing CI drain disk space" on Oct 31, 2023
github-actions bot added this to the 2.52.0 Release milestone on Oct 31, 2023
damccorm added the done & done label on Nov 7, 2023