Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup CI and reduce test run time #67

Closed
advancedxy opened this issue Feb 21, 2024 · 9 comments · Fixed by #116 or #130
Closed

Speedup CI and reduce test run time #67

advancedxy opened this issue Feb 21, 2024 · 9 comments · Fixed by #116 or #130
Labels
enhancement New feature or request

Comments

@advancedxy
Copy link
Contributor

What is the problem the feature request solves?

It looks like that CIs are quite slow, the most time consuming parts varies from 60m to 100+m.
image

It would be great that we could speed them up and hopefully keep them under ~30m.

Describe the potential solution

No response

Additional context

No response

@advancedxy advancedxy added the enhancement New feature or request label Feb 21, 2024
@sunchao
Copy link
Member

sunchao commented Feb 21, 2024

We can probably split the shuffle related tests into separate CI jobs

@sunchao
Copy link
Member

sunchao commented Feb 21, 2024

... or reduce the amount of data in shuffle, right now it is 10000 in many tests.

@advancedxy
Copy link
Contributor Author

right now it is 10000 in many tests.

10000 means number of records? That doesn't seem too much. Probably we can print java/scala test durations in the CI and then we can decided which parts are most time consuming. We can optimize these tests or split them then.

@advancedxy advancedxy changed the title Speedup CI and reduce runtime Speedup CI and reduce test run time Feb 26, 2024
@advancedxy
Copy link
Contributor Author

Probably we can print java/scala test durations in the CI and then we can decided which parts are most time consuming

After enable test duration, the most time consuming tests(exceeds one minute) on X86 mac runners with JDK 1.8 are:

- Comet columnar shuffle shuffle: different data type (8 minutes, 5 seconds)
- all types, with nulls (23 minutes, 55 seconds)
- Comet columnar shuffle shuffle: different data type (8 minutes, 18 seconds)
- Comet columnar shuffle shuffle: different data type (5 minutes, 59 seconds)
- Comet columnar shuffle shuffle: different data type (6 minutes, 50 seconds)
- fix: Too many task completion listener of ArrowReaderIterator causes OOM (34 minutes, 33 seconds)
- Comet columnar shuffle shuffle: different data type (7 minutes, 22 seconds)
- Comet columnar shuffle shuffle: different data type (6 minutes, 59 seconds)

The source is https://github.com/apache/arrow-datafusion-comet/actions/runs/8047518357/job/21976917198?pr=116

The fix: Too many task completion listener of ArrowReaderIterator causes OOM (34 minutes, 33 seconds) case looks extreme. Maybe we should fix it first, cc @viirya

@sunchao
Copy link
Member

sunchao commented Feb 26, 2024

Thanks! I didn't realize "fix: Too many task completion listener of ArrowReaderIterator causes OOM" is taking so much time. I'm trying to improve the test time in #109.

The "all types, with nulls" are from the aggregation suite. We should reduce it too.

@sunchao sunchao reopened this Feb 27, 2024
@sunchao
Copy link
Member

sunchao commented Feb 27, 2024

@advancedxy @viirya what do you think if we move all the MacOS CI pipelines to post-commit, and only keep the ubuntu-latest pipelines as pre-commit? It seems the former are pretty slow especially with Java 8, while the latter are usually faster (~40min).

I think having ubuntu-latest should be sufficient to catch any potential issue in the PRs.

@viirya
Copy link
Member

viirya commented Feb 27, 2024

Okay for me. If we find any thing wrong with Mac OS, we can take action after commit to revert it or fix it.

@advancedxy
Copy link
Contributor Author

what do you think if we move all the MacOS CI pipelines to post-commit,

The Silicon ones are pretty fast, faster than ubuntu-latest. So I think we should enable that on CI as per-commit. I believe a large percent of developers are on Mac, it would be nice to find problems earlier.

For X86 with Java8, it's pretty slow. I think we should at least move it to a post-commit manner, or even remove it totally.
I'm neutral for other JDKs on X86 MacOSs.

@sunchao
Copy link
Member

sunchao commented Feb 27, 2024

OK, let's keep ubuntu-latest and MacOS Apple Silicon in the pre-commit then, and move MacOS X86 to post-commit. I'm also making changes in #109 to improve the test time in shuffle & aggregation. Will ping you once it is ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants