Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Include active spiller when computing peak shuffle memory #196

Merged
merged 2 commits into from
Mar 13, 2024

Conversation

sunchao
Copy link
Member

@sunchao sunchao commented Mar 12, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

We should include activeSpillerSorter when calculating peak shuffle memory. Otherwise, the stats will be inaccurate.

What changes are included in this PR?

How are these changes tested?

@@ -257,6 +257,7 @@ private long getMemoryUsage() {
for (SpillSorter sorter : spillingSorters) {
totalPageSize += sorter.getMemoryUsage();
}
totalPageSize += activeSpillSorter.getMemoryUsage();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[doubt] any ideas for testing it ? whats the max variance this can cause in worst case ? activeSpillSorter is just full ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to test this in a unit test, but this is clearly something missing. In the non-async path, only activeSpillSorter is used for sorting and spilling, and currently getMemoryUsage doesn't take account of it.

@viirya
Copy link
Member

viirya commented Mar 12, 2024

Oh, seems you need to check if it is null.

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2108.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2108.0 (TID 22860) (d3cb52f725b1 executor driver): java.lang.NullPointerException: Cannot invoke "org.apache.spark.shuffle.sort.CometShuffleExternalSorter$SpillSorter.getMemoryUsage()" because "this.activeSpillSorter" is null

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.31%. Comparing base (6bedce4) to head (d026cfd).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #196   +/-   ##
=========================================
  Coverage     33.30%   33.31%           
  Complexity      767      767           
=========================================
  Files           107      107           
  Lines         35372    35375    +3     
  Branches       7657     7658    +1     
=========================================
+ Hits          11782    11784    +2     
  Misses        21144    21144           
- Partials       2446     2447    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sunchao sunchao merged commit e920aa4 into apache:main Mar 13, 2024
26 checks passed
@sunchao
Copy link
Member Author

sunchao commented Mar 13, 2024

Thanks, merged

@sunchao sunchao deleted the shuffle-peak-memory branch March 13, 2024 05:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants