feat: Add additional metrics for shuffle write #1173
Conversation
docs/source/user-guide/metrics.md (outdated)

> ### CometScanExec
>
> `CometScanExec` uses nanoseconds for total scan time. Spark also measures scan time in nanoseconds but converts to …
This sounds like a problem statement. Did you mean that `spark.comet.metrics.detailed=true` will not lose the precision?
afaik, the conversion happens only when the data is to be displayed in the UI. (https://github.com/apache/spark/blob/576caec1da85c4451fe63e2a5923f2dbf136e278/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L248)
But this is what Spark does with all its nanosecond timing metrics, so we aren't doing anything different here.
Thanks, I have updated this
Spark converts nanos to millis on each batch:

```scala
override def hasNext: Boolean = {
  // The `FileScanRDD` returns an iterator which scans the file during the `hasNext` call.
  val startNs = System.nanoTime()
  val res = batches.hasNext
  scanTime += NANOSECONDS.toMillis(System.nanoTime() - startNs)
  res
}
```
We just use the nano time:

```scala
override def hasNext: Boolean = {
  // The `FileScanRDD` returns an iterator which scans the file during the `hasNext` call.
  val startNs = System.nanoTime()
  val res = batches.hasNext
  scanTime += System.nanoTime() - startNs
  res
}
```
It actually makes a large difference to the reported time in some cases.
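The difference comes from flooring each per-batch duration to milliseconds before summing. A self-contained sketch (the batch timings here are hypothetical, not measured from Comet) showing how per-batch conversion can report zero while the true total is several seconds:

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object PrecisionDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical workload: 10,000 batches, each taking 0.9 ms (900,000 ns) to scan.
    val batchDurationsNs = Seq.fill(10000)(900000L)

    // Spark-style accounting: convert each batch's duration to millis, then sum.
    // Each 0.9 ms floors to 0 ms, so the sum is 0.
    val sparkStyleMs = batchDurationsNs.map(NANOSECONDS.toMillis).sum

    // Comet-style accounting: sum the nanos, convert once at display time.
    val cometStyleMs = NANOSECONDS.toMillis(batchDurationsNs.sum)

    println(s"per-batch conversion: $sparkStyleMs ms") // 0 ms
    println(s"sum-then-convert:     $cometStyleMs ms") // 9000 ms
  }
}
```

With many short batches the per-batch conversion can under-report the scan time by the entire amount, which matches the "large difference" observed above.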
I have updated the description in the metrics guide to explain this in more detail.
I see. That could be a big difference on small datasets (I wonder if the same occurs when we have large files). Either way, it is better not to lose precision. We are not likely to run into overflow issues, are we?
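Overflow should not be a concern here: a signed 64-bit nanosecond accumulator only overflows after `Long.MaxValue` nanoseconds, which is roughly 292 years of measured time. A quick check:

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object OverflowBound {
  def main(args: Array[String]): Unit = {
    // Largest duration a Long can hold in nanoseconds, expressed in days.
    val maxDays = NANOSECONDS.toDays(Long.MaxValue)
    println(s"Long.MaxValue ns = $maxDays days (~${maxDays / 365} years)")
  }
}
```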
Seeing some segmentation faults in CI.

I may have found a bug in DataFusion.

I created a new simpler PR to replace this one: #1175
Which issue does this PR close?
N/A
Rationale for this change
I would like to understand how much time is spent on shuffle writing.
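As a rough illustration of the kind of measurement being added (a sketch with hypothetical names, not the actual Comet implementation), wall time can be accumulated in nanoseconds around the write path like this:

```scala
object ShuffleTimingSketch {
  // Hypothetical accumulator standing in for a JVM SQLMetric such as shuffleWallTime.
  var shuffleWallTimeNs: Long = 0L

  // Run an arbitrary block and add its duration, in nanoseconds, to the metric.
  def timed[T](block: => T): T = {
    val startNs = System.nanoTime()
    try block
    finally shuffleWallTimeNs += System.nanoTime() - startNs
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for writing shuffle output.
    val result = timed { (1 to 1000).map(_ * 2).sum }
    println(s"result=$result, shuffleWallTime=${shuffleWallTimeNs}ns")
  }
}
```

Accumulating raw nanoseconds, as above, also preserves the precision discussed in the review comments.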
What changes are included in this PR?
- `write_time` native metric to record write time instead of using `elapsed_time`
- `elapsed_time` to measure total native time (excluding executing the child plan and fetching data)
- `input_time` native metric for measuring the time for `ShuffleWriterExec` to execute the child plan and fetch its input data
- `shuffleWallTime` JVM metric to measure total time of shuffle

Spark UI

Note the new metrics:

Native plan
How are these changes tested?