
[comet-parquet-exec] CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics #1172

Open · wants to merge 4 commits into comet-parquet-exec

Conversation

mbutrovich
Contributor

Still confirming whether there's a unit mismatch between the Spark elapsed time and the native elapsed time. Once I confirm that, I'll mark this as ready for review.

@mbutrovich
Contributor Author

[Screenshot: Spark UI SQL metrics for CometNativeScan, 2024-12-16]

As best I can tell, it is recording more metrics than that, but the UI cuts the list off.

@mbutrovich mbutrovich marked this pull request as ready for review December 17, 2024 00:19

override lazy val metrics: Map[String, SQLMetric] = {
  CometMetricNode.baselineMetrics(sparkContext) ++
    Map(
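The hunk above is cut off by the page. Below is a minimal, illustrative sketch of how the full definition might look, assuming Spark's SQLMetrics helpers and metric keys borrowed from the DataFusion explain output quoted later in this thread; the exact keys and display strings are assumptions, not the PR's actual code.

import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Sketch only: keys mirror the DataFusion ParquetFileMetrics/FileStreamMetrics
// names seen in the explain output below; the PR may register different ones.
override lazy val metrics: Map[String, SQLMetric] = {
  CometMetricNode.baselineMetrics(sparkContext) ++
    Map(
      "time_elapsed_opening" ->
        SQLMetrics.createNanoTimingMetric(sparkContext, "time opening files"),
      "time_elapsed_scanning_total" ->
        SQLMetrics.createNanoTimingMetric(sparkContext, "total scan time"),
      "metadata_load_time" ->
        SQLMetrics.createNanoTimingMetric(sparkContext, "time reading footers"),
      "bytes_scanned" ->
        SQLMetrics.createSizeMetric(sparkContext, "bytes scanned"),
      "row_groups_pruned_statistics" ->
        SQLMetrics.createMetric(sparkContext, "row groups pruned by statistics"))
}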
Contributor

Should we have some way of distinguishing between these metrics and those from the current native scan? Perhaps the display string can have a short prefix?

Contributor Author

Added a Native prefix. We may want to do this for all operators.
[Screenshot: Spark UI metrics showing the Native prefix, 2024-12-17]
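For reference, a prefixed display string might be produced like the sketch below; the prefix only affects the name shown in the Spark UI, not the map key used to wire the metric to the native side. The exact wording is an assumption, not the PR's actual strings.

import org.apache.spark.SparkContext
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Hypothetical example: the "Native" prefix changes only the display string
// shown in the Spark UI; the keys stay aligned with the native metric names.
def prefixedScanMetrics(sc: SparkContext): Map[String, SQLMetric] = Map(
  "metadata_load_time" ->
    SQLMetrics.createNanoTimingMetric(sc, "Native metadata load time"),
  "time_elapsed_opening" ->
    SQLMetrics.createNanoTimingMetric(sc, "Native time opening files"))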

Contributor

+1. We need not do this for all operators. This is just so we can distinguish between metrics reported by different scan implementations.

@parthchandra
Contributor

Is the time spent reading the footer actually zero?

@mbutrovich
Contributor Author

If I do an explain on the native side, I see sub-millisecond values, which I'm not sure the Spark UI shows by default:

metrics=[output_rows=1, elapsed_compute=1ns, bytes_scanned=4744, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, page_index_rows_matched=504, page_index_rows_pruned=1496, predicate_evaluation_errors=0, pushdown_rows_matched=505, pushdown_rows_pruned=503, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=49.043µs, metadata_load_time=506.667µs, page_index_eval_time=74.751µs, row_pushdown_eval_time=293.255µs, statistics_eval_time=116.584µs, time_elapsed_opening=835.042µs, time_elapsed_processing=2.705084ms, time_elapsed_scanning_total=2.349792ms, time_elapsed_scanning_until_data=2.247333ms]

The elapsed_compute number looks bogus though.
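The zero shown for footer time is consistent with a display-granularity effect rather than a missing value: if these are registered as nanosecond timing metrics on the Scala side, the Spark UI renders them in milliseconds, so anything under 1 ms shows as 0 ms. A minimal sketch, assuming Spark's SQLMetrics API (the metric name and helper function are illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Hypothetical illustration: the accumulator holds ~506 µs in nanoseconds,
// but nanosecond timing metrics are formatted in milliseconds in the UI,
// so this displays as "0 ms" even though the underlying value is non-zero.
def recordFooterReadTime(sc: SparkContext): SQLMetric = {
  val footerReadTime = SQLMetrics.createNanoTimingMetric(sc, "time reading footers")
  footerReadTime.add(506667L) // matches metadata_load_time=506.667µs above
  footerReadTime
}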
