build: Add spark-4.0 profile and shims #407
Conversation
@viirya @andygrove Please approve to start CI.
Triggered.
@viirya @andygrove Is there a way to start CI without bothering you? I remember only first-time contributors need approval to trigger CI.
Force-pushed from ae59e5b to d629df1.
@viirya @andygrove This passed all the tests on my personal GitHub Actions.
Triggered CI pipelines.
common/src/main/spark-4.0/org/apache/comet/shims/ShimFileFormat.scala (outdated; resolved)
common/src/main/spark-4.0/org/apache/spark/sql/comet/shims/ShimCometParquetUtils.scala (outdated; resolved)
spark/src/main/scala/org/apache/spark/sql/comet/DecimalPrecision.scala (outdated; resolved)
...k/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala (outdated; resolved)
override def write(
    rdd: RDD[_],
    inputs: Iterator[_],
If the input iterator includes the partitionId, we still need to get rid of it. I haven't run this, but it looks like it will cause failures.
We have ShimCometShuffleWriteProcessor.write() for Spark 3.x.
Yeah, I mean for the Spark 4.0 case we also need to get rid of the partitionId. I saw you replied to my other comment: #407 (comment)
I implemented the partitionId removal.
Thanks @viirya
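For reference, here is a minimal sketch of the shim idea discussed in this thread. It assumes Spark 4.0's ShuffleWriteProcessor.write receives an iterator of (partitionId, record) pairs instead of (rdd, partition); `doCometWrite` is a hypothetical stand-in for Comet's actual native-shuffle write path, not the code merged in this PR.

```scala
package org.apache.spark.sql.comet.shims

import org.apache.spark.{ShuffleDependency, TaskContext}
import org.apache.spark.scheduler.MapStatus
import org.apache.spark.shuffle.ShuffleWriteProcessor

trait ShimCometShuffleWriteProcessor extends ShuffleWriteProcessor {
  // Spark 4.0 hands the partition's records to write() directly.
  override def write(
      inputs: Iterator[_],
      dep: ShuffleDependency[_, _, _],
      mapId: Long,
      mapIndex: Int,
      context: TaskContext): MapStatus = {
    // Assumption: each element is (partitionId, record); strip the leading
    // partitionId so the downstream writer sees only the records.
    val records = inputs.asInstanceOf[Iterator[Product2[Int, Any]]].map(_._2)
    doCometWrite(records, dep, mapId, mapIndex, context)
  }

  // Hypothetical helper representing the version-independent write path.
  protected def doCometWrite(
      records: Iterator[Any],
      dep: ShuffleDependency[_, _, _],
      mapId: Long,
      mapIndex: Int,
      context: TaskContext): MapStatus
}
```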
...k/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala (outdated; resolved)
@viirya Please take another look.
.github/workflows/pr_build.yml (outdated)
matrix:
  os: [ubuntu-latest]
  java_version: [17]
  test-target: [rust, java]
Is it necessary to run the Rust tests for Spark 4.0 separately? There should be no difference from 3.4 or 3.3/3.2. We also don't run the Rust tests for Spark 3.3/3.2, only for Spark 3.4.
Removed.
.github/workflows/pr_build.yml (outdated)
- if: matrix.test-target == 'java'
  name: Install Spark
  shell: bash
  working-directory: ./apache-spark
  run: build/mvn install -Phive -Phadoop-cloud -DskipTests
Is this only needed for Spark 4.0? I don't see us installing it for other Spark versions.
Yes, this is only needed for Spark 4.0 because no 4.0.0-SNAPSHOT jar is publicly available.
/**
 * Returns a tuple of expressions for the `unhex` function.
 */
Updated.
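For context, a sketch of the kind of version-specific shim this doc comment sits on, assuming the Spark 4.0 `Unhex` expression carries a `failOnError` flag as in Spark 3.4+. The trait and method names follow the comment above, but the body is illustrative rather than necessarily the merged code.

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, Unhex}

trait CometExprShim {
  /**
   * Returns a tuple of expressions for the `unhex` function.
   */
  protected def unhexSerde(unhex: Unhex): (Expression, Expression) = {
    // Assumption: Unhex exposes failOnError on Spark 3.4+/4.0; older versions
    // would pin this to Literal(false) in their own shim.
    (unhex.child, Literal(unhex.failOnError))
  }
}
```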
@@ -29,7 +29,7 @@ import org.apache.comet.CometConf
/**
 * This test suite contains tests for only Spark 3.4.
nit:
- * This test suite contains tests for only Spark 3.4.
+ * This test suite contains tests for only Spark 3.4+.
Updated.
Looks good overall. I have some minor comments.
@kazuyukitanimura I think you can try to merge this.
Thank you @viirya, merged.
This PR adds the spark-4.0 profile and shims. This is an initial commit; tests with the spark-4.0 profile do not pass yet. Tests for spark-3.x should pass. (cherry picked from commit 9b3e87b)
Which issue does this PR close?
Part of #372
Rationale for this change
To be ready for Spark 4.0
What changes are included in this PR?
This PR adds the spark-4.0 profile and shims
How are these changes tested?
This is an initial commit. Tests with the spark-4.0 profile do not pass yet. Tests for spark-3.x should pass.
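As a rough illustration of the shim pattern this PR extends (the file contents below are hypothetical; only the directory layout and profile name come from the PR): each Spark version gets its own source root containing an identically named shim, and the active Maven profile decides which root is compiled, so the rest of the codebase references one stable name.

```scala
// common/src/main/spark-3.4/org/apache/comet/shims/ShimFileFormat.scala would
// define the same object with the 3.4-specific value.

// common/src/main/spark-4.0/org/apache/comet/shims/ShimFileFormat.scala
package org.apache.comet.shims

object ShimFileFormat {
  // Hypothetical constant: callers compile against the same name regardless
  // of which version-specific source root the active Maven profile selects.
  val ROW_INDEX_TEMPORARY_COLUMN_NAME: String = "row_index"
}
```

Building with the spark-4.0 profile added here (e.g. `./mvnw install -Pspark-4.0`) would then compile the spark-4.0 source root in place of the 3.x one.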