fix: newFileScanRDD should not take constructor from custom Spark versions #412

Merged: 3 commits into apache:main on May 18, 2024

Conversation

ceppelli
Contributor

Which issue does this PR close?

Closes #411 .

Rationale for this change

The file spark-sql_2.12-3.4.1-amzn-2.jar is a custom build of Spark. Its class org.apache.spark.sql.execution.datasources.FileScanRDD has two constructors, one with 6 parameters and one with 8 parameters. The suggested workaround filters out the custom constructor.
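
A minimal sketch of the idea, not the code from this PR: when a class is instantiated through reflection, constructors can be filtered by parameter count so that an extra constructor added by a vendor fork is never selected. `FakeScanRDD` and `ConstructorFilter.pickKnown` are hypothetical names used only for illustration, and treating the 6-parameter constructor as the expected one is an assumption.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a class that a custom fork extended with an
// additional constructor (this is NOT the real FileScanRDD).
class FakeScanRDD {
    final int arity;

    // Assumed expected shape: 6 parameters.
    FakeScanRDD(Object a, Object b, Object c, Object d, Object e, Object f) {
        this.arity = 6;
    }

    // Extra fork-specific constructor: 8 parameters.
    FakeScanRDD(Object a, Object b, Object c, Object d,
                Object e, Object f, Object g, Object h) {
        this.arity = 8;
    }
}

final class ConstructorFilter {
    private ConstructorFilter() {}

    // Returns the first declared constructor whose parameter count is one of
    // the known counts; constructors with any other count are ignored.
    static Optional<Constructor<?>> pickKnown(Class<?> cls, Set<Integer> knownCounts) {
        return Arrays.stream(cls.getDeclaredConstructors())
                .filter(c -> knownCounts.contains(c.getParameterCount()))
                .findFirst();
    }
}
```

Calling `ConstructorFilter.pickKnown(FakeScanRDD.class, Set.of(6))` yields the 6-parameter constructor regardless of the order in which reflection lists the constructors. An empty result signals that no known constructor shape is present.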

What changes are included in this PR?

How are these changes tested?

workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation
@viirya viirya changed the title [FIX] - workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation fix:workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation May 10, 2024
@viirya viirya changed the title fix:workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation May 10, 2024
@viirya viirya changed the title fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation fix: newFileScanRDD should not take constructor from custom Spark versions May 10, 2024
Member

viirya commented May 18, 2024

I took the liberty of committing some suggestions on code comments and style, since there has been no response for several days. I will merge this once CI passes.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 34.02%. Comparing base (14494d3) to head (bd24b31).
Report is 19 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #412      +/-   ##
============================================
- Coverage     34.02%   34.02%   -0.01%     
- Complexity      857      858       +1     
============================================
  Files           116      116              
  Lines         38565    38583      +18     
  Branches       8517     8521       +4     
============================================
+ Hits          13120    13126       +6     
- Misses        22691    22702      +11     
- Partials       2754     2755       +1     


@viirya viirya merged commit 1f23c18 into apache:main May 18, 2024
40 checks passed
Member

viirya commented May 18, 2024

Merged. Thanks @ceppelli @kazuyukitanimura @andygrove

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…sions (apache#412)

* [FIX] - workaround for aws emr spark 3.4

workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation

* Update spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala

* Update spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 1f23c18)
Development

Successfully merging this pull request may close these issues.

compatibility issue with AWS EMR 6.15.0 SPARK 3.4.1
5 participants