Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Add CI for TPCDS queries #99

Merged
merged 13 commits into from
Feb 24, 2024
Merged

build: Add CI for TPCDS queries #99

merged 13 commits into from
Feb 24, 2024

Conversation

viirya
Copy link
Member

@viirya viirya commented Feb 23, 2024

Which issue does this PR close?

Closes #101.

Rationale for this change

We need a CI pipeline that could help us verify Comet query correctness.

What changes are included in this PR?

This patch enables CometTPCDSQuerySuite in CI pipeline that runs TPCDS queries with Comet enabled and checks the results with Spark.

How are these changes tested?

@viirya viirya marked this pull request as draft February 23, 2024 08:15
@sunchao
Copy link
Member

sunchao commented Feb 23, 2024

Error: Failed to execute goal org.apache.rat:apache-rat-plugin:0.16:check (default) on project comet-parent-spark3.4_2.12: Too many files with unapproved license: 560 See RAT report in: /__w/arrow-datafusion-comet/arrow-datafusion-comet/target/rat.txt -> [Help 1]

I've seen this locally too. We probably should add it in here.

@viirya
Copy link
Member Author

viirya commented Feb 23, 2024

Yea, added it to exclude list.

@viirya viirya marked this pull request as ready for review February 23, 2024 21:45
@@ -26,7 +26,8 @@ import org.apache.comet.CometConf
class CometTPCDSQuerySuite
extends {
// This is private in `TPCDSBase`.
val excludedTpcdsQueries: Seq[String] = Seq("q34", "q64")
val excludedTpcdsQueries: Seq[String] =
Seq("q34", "q66", "q64", "q71", "q88", "q90", "q96")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird that we have more queries that are not same with Spark, compared with what we have internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out what changes are missed or messed during open sourcing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what are the error messages for these additional queries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the additional queries, looks like their outputs are different to Spark actually. We can see the difference in CI log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

- q96 *** FAILED ***
  java.lang.Exception: Expected "[0]", but got "[888]" Result did not match

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- q90 *** FAILED ***
  java.lang.Exception: Expected "[NULL]", but got "[0.58933333333333333333]" Result did not match

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- q88 *** FAILED ***
  java.lang.Exception: Expected "[0	0	0	0	0	0	0	0]", but got "[2358	4664	4828	7447	6997	3886	4073	4923]" Result did not match

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- q71 *** FAILED ***
  java.lang.Exception: Expected "[]", but got "[10016002	corpamalgamalg #x                                 	19	7	23183.08
10005002	scholarunivamalg #x                               	8	8	17109.63
8014006	edu packmaxi #x                                   	9	53	16420.75
3003002	exportiexporti #x                                 	17	44	16337.23
3001001	amalgexporti #x                                   	17	46	16088.53
4001001	amalgedu pack #x                                  	9	28	15557.83
...

The actual (wrong) result is pretty long, I ignore it above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- q66 *** FAILED ***
  java.lang.Exception: Expected "[]", but got "[Just good amou	933435	Midway	Williamson County	TN	United States	DHL,BARIAN	2001	9246702.65	4973596.11	8243830.24	11536095.04	8848441.04	6589907.46	14946163.85	24260830.90	23156311.42	21897174.69	33046376.63	24591398.12	9.906102353136	5.328272573880	8.831713231238	
...

Same here.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sunchao sunchao merged commit 38b0bfd into apache:main Feb 24, 2024
15 checks passed
@sunchao
Copy link
Member

sunchao commented Feb 24, 2024

Merged, thanks!

@viirya viirya mentioned this pull request Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CI for TPCDS queries
2 participants