handling Date type in HCatToRow #20685

Closed
damccorm opened this issue Jun 4, 2022 · 7 comments

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

When I convert an HCatRecord that contains a Date field to a Row, the pipeline fails with the following error.

  • the code
    PCollection<Row> p =
        pipeline
            /*
             * Step #1: Read hive table rows from Hive.
             */
            .apply(
                "Read from Hive source",
                    HCatToRow.fromSpec(
                            HCatalogIO.read()
                                    .withConfigProperties(configProperties)
                                    .withDatabase(options.getHiveDatabaseName())
                                    .withTable(options.getHiveTableName())
                                    .withFilter(options.getFilterString())));
  • error log
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: For field name submissiondate and DATETIME type got unexpected class class java.sql.Date
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
        at com.google.cloud.teleport.v2.templates.HiveToBigQuery.run(HiveToBigQuery.java:234)
        at com.google.cloud.teleport.v2.templates.HiveToBigQuery.main(HiveToBigQuery.java:176)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: For field name submissiondate and DATETIME type got unexpected class class java.sql.Date
        at org.apache.beam.sdk.values.Row$Builder.verifyDateTime(Row.java:828)
        at org.apache.beam.sdk.values.Row$Builder.verifyPrimitiveType(Row.java:755)
        at org.apache.beam.sdk.values.Row$Builder.verify(Row.java:654)
        at org.apache.beam.sdk.values.Row$Builder.verify(Row.java:635)
        at org.apache.beam.sdk.values.Row$Builder.build(Row.java:840)
        at org.apache.beam.sdk.io.hcatalog.HCatToRow$HCatToRowFn.processElement(HCatToRow.java:84)

This happens because HCatalogIO reads the Date type into the HCatRecord as java.sql.Date, but the Row class doesn't accept java.sql.Date for a DATETIME field, and HCatToRow doesn't convert the value before building the Row.

I think there are two possible solutions.

  1. Make Row support a Date type (java.util.Date or java.sql.Date).
    I don't know the other IO classes well, but some of them may have the same problem, and this solution could fix those cases as well.

  2. Add logic to HCatToRow that converts Date values to the Datetime type.
    The impact of this change would be smaller than that of option 1 because it doesn't change the Row class.

Which would be better?
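
For illustration, option 2 could look roughly like the sketch below. It is not the code that was merged for this issue: the class and method names are hypothetical, and it uses java.time.Instant only to stay free of external dependencies. A later comment in this thread mentions a cast to AbstractInstant in RowUtils.java, so a real fix would presumably produce a Joda-Time instant instead.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of option 2; names are illustrative, not Beam's API.
class DateCastSketch {

  // Converts java.sql.Date to an Instant; any other value (including null)
  // is returned unchanged.
  static Object maybeCastDate(Object value) {
    if (value instanceof java.sql.Date) {
      // java.sql.Date.toInstant() throws UnsupportedOperationException,
      // so the conversion goes through the epoch-millisecond value.
      return Instant.ofEpochMilli(((java.sql.Date) value).getTime());
    }
    return value;
  }

  // Applies the conversion to every field value of a record before the
  // values would be handed to a row builder.
  static List<Object> castValues(List<Object> values) {
    List<Object> result = new ArrayList<>(values.size());
    for (Object v : values) {
      result.add(maybeCastDate(v));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Object> record = new ArrayList<>();
    record.add("abc");
    record.add(java.sql.Date.valueOf("2020-09-21"));
    System.out.println(castValues(record));
  }
}
```

Note that java.sql.Date.valueOf interprets the date at local midnight, so the resulting instant depends on the JVM's default time zone. A real conversion would need to decide how to treat that.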

Imported from Jira BEAM-10934. Original Jira may contain additional context.
Reported by: hayashidac.

@deadb0d4
Contributor

deadb0d4 commented Oct 6, 2024

Hey @damccorm , do you think I could help out? --- I'm looking for a good first issue here!

@damccorm
Contributor Author

damccorm commented Oct 7, 2024

Sure - to assign the issue to yourself, you can comment .take-issue

@deadb0d4
Contributor

deadb0d4 commented Oct 7, 2024

.take-issue

deadb0d4 added a commit to deadb0d4/beam that referenced this issue Oct 7, 2024
Some initial notes:
- The issue (apache#20685) deals with java.sql.Date, which I wasn't able to
  reproduce fully (I can currently write hcatalog hadoop.hive date)
- On this note, 267f76f changed the
  code involved so that there's a direct cast to AbstractInstant in
  RowUtils.java. This doesn't change much, but jfyi.
@deadb0d4
Contributor

deadb0d4 commented Oct 8, 2024

Hey @damccorm , fyi, I don't think I can set reviewers to the PR above, so it'd be great if you could take a look when you can!

@damccorm
Contributor Author

damccorm commented Oct 8, 2024

Reviewers should get auto-assigned (looks like this happened in your PR, so you should be set 🙂) - thanks!

Abacn pushed a commit that referenced this issue Oct 9, 2024
* Handle Date type in HCatToRow

Some initial notes:
- The issue (#20685) deals with java.sql.Date, which I wasn't able to
  reproduce fully (I can currently write hcatalog hadoop.hive date)
- On this note, 267f76f changed the
  code involved so that there's a direct cast to AbstractInstant in
  RowUtils.java. This doesn't change much, but jfyi.

* Run: ./gradlew :sdks:java:io:hcatalog:spotlessApply

* review cr: castTypes util

- s/castHDate/maybeCastHDate/ to be more concise
- move values manipulation to a separate util (hopefully, I understood
  the cr in the right way)
@deadb0d4
Contributor

deadb0d4 commented Oct 9, 2024

Hey @damccorm , #32695 just got merged, and I forgot to make it close the issue automatically (s/addresses/fixes or something?). Do you think we should close the issue manually now?

@damccorm
Contributor Author

damccorm commented Oct 9, 2024

That's great, thanks! I'll close this one, but you can also do it by commenting .close-issue

@damccorm damccorm closed this as completed Oct 9, 2024
@github-actions github-actions bot added this to the 2.61.0 Release milestone Oct 9, 2024
reeba212 pushed a commit to reeba212/beam that referenced this issue Dec 4, 2024