Support CreateDisposition, withMethod, and schema for FirestoreToBigQuery template #1245

Merged (2 commits) on Jan 4, 2024

@Abacn Abacn commented Jan 4, 2024

Internal bug: b/317890425

  • Support CreateDisposition
  • Support withMethod
  • Allow specifying a JSON schema when CreateDisposition is CREATE_IF_NEEDED

Previously, the template set both dispositions on the same write:

BigQueryIO.writeTableRows()
                .withoutValidation()
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                ...
                .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)

The later call overrides the earlier one, so the effective disposition is CREATE_IF_NEEDED, but no table schema is provided. This causes the template launch to fail:

ERROR 2024-01-03T21:17:21.302075Z Exception in thread "main" ; see consecutive INFO logs for details.
INFO 2024-01-03T21:17:21.303735Z java.lang.IllegalArgumentException: CreateDisposition is CREATE_IF_NEEDED, however no schema was provided.
INFO 2024-01-03T21:17:21.303959Z at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
INFO 2024-01-03T21:17:21.304320Z at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:2207)
INFO 2024-01-03T21:17:21.304556Z at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:496)
INFO 2024-01-03T21:17:21.304657Z at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:365)
INFO 2024-01-03T21:17:21.305078Z at com.google.cloud.teleport.v2.templates.FirestoreToBigQuery.main(FirestoreToBigQuery.java:121)
INFO 2024-01-03T21:17:21.333350Z java failed with exit status 1
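
For readers unfamiliar with this check, here is a minimal, dependency-free sketch of the failure mode. It is a simplified model, not the Beam implementation: `WriteConfig` and its methods are hypothetical stand-ins for the builder calls quoted above, and, unlike Beam's immutable builder, this model mutates in place. It only illustrates that the last disposition set is the one validated, and that CREATE_IF_NEEDED without a schema is rejected.

```java
public class CreateDispositionSketch {
    enum CreateDisposition { CREATE_NEVER, CREATE_IF_NEEDED }

    /** Hypothetical stand-in for the write configuration; not a Beam class. */
    static final class WriteConfig {
        private CreateDisposition disposition = CreateDisposition.CREATE_IF_NEEDED;
        private String jsonSchema; // null means "no schema provided"

        WriteConfig withCreateDisposition(CreateDisposition d) {
            this.disposition = d; // a later call simply overwrites an earlier one
            return this;
        }

        WriteConfig withJsonSchema(String schema) {
            this.jsonSchema = schema;
            return this;
        }

        CreateDisposition disposition() {
            return disposition;
        }

        /** Models the precondition that produced the stack trace above. */
        void validate() {
            if (disposition == CreateDisposition.CREATE_IF_NEEDED && jsonSchema == null) {
                throw new IllegalArgumentException(
                    "CreateDisposition is CREATE_IF_NEEDED, however no schema was provided.");
            }
        }
    }

    public static void main(String[] args) {
        // Before the change: two dispositions set, no schema.
        WriteConfig before = new WriteConfig()
            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED);
        try {
            before.validate();
        } catch (IllegalArgumentException e) {
            System.out.println("launch fails: " + e.getMessage());
        }

        // After the change: a schema accompanies CREATE_IF_NEEDED.
        WriteConfig after = new WriteConfig()
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withJsonSchema("{\"fields\": []}");
        after.validate();
        System.out.println("validation passes");
    }
}
```

In this model, either supplying a schema or selecting CREATE_NEVER makes validation pass, which matches the two options the PR exposes as parameters.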

I believe this template has never worked. It was incomplete.

With this change, the template runs successfully:

image

Launch parameters:

gcloud dataflow flex-template run "firestore-test-`date +%Y%m%d-%H%M%S`" \
    --template-file-gcs-location "gs://*/2024-01-03-17-51-40_RC01/flex/Firestore_to_BigQuery_Flex" \
    --parameters outputTableSpec=google.com:clouddfe:yathu_test.firestore_test \
    --parameters bigQueryLoadingTemporaryDirectory=gs://*/temp \
    --parameters firestoreReadGqlQuery="select * from batch" \
    --parameters firestoreReadProjectId=* \
    --parameters bigQuerySchemaPath=gs://*/firestore_schema.json \
    --parameters useStorageWriteApi=true \
    --region "us-central1"

firestore_schema.json content:

{
  "fields": [
    {
      "name": "key",
      "type": "JSON"
    },
    {
      "name": "properties",
      "type": "JSON"
    }
  ]
}

Note that setting useStorageWriteApi=false fails with the error raised at https://github.com/apache/beam/blob/fe1627db9472733fc42d6f027eb955ef55fece58/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L3740

However, I verified that the temp file can be loaded via bq load, so it seems FILE_LOADS now supports JSON data insertion. This is a separate Beam issue.
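
To make the restriction concrete, below is a rough, self-contained sketch of the kind of check being described: scan the schema for JSON-typed fields and reject them when the write method is FILE_LOADS. This is an illustration only, not the code at the linked Beam line. `Field`, `chooseMethod`, `hasJsonField`, and `validate` are hypothetical names, the error text is made up for the sketch, and the mapping of useStorageWriteApi=false to FILE_LOADS is an assumption based on the behavior reported here.

```java
import java.util.Arrays;
import java.util.List;

public class JsonFieldCheckSketch {
    enum Method { FILE_LOADS, STORAGE_WRITE_API }

    /** Minimal stand-in for one schema field; nested fields model RECORD types. */
    static final class Field {
        final String name;
        final String type;
        final List<Field> fields;

        Field(String name, String type, Field... nested) {
            this.name = name;
            this.type = type;
            this.fields = Arrays.asList(nested);
        }
    }

    /** Assumption of this sketch: the flag selects between exactly these two methods. */
    static Method chooseMethod(boolean useStorageWriteApi) {
        return useStorageWriteApi ? Method.STORAGE_WRITE_API : Method.FILE_LOADS;
    }

    /** Returns true if any field, at any nesting depth, is declared as JSON. */
    static boolean hasJsonField(List<Field> fields) {
        for (Field f : fields) {
            if ("JSON".equalsIgnoreCase(f.type) || hasJsonField(f.fields)) {
                return true;
            }
        }
        return false;
    }

    /** Rejects the combination reported as failing in this thread. */
    static void validate(Method method, List<Field> schema) {
        if (method == Method.FILE_LOADS && hasJsonField(schema)) {
            throw new IllegalArgumentException(
                "Schema contains a JSON field, which this sketch rejects for FILE_LOADS");
        }
    }

    public static void main(String[] args) {
        // Mirrors firestore_schema.json above: two top-level JSON fields.
        List<Field> schema = Arrays.asList(
            new Field("key", "JSON"),
            new Field("properties", "JSON"));

        validate(chooseMethod(true), schema); // Storage Write API path: accepted
        try {
            validate(chooseMethod(false), schema); // FILE_LOADS path: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since bq load accepts the temp file, a check of this shape is stricter than BigQuery itself requires, which is why the fix belongs on the Beam side rather than in the template.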

@Abacn Abacn force-pushed the fixfirestoretemplate branch from 58799c5 to 1c2df7a Compare January 4, 2024 00:06

Abacn commented Jan 4, 2024

R: @bvolpato

bvolpato previously approved these changes Jan 4, 2024

@bvolpato bvolpato left a comment

LGTM, thanks Yi!

bvolpato commented Jan 4, 2024

Not a blocker, just a note that we'll need to work on an integration test (IT) for this template; it was probably missed.

@Abacn Abacn added the Google LGTM Approval of a pull request to be merged into the repository label Jan 4, 2024
@copybara-service copybara-service bot closed this pull request by merging all changes into GoogleCloudPlatform:main in 5e673ac Jan 4, 2024
@Abacn Abacn deleted the fixfirestoretemplate branch January 4, 2024 21:22

Abacn commented Jan 4, 2024

There is still a caveat: Beam prevents FILE_LOADS into a BigQuery table with a JSON field. Created apache/beam#29923 to fix this on the Beam side.
