[BigQueryIO] fetch updated schema for newly created Storage API stream writers #33231
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,7 @@ | |
import com.google.cloud.bigquery.storage.v1.Exceptions.StreamFinalizedException; | ||
import com.google.cloud.bigquery.storage.v1.ProtoRows; | ||
import com.google.cloud.bigquery.storage.v1.TableSchema; | ||
import com.google.cloud.bigquery.storage.v1.WriteStream; | ||
import com.google.cloud.bigquery.storage.v1.WriteStream.Type; | ||
import com.google.protobuf.ByteString; | ||
import com.google.protobuf.DescriptorProtos; | ||
|
@@ -531,6 +532,30 @@ public void process( | |
element.getKey().getKey(), dynamicDestinations, datasetService); | ||
tableSchema = converter.getTableSchema(); | ||
descriptor = converter.getDescriptor(false); | ||
|
||
if (autoUpdateSchema) { | ||
I'm not sure this is the ideal place to put this. getAppendClientInfo is called whenever the static cache is populated, meaning that on any worker restart, range move, etc. we'll be forced to call this API again. However we have persistent state in this DoFn, so we know if it's a "new" key or not. Can we use that to gate calling this method instead?

We should always perform this check before creating a new StreamWriter, regardless of the reason for its creation. The only exception is if we already have an updated schema stored in state (see first

Also note that the updated schema is ignored when the StreamWriter object's creation time is later than the updated schema's.
||
// A StreamWriter ignores table schema updates that happen prior to its creation. | ||
// So before creating a StreamWriter below, we fetch the table schema to check if we | ||
// missed an update. | ||
// If so, use the new schema instead of the base schema | ||
@Nullable | ||
WriteStream writeStream = | ||
writeStreamService.getWriteStream(getOrCreateStream.get()); | ||
TableSchema streamSchema = | ||
writeStream == null | ||
? TableSchema.getDefaultInstance() | ||
: writeStream.getTableSchema(); | ||
Optional<TableSchema> newSchema = | ||
TableSchemaUpdateUtils.getUpdatedSchema(tableSchema, streamSchema); | ||
|
||
if (newSchema.isPresent()) { | ||
tableSchema = newSchema.get(); | ||
descriptor = | ||
TableRowToStorageApiProto.descriptorSchemaFromTableSchema( | ||
tableSchema, true, false); | ||
updatedSchema.write(tableSchema); | ||
} | ||
} | ||
} | ||
AppendClientInfo info = | ||
AppendClientInfo.of( | ||
|
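The check in the hunk above can be pictured with a small standalone sketch. It does not use Beam or the BigQuery client: schemas are modeled as field-name-to-type maps, and `SchemaCheckSketch` and `updatedSchema` are hypothetical names that do not reproduce the actual behavior of `TableSchemaUpdateUtils.getUpdatedSchema`. The sketch assumes the stream schema counts as "updated" only when it contains a field the base schema lacks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; not the Beam implementation. */
public class SchemaCheckSketch {

  /**
   * Returns the stream schema if it has at least one field that the base
   * schema does not know about; otherwise returns empty, meaning the base
   * schema can be used as-is.
   */
  public static Optional<Map<String, String>> updatedSchema(
      Map<String, String> base, Map<String, String> stream) {
    // An empty stream schema (e.g. no stream metadata available) is never an update.
    if (stream.isEmpty()) {
      return Optional.empty();
    }
    for (String field : stream.keySet()) {
      if (!base.containsKey(field)) {
        return Optional.of(stream);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("id", "INT64");

    // Simulate a table whose schema gained a column before the writer was created.
    Map<String, String> stream = new LinkedHashMap<>(base);
    stream.put("name", "STRING");

    System.out.println(updatedSchema(base, stream).isPresent());
    System.out.println(updatedSchema(base, base).isPresent());
  }
}
```

In this simplified model, the caller would rebuild its descriptor and persist the schema only when the returned `Optional` is present, mirroring the `newSchema.isPresent()` branch in the diff.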
Let's add a boolean parameter to the method, so we only return schema if requested

FYI in our code base we always perform this call for the sole purpose of fetching the schema
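One way to picture the suggested boolean parameter is the sketch below. Everything in it is hypothetical: `SchemaOnRequestSketch`, `StreamInfo`, and the two-argument `getWriteStream` are illustrative names, and the hard-coded field list stands in for whatever a real service call would return. It is not the signature used in this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration only; not the Beam or BigQuery client API. */
public class SchemaOnRequestSketch {

  /** Stand-in for stream metadata; the schema is present only when requested. */
  public static final class StreamInfo {
    public final String name;
    public final Optional<List<String>> schemaFields;

    StreamInfo(String name, Optional<List<String>> schemaFields) {
      this.name = name;
      this.schemaFields = schemaFields;
    }
  }

  /**
   * Looks up a stream by name. A real implementation would call a remote
   * service here; this sketch returns fixed data so that it stays runnable.
   */
  public static StreamInfo getWriteStream(String name, boolean includeSchema) {
    List<String> fields = Arrays.asList("id", "name");
    return new StreamInfo(
        name, includeSchema ? Optional.of(fields) : Optional.<List<String>>empty());
  }

  public static void main(String[] args) {
    // Callers that only need the schema pass true; others skip the extra payload.
    System.out.println(getWriteStream("stream-a", true).schemaFields.isPresent());
    System.out.println(getWriteStream("stream-a", false).schemaFields.isPresent());
  }
}
```

If every caller passes `true`, as the reply notes is the case in this code base, the flag adds little, which is the trade-off the two comments are weighing.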