What happened?

I want to export an entire Bigtable table via BigtableIO.Read, but some rows exceed the 256 MB limit, which causes the read to fail. It appears the Bigtable server refuses to return any row larger than 256 MB.

I found some information in the Bigtable documentation suggesting that I paginate my requests using a cells-per-row limit filter and a cells-per-row offset filter. But I don't know how to apply this method with BigtableIO.Read, given that I want to export all of the table's data, and I don't know how to implement dynamic pagination by cell within a single pipeline.

I would like to know whether BigtableIO.Read currently has the capability to meet the requirements of my scenario. If it cannot, are there any alternative solutions that would let me elegantly export all the data?
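For reference, here is a minimal sketch of what the documentation's filter-based pagination looks like when mapped onto BigtableIO.Read (the project/instance/table IDs and page values are placeholders). It also illustrates the limitation behind the question: `withRowFilter` takes one static filter, so a single Read can only fetch one fixed page of cells per row rather than paginating dynamically:

```java
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SinglePageRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    int pageSize = 100; // cells per row to return; tune so a page stays well under 256 MB
    int page = 0;       // the filter is static, so one Read covers one page only

    // Chain an offset filter and a limit filter: skip the first
    // page * pageSize cells of each row, then return at most pageSize cells.
    RowFilter pageFilter = RowFilter.newBuilder()
        .setChain(RowFilter.Chain.newBuilder()
            .addFilters(RowFilter.newBuilder()
                .setCellsPerRowOffsetFilter(page * pageSize))
            .addFilters(RowFilter.newBuilder()
                .setCellsPerRowLimitFilter(pageSize)))
        .build();

    p.apply(BigtableIO.read()
        .withProjectId("my-project")     // placeholder IDs
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .withRowFilter(pageFilter));

    p.run();
  }
}
```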
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
You would have to loop through cell offsets and then Flatten those readTransformCellN reads.

You will end up with elements <rowKey, Row with 1 cell> where multiple elements share the same rowKey. Then it's up to you whether to group by key and rebuild the large rows; a sketch follows below.
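A minimal sketch of that loop-and-Flatten approach, assuming a known upper bound on cells per row (`pageSize`, `maxPages`, and the project/instance/table IDs are illustrative placeholders, not anything BigtableIO provides):

```java
import com.google.bigtable.v2.Row;
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;

public class FlattenPagedReads {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    int pageSize = 100; // cells per page; keep each page well under 256 MB
    int maxPages = 50;  // assumed upper bound: pageSize * maxPages >= max cells per row

    // One BigtableIO.Read per cell page, collected for Flatten.
    PCollectionList<Row> pages = PCollectionList.empty(p);
    for (int page = 0; page < maxPages; page++) {
      RowFilter filter = RowFilter.newBuilder()
          .setChain(RowFilter.Chain.newBuilder()
              .addFilters(RowFilter.newBuilder()
                  .setCellsPerRowOffsetFilter(page * pageSize))
              .addFilters(RowFilter.newBuilder()
                  .setCellsPerRowLimitFilter(pageSize)))
          .build();
      pages = pages.and(
          p.apply("ReadCellPage" + page,
              BigtableIO.read()
                  .withProjectId("my-project")   // placeholder IDs
                  .withInstanceId("my-instance")
                  .withTableId("my-table")
                  .withRowFilter(filter)));
    }

    // Flatten the page reads: each element is a partial Row holding at most
    // pageSize cells, and large rows appear once per page they span.
    PCollection<Row> partialRows = pages.apply(Flatten.pCollections());

    // Optionally group by row key to reassemble full rows. This assumes row
    // keys are valid UTF-8; otherwise key by ByteString with a suitable coder.
    partialRows
        .apply(MapElements
            .into(TypeDescriptors.kvs(
                TypeDescriptors.strings(), TypeDescriptor.of(Row.class)))
            .via(row -> KV.of(row.getKey().toStringUtf8(), row)))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), ProtoCoder.of(Row.class)))
        .apply(GroupByKey.create());

    p.run();
  }
}
```

Note the trade-off: rows smaller than the upper bound still get scanned `maxPages` times, so this works best when large rows are the norm rather than the exception.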
If you have an unknown number of large cells, then I would recommend building a pipeline that fetches row keys only, redistributes them, and then uses a ParDo to fetch cells with similar filters in a loop using the regular Bigtable client, as sketched below.
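A sketch of that keys-only pipeline, under a few stated assumptions: row keys are valid UTF-8, the per-row paging uses the Cloud Bigtable client's `BigtableDataClient`, and the IDs, `PAGE_SIZE`, and the emitted record format are all placeholders you would replace:

```java
import com.google.bigtable.v2.RowFilter;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.cloud.bigtable.data.v2.models.RowCell;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.TypeDescriptors;

import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

public class KeysThenPagedFetch {
  private static final int PAGE_SIZE = 100; // cells per fetch; tune for your cell sizes

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // 1) Read row keys only: keep one cell per row and strip its value, so
    //    even oversized rows come back tiny.
    RowFilter keysOnly = RowFilter.newBuilder()
        .setChain(RowFilter.Chain.newBuilder()
            .addFilters(RowFilter.newBuilder().setCellsPerRowLimitFilter(1))
            .addFilters(RowFilter.newBuilder().setStripValueTransformer(true)))
        .build();

    p.apply(BigtableIO.read()
            .withProjectId("my-project")   // placeholder IDs
            .withInstanceId("my-instance")
            .withTableId("my-table")
            .withRowFilter(keysOnly))
        // Assumes row keys are valid UTF-8; adjust if yours are binary.
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((com.google.bigtable.v2.Row row) -> row.getKey().toStringUtf8()))
        // 2) Redistribute the keys so per-row fetches spread across workers.
        .apply(Reshuffle.viaRandomKey())
        // 3) Page through each row's cells with the regular Bigtable client.
        .apply(ParDo.of(new DoFn<String, String>() {
          private transient BigtableDataClient client;

          @Setup
          public void setup() throws Exception {
            client = BigtableDataClient.create("my-project", "my-instance");
          }

          @ProcessElement
          public void process(@Element String key, OutputReceiver<String> out) {
            for (int page = 0; ; page++) {
              Row row = client.readRow("my-table", key,
                  FILTERS.chain()
                      .filter(FILTERS.offset().cellsPerRow(page * PAGE_SIZE))
                      .filter(FILTERS.limit().cellsPerRow(PAGE_SIZE)));
              if (row == null || row.getCells().isEmpty()) {
                break; // past the last cell
              }
              for (RowCell cell : row.getCells()) {
                // Emit one record per cell; replace with your export format.
                out.output(key + "/" + cell.getFamily() + ":"
                    + cell.getQualifier().toStringUtf8());
              }
              if (row.getCells().size() < PAGE_SIZE) {
                break; // a short page means we've read the final cells
              }
            }
          }

          @Teardown
          public void teardown() {
            if (client != null) {
              client.close();
            }
          }
        }));

    p.run();
  }
}
```

The Reshuffle between the key scan and the paged fetch is what lets slow, many-page rows parallelize instead of piling up on the worker that read their keys.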