What happened?

I want to export an entire Bigtable table via BigtableIO.Read, but some rows exceed the 256 MB limit, which causes the read to fail. It appears the Bigtable server refuses to return any row larger than 256 MB.

I found some information in the Bigtable documentation suggesting that I paginate my requests using a cells-per-row limit filter and a cells-per-row offset filter. But I don't know how to apply this method with BigtableIO.Read, given that I want to export all of the table's data, and I don't know how to implement dynamic pagination by cell within a single pipeline.

I would like to know whether BigtableIO.Read currently has the capability to meet the requirements of my scenario. If it cannot, are there any alternative solutions that would let me elegantly export all the data?
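For reference, here is a minimal sketch of what the documentation's filter-based pagination looks like when mapped onto BigtableIO.Read (the project/instance/table IDs and page values are placeholders). It also illustrates the limitation behind the question: `withRowFilter` takes one static filter, so a single Read can only fetch one fixed page of cells per row rather than paginating dynamically:

```java
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SinglePageRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    int pageSize = 100; // cells per row to return; tune so a page stays well under 256 MB
    int page = 0;       // the filter is static, so one Read covers one page only

    // Chain an offset filter and a limit filter: skip the first
    // page * pageSize cells of each row, then return at most pageSize cells.
    RowFilter pageFilter = RowFilter.newBuilder()
        .setChain(RowFilter.Chain.newBuilder()
            .addFilters(RowFilter.newBuilder()
                .setCellsPerRowOffsetFilter(page * pageSize))
            .addFilters(RowFilter.newBuilder()
                .setCellsPerRowLimitFilter(pageSize)))
        .build();

    p.apply(BigtableIO.read()
        .withProjectId("my-project")     // placeholder IDs
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .withRowFilter(pageFilter));

    p.run();
  }
}
```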
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
You would have to loop through cell offsets and then Flatten those readTransformCellN reads.

You will end up with elements <rowKey, Row with 1 cell> where multiple elements share the same rowKey. Then it's up to you whether to group by key and rebuild the large rows; a sketch follows below.
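A minimal sketch of that loop-and-Flatten approach, assuming a known upper bound on cells per row (`pageSize`, `maxPages`, and the project/instance/table IDs are illustrative placeholders, not anything BigtableIO provides):

```java
import com.google.bigtable.v2.Row;
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;

public class FlattenPagedReads {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    int pageSize = 100; // cells per page; keep each page well under 256 MB
    int maxPages = 50;  // assumed upper bound: pageSize * maxPages >= max cells per row

    // One BigtableIO.Read per cell page, collected for Flatten.
    PCollectionList<Row> pages = PCollectionList.empty(p);
    for (int page = 0; page < maxPages; page++) {
      RowFilter filter = RowFilter.newBuilder()
          .setChain(RowFilter.Chain.newBuilder()
              .addFilters(RowFilter.newBuilder()
                  .setCellsPerRowOffsetFilter(page * pageSize))
              .addFilters(RowFilter.newBuilder()
                  .setCellsPerRowLimitFilter(pageSize)))
          .build();
      pages = pages.and(
          p.apply("ReadCellPage" + page,
              BigtableIO.read()
                  .withProjectId("my-project")   // placeholder IDs
                  .withInstanceId("my-instance")
                  .withTableId("my-table")
                  .withRowFilter(filter)));
    }

    // Flatten the page reads: each element is a partial Row holding at most
    // pageSize cells, and large rows appear once per page they span.
    PCollection<Row> partialRows = pages.apply(Flatten.pCollections());

    // Optionally group by row key to reassemble full rows. This assumes row
    // keys are valid UTF-8; otherwise key by ByteString with a suitable coder.
    partialRows
        .apply(MapElements
            .into(TypeDescriptors.kvs(
                TypeDescriptors.strings(), TypeDescriptor.of(Row.class)))
            .via(row -> KV.of(row.getKey().toStringUtf8(), row)))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), ProtoCoder.of(Row.class)))
        .apply(GroupByKey.create());

    p.run();
  }
}
```

Note the trade-off: rows smaller than the upper bound still get scanned `maxPages` times, so this works best when large rows are the norm rather than the exception.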
If you have an unknown number of large cells, then I would recommend building a pipeline that fetches row keys only, redistributes them, and then uses a ParDo to fetch cells with similar filters in a loop using the regular Bigtable client, as sketched below.
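A sketch of that keys-only pipeline, under a few stated assumptions: row keys are valid UTF-8, the per-row paging uses the Cloud Bigtable client's `BigtableDataClient`, and the IDs, `PAGE_SIZE`, and the emitted record format are all placeholders you would replace:

```java
import com.google.bigtable.v2.RowFilter;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.cloud.bigtable.data.v2.models.RowCell;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.TypeDescriptors;

import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

public class KeysThenPagedFetch {
  private static final int PAGE_SIZE = 100; // cells per fetch; tune for your cell sizes

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // 1) Read row keys only: keep one cell per row and strip its value, so
    //    even oversized rows come back tiny.
    RowFilter keysOnly = RowFilter.newBuilder()
        .setChain(RowFilter.Chain.newBuilder()
            .addFilters(RowFilter.newBuilder().setCellsPerRowLimitFilter(1))
            .addFilters(RowFilter.newBuilder().setStripValueTransformer(true)))
        .build();

    p.apply(BigtableIO.read()
            .withProjectId("my-project")   // placeholder IDs
            .withInstanceId("my-instance")
            .withTableId("my-table")
            .withRowFilter(keysOnly))
        // Assumes row keys are valid UTF-8; adjust if yours are binary.
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((com.google.bigtable.v2.Row row) -> row.getKey().toStringUtf8()))
        // 2) Redistribute the keys so per-row fetches spread across workers.
        .apply(Reshuffle.viaRandomKey())
        // 3) Page through each row's cells with the regular Bigtable client.
        .apply(ParDo.of(new DoFn<String, String>() {
          private transient BigtableDataClient client;

          @Setup
          public void setup() throws Exception {
            client = BigtableDataClient.create("my-project", "my-instance");
          }

          @ProcessElement
          public void process(@Element String key, OutputReceiver<String> out) {
            for (int page = 0; ; page++) {
              Row row = client.readRow("my-table", key,
                  FILTERS.chain()
                      .filter(FILTERS.offset().cellsPerRow(page * PAGE_SIZE))
                      .filter(FILTERS.limit().cellsPerRow(PAGE_SIZE)));
              if (row == null || row.getCells().isEmpty()) {
                break; // past the last cell
              }
              for (RowCell cell : row.getCells()) {
                // Emit one record per cell; replace with your export format.
                out.output(key + "/" + cell.getFamily() + ":"
                    + cell.getQualifier().toStringUtf8());
              }
              if (row.getCells().size() < PAGE_SIZE) {
                break; // a short page means we've read the final cells
              }
            }
          }

          @Teardown
          public void teardown() {
            if (client != null) {
              client.close();
            }
          }
        }));

    p.run();
  }
}
```

The Reshuffle between the key scan and the paged fetch is what lets slow, many-page rows parallelize instead of piling up on the worker that read their keys.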