
[pipeline-connector][paimon] add paimon pipeline data sink connector. #2916

Merged: 9 commits merged into apache:master on Apr 24, 2024


@lvyanquan (Contributor) commented Dec 22, 2023

This closes #2856.
Some of the code is inspired by FlinkCdcMultiTableSink in the Paimon repo; this PR also adds a Sink V2 implementation.

@github-actions bot added the docs (Improvements or additions to documentation) label Dec 22, 2023
@lvyanquan force-pushed the pipeline-paimon branch 2 times, most recently from f8d62e5 to 40517a7 on December 24, 2023 07:41
@lvyanquan changed the title [WIP][pipeline-connector][paimon] add paimon pipeline data sink connector. to [pipeline-connector][paimon] add paimon pipeline data sink connector. Dec 25, 2023
@lvyanquan force-pushed the pipeline-paimon branch 4 times, most recently from 1c9357e to bb75217 on December 25, 2023 01:58
@lvyanquan (Contributor Author)

@PatrickRen PTAL.

@lvyanquan force-pushed the pipeline-paimon branch 3 times, most recently from 4208998 to 2e7fe85 on January 26, 2024 11:14
@yanghuaiGit (Contributor) commented Jan 29, 2024

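When a SchemaChangeEvent is produced, the Paimon sink loads the latest schema from the catalog. At that moment the schema may not have been modified yet, so the data being written still follows the schema fields from before the DDL change: data in newly added columns cannot be read, and dropping a column causes further problems.

Could the SchemaChangeEvent be sent only after the stream is released? That way the schema fetched downstream is guaranteed to be the latest.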
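The ordering problem can be illustrated with a minimal, single-threaded Java sketch. All names here (InMemoryCatalog, ReloadingWriter, SchemaChangeCoordinator) are hypothetical stand-ins, not Flink CDC or Paimon APIs, and the "catalog" is just an in-memory map from table name to column list. The sketch only shows the order of operations: a writer that reloads its schema on a schema change event sees the new column only if the catalog is updated before the event reaches it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a catalog: table name -> column names.
class InMemoryCatalog {
    private final Map<String, List<String>> schemas = new HashMap<>();

    synchronized void setColumns(String table, List<String> columns) {
        schemas.put(table, new ArrayList<>(columns));
    }

    synchronized List<String> loadColumns(String table) {
        return new ArrayList<>(schemas.get(table));
    }
}

// Hypothetical writer that reloads the table schema from the catalog
// whenever it receives a schema change event.
class ReloadingWriter {
    private final InMemoryCatalog catalog;
    private List<String> columns;

    ReloadingWriter(InMemoryCatalog catalog, String table) {
        this.catalog = catalog;
        this.columns = catalog.loadColumns(table);
    }

    void onSchemaChange(String table) {
        columns = catalog.loadColumns(table);
    }

    List<String> currentColumns() {
        return columns;
    }
}

// Hypothetical coordinator contrasting the two possible orderings.
class SchemaChangeCoordinator {
    private final InMemoryCatalog catalog;
    private final ReloadingWriter writer;

    SchemaChangeCoordinator(InMemoryCatalog catalog, ReloadingWriter writer) {
        this.catalog = catalog;
        this.writer = writer;
    }

    // Problematic order: the writer reloads before the catalog changes,
    // so it keeps the old column list.
    void addColumnEventFirst(String table, String column) {
        writer.onSchemaChange(table);
        List<String> updated = catalog.loadColumns(table);
        updated.add(column);
        catalog.setColumns(table, updated);
    }

    // Proposed order: apply the DDL to the catalog first, then forward the event.
    void addColumn(String table, String column) {
        List<String> updated = catalog.loadColumns(table);
        updated.add(column);
        catalog.setColumns(table, updated);
        writer.onSchemaChange(table);
    }
}
```

Starting from columns [id, name], calling addColumnEventFirst("db.t", "age") leaves the writer on [id, name] even though the catalog now has three columns, while a subsequent addColumn("db.t", "email") brings the writer up to [id, name, age, email].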

@lvyanquan (Contributor Author)

Thanks @yanghuaiGit for pointing this out; addressed.

@yanghuaiGit (Contributor)

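The catalog field in com.ververica.cdc.connectors.paimon.sink.PaimonMetadataApplier is static. After deserialization, catalog is null in the resulting object, which causes a NullPointerException when com.ververica.cdc.connectors.paimon.sink.PaimonMetadataApplier#applySchemaChange is executed.

catalog should be changed to
private transient Catalog catalog;, and applySchemaChange should check whether it is null and build a catalog if so.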
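A minimal Java sketch of the suggested pattern, assuming the applier can carry serializable catalog options: only the options are serialized, the catalog is transient (so it is null in every deserialized copy), and applySchemaChange rebuilds it on first use. SimpleCatalog, LazyCatalogMetadataApplier, SerializationRoundTrip and the "warehouse" option key are hypothetical names for illustration; SimpleCatalog is not Paimon's Catalog interface, and the factory only prints instead of creating a real catalog.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a catalog; NOT org.apache.paimon.catalog.Catalog.
interface SimpleCatalog {
    void alterTable(String table, String change);
}

// Hypothetical applier: serializable options travel with the object,
// while the catalog is transient and rebuilt lazily after deserialization.
class LazyCatalogMetadataApplier implements Serializable {
    private static final long serialVersionUID = 1L;

    // Serializable configuration used to (re)create the catalog.
    private final HashMap<String, String> catalogOptions;

    // Skipped by Java serialization, so it is null in every deserialized copy.
    private transient SimpleCatalog catalog;

    LazyCatalogMetadataApplier(Map<String, String> catalogOptions) {
        this.catalogOptions = new HashMap<>(catalogOptions);
    }

    void applySchemaChange(String table, String change) {
        // Rebuild the catalog instead of assuming it survived serialization.
        if (catalog == null) {
            catalog = createCatalog();
        }
        catalog.alterTable(table, change);
    }

    boolean hasCatalog() {
        return catalog != null;
    }

    // Placeholder factory; a real sink would build its catalog from the options here.
    private SimpleCatalog createCatalog() {
        String warehouse = catalogOptions.get("warehouse");
        return (table, change) ->
                System.out.println("ALTER " + table + " (" + change + ") in " + warehouse);
    }
}

// Mimics shipping an object to another task: serialize, then deserialize.
final class SerializationRoundTrip {
    private SerializationRoundTrip() {}

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

A static field has the same effect in a real deployment: the deserialized applier runs in a JVM where the static catalog was never assigned. Keeping the field per instance and transient, with a null check, makes each copy responsible for building its own catalog.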

@lvyanquan (Contributor Author)

Addressed.

@melin commented Mar 13, 2024

Is reading data supported when messages from multiple tables are written to the same topic?

#2938 (comment)

@lvyanquan (Contributor Author)

> Can CDC messages from multiple tables be written to the same topic?

You can do this by using a route in the pipeline.
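Conceptually, a pipeline route maps one or more source tables, selected by a pattern, onto a single sink table, so change events from several tables end up in the same destination. The following Java sketch is a hypothetical, simplified illustration of that idea only: RouteRule and SimpleRouter are invented names, "first matching rule wins" is an assumption, and this is not Flink CDC's routing implementation (in practice routes are declared in the pipeline definition, not written as code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical routing rule: every source table whose fully qualified
// name matches the regex is written to the same sink table.
final class RouteRule {
    final Pattern sourceTable;
    final String sinkTable;

    RouteRule(String sourceTableRegex, String sinkTable) {
        this.sourceTable = Pattern.compile(sourceTableRegex);
        this.sinkTable = sinkTable;
    }
}

// Hypothetical router: returns the sink table of the first matching rule,
// or the source table itself when no rule matches.
final class SimpleRouter {
    private final List<RouteRule> rules = new ArrayList<>();

    SimpleRouter addRule(String sourceTableRegex, String sinkTable) {
        rules.add(new RouteRule(sourceTableRegex, sinkTable));
        return this;
    }

    String route(String sourceTable) {
        for (RouteRule rule : rules) {
            if (rule.sourceTable.matcher(sourceTable).matches()) {
                return rule.sinkTable;
            }
        }
        return sourceTable;
    }
}
```

For example, with a single rule mapping "app_db\\.order_\\d+" to "ods_db.ods_orders", both app_db.order_01 and app_db.order_02 are routed to ods_db.ods_orders, while app_db.users keeps its own name.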

@yanghuaiGit (Contributor)

The latest Paimon version is 0.7; we should update the Paimon version from 0.6 to 0.7.

@PatrickRen (Contributor)

> The latest Paimon version is 0.7; we should update the Paimon version from 0.6 to 0.7.

@lvyanquan Could you take a look at this one? I'd also prefer to catch up with the latest version. Also, could you rebase onto the latest master? Thanks.

@lvyanquan (Contributor Author)

> The latest Paimon version is 0.7; we should update the Paimon version from 0.6 to 0.7.

Done, and rebased onto master.

@yuxiqian (Contributor) left a comment

Thanks to @lvyanquan for the great work! I left some comments and questions; please take a look.

@melin commented Apr 10, 2024

Can constant values be set in the Kafka headers? For example, if data from multiple data centers is written to the same Kafka topic, a region key could be added to the Kafka header.

@lvyanquan (Contributor Author)

Thanks @yuxiqian for the comments; I've addressed them and resubmitted.

@yuxiqian (Contributor) left a comment

Thanks to @lvyanquan for the kind clarification!

@PatrickRen (Contributor) left a comment

@lvyanquan Thanks for the PR! LGTM.

@PatrickRen merged commit ef2eece into apache:master on Apr 24, 2024
15 checks passed
wuzhenhua01 pushed a commit to wuzhenhua01/flink-cdc-connectors that referenced this pull request Aug 4, 2024
Labels: common, docs (Improvements or additions to documentation)
Development: successfully merging this pull request may close these issues:
[3.1][pipeline-connectors] Add Implementation of DataSink in Paimon.
5 participants