From 6368d3445881ca09ca5dcc4bec4b2e6998f40f49 Mon Sep 17 00:00:00 2001
From: lidezhu
Date: Fri, 17 May 2024 16:30:34 +0800
Subject: [PATCH 01/14] add description for cdc behaviour change

---
 ticdc/ticdc-behavior-change.md | 41 ++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 887ff14aa49f4..1af06319b6adf 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -52,3 +52,44 @@ COMMIT;
 In this example, by executing three SQL statements to swap the primary keys of two rows, TiCDC only receives two update change events, that is, changing the primary key `a` from `1` to `2` and changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two update events to the downstream, a primary key conflict might occur, leading to changefeed errors.
 
 Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`.
+
+Starting from v8.1.0, when using MySQL Sink, TiCDC will fetch a current timestamp (recorded as `thresholdTs`) from PD at start. For update events with `commitTS` less than `thresholdTs`, it will be split into delete and insert events before being written into the Sorter. This can ensure that all events within the same transaction are sorted in the order in which the delete event precedes the insert event. For update events with `commitTS` greater than or equal to `thresholdTs`, TiCDC will not split them. For details, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
+
+This change is due to the fact that TiCDC cannot obtain the execution order between multiple update events within the same upstream transaction. For a transaction containing multiple update events, if the primary key or non-null unique index value is modified in the update event, and the events are split into delete events and insert events before sent to the downstream, this may cause data inconsistency problem.
+
+Take the following SQL as an example:
+
+```sql
+CREATE TABLE t (a INT PRIMARY KEY, b INT);
+INSERT INTO t VALUES (1, 1);
+INSERT INTO t VALUES (2, 2);
+
+BEGIN;
+UPDATE t SET a = 3 WHERE a = 2;
+UPDATE t SET a = 2 WHERE a = 1;
+COMMIT;
+```
+
+In the above example, the execution order of the two SQL statements within the transaction has a sequential dependency relationship, that is, the primary key `a` is changed from `2` to `3`, and then the primary key `a` is changed from `1` to `2`. If the order of update events received by TiCDC is different from the actual execution order within the transaction, splitting them into delete and insert events and sending them downstream will cause data inconsistency.
+
+For example, the sequence of update events that TiCDC may receive is as follows:
+
+```sql
+UPDATE t SET a = 2 WHERE a = 1;
+UPDATE t SET a = 3 WHERE a = 2;
+```
+
+The actual sequence of events executed in downstream after TiCDC splits the above update events is as follows:
+
+```sql
+BEGIN;
+DELETE FROM t WHERE a = 1;
+REPLACE INTO t VALUES (2, 1);
+DELETE FROM t WHERE a = 2;
+REPLACE INTO t VALUES (3, 2);
+COMMIT;
+```
+
+After executing the transaction in the upstream, the records should be `(3, 2)` and `(2, 2)`, while the records in the downstream will be `(3, 2)` after executing the transaction, which means data inconsistency problem happens.
+
+Note that after this behavior change, TiCDC will not split update events in most cases when using MySQL Sink, so primary key or unique key conflicts may occur when changefeed is run. This problem will cause the changefeed to automatically restart. After the restart, the conflicting update events will be split into two events, delete and insert, and written to the Sorter. At this time, it can be ensured that all events in the same transaction are in the order of the delete event before the insert event. Sort to complete data synchronization correctly.

From 2ccfbfed4b2db0c7930e0aa1f94ecb94a9b8c06d Mon Sep 17 00:00:00 2001
From: lidezhu
Date: Fri, 17 May 2024 16:32:34 +0800
Subject: [PATCH 02/14] fix typo

---
 ticdc/ticdc-behavior-change.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 1af06319b6adf..53f18495b7f63 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -92,4 +92,4 @@ COMMIT;
 
 After executing the transaction in the upstream, the records should be `(3, 2)` and `(2, 2)`, while the records in the downstream will be `(3, 2)` after executing the transaction, which means data inconsistency problem happens.
 
-Note that after this behavior change, TiCDC will not split update events in most cases when using MySQL Sink, so primary key or unique key conflicts may occur when changefeed is run. This problem will cause the changefeed to automatically restart. After the restart, the conflicting update events will be split into two events, delete and insert, and written to the Sorter. At this time, it can be ensured that all events in the same transaction are in the order of the delete event before the insert event. Sort to complete data synchronization correctly.
+Note that after this behavior change, TiCDC will not split update events in most cases when using MySQL Sink, so primary key or unique key conflicts may occur when changefeed is run. This problem will cause the changefeed to automatically restart. After the restart, the conflicting update events will be split into delete and insert events and written to the Sorter. At this time, it can be ensured that all events in the same transaction are in the order of the delete event before the insert event, which can guarantee data synchronization to process correctly.

From 2341f050bc3de4a681a7802515704574d1710286 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Tue, 28 May 2024 18:36:56 +0800
Subject: [PATCH 03/14] sync from Chinese changes

---
 ticdc/ticdc-behavior-change.md | 66 ++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 53f18495b7f63..02c7d42040cbc 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -5,11 +5,11 @@ summary: Introduce the behavior changes of TiCDC changefeed, including the reaso
 
 # TiCDC Behavior Changes
 
-## Split update events into delete and insert events
+## Split `UPDATE` events into `DELETE` and `INSERT` events
 
 ### Transactions containing a single update change
 
-Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in the update event, TiCDC splits this event into delete and insert events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086).
+Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086).
 
 This change primarily addresses the following issues:
 
@@ -24,14 +24,14 @@ INSERT INTO t VALUES (1, 1);
 UPDATE t SET a = 2 WHERE a = 1;
 ```
 
-In this example, the primary key `a` is updated from `1` to `2`. If the update event is not split:
+In this example, the primary key `a` is updated from `1` to `2`. If the `UPDATE` event is not split:
 
 * When using the CSV and AVRO protocols, the consumer only obtains the new value `a = 2` and cannot obtain the old value `a = 1`. This might cause the downstream consumer to only insert the new value `2` without deleting the old value `1`.
-* When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the update event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the update event into delete and insert events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer.
+* When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the `UPDATE` event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the `UPDATE` event into `DELETE` and `INSERT` events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer.
 
 ### Transactions containing multiple update changes
 
-Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the update event, TiCDC splits the event into delete and insert events and ensures that all events follow the sequence of delete events preceding insert events. For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430).
+Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the `UPDATE` event, TiCDC splits the event into `DELETE` and `INSERT` events and ensures that all events follow the sequence of `DELETE` events preceding `INSERT` events. For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430).
 
 This change primarily addresses the potential issue of primary key conflicts when using the MySQL sink to directly write these two events to the downstream, leading to changefeed errors.
 
@@ -49,15 +49,20 @@ UPDATE t SET a = 3 WHERE a = 2;
 COMMIT;
 ```
 
-In this example, by executing three SQL statements to swap the primary keys of two rows, TiCDC only receives two update change events, that is, changing the primary key `a` from `1` to `2` and changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two update events to the downstream, a primary key conflict might occur, leading to changefeed errors.
+In this example, by executing three SQL statements to swap the primary keys of two rows, TiCDC only receives two update change events, that is, changing the primary key `a` from `1` to `2` and changing the primary key `a` from `2` to `1`. If the MySQL sink directly writes these two `UPDATE` events to the downstream, a primary key conflict might occur, leading to changefeed errors.
 
 Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`.
 
-Starting from v8.1.0, when using MySQL Sink, TiCDC will fetch a current timestamp (recorded as `thresholdTs`) from PD at start. For update events with `commitTS` less than `thresholdTs`, it will be split into delete and insert events before being written into the Sorter. This can ensure that all events within the same transaction are sorted in the order in which the delete event precedes the insert event. For update events with `commitTS` greater than or equal to `thresholdTs`, TiCDC will not split them. For details, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
+#### MySQL Sink
 
-This change is due to the fact that TiCDC cannot obtain the execution order between multiple update events within the same upstream transaction. For a transaction containing multiple update events, if the primary key or non-null unique index value is modified in the update event, and the events are split into delete events and insert events before sent to the downstream, this may cause data inconsistency problem.
+Starting from v8.1.0, when using the MySQL Sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
 
-Take the following SQL as an example:
+- For transactions containing multiple changes, if the primary key or non-null unique index value is modified in `UPDATE` events and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits each `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
+- For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
+
+This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events.
+
+Take the following SQL statements as an example:
 
 ```sql
 CREATE TABLE t (a INT PRIMARY KEY, b INT);
@@ -70,26 +75,43 @@ UPDATE t SET a = 2 WHERE a = 1;
 COMMIT;
 ```
 
-In the above example, the execution order of the two SQL statements within the transaction has a sequential dependency relationship, that is, the primary key `a` is changed from `2` to `3`, and then the primary key `a` is changed from `1` to `2`. If the order of update events received by TiCDC is different from the actual execution order within the transaction, splitting them into delete and insert events and sending them downstream will cause data inconsistency.
+In this example, the two `UPDATE` statements within the transaction have a sequential dependency on execution. The primary key `a` is changed from `2` to `3`, and then the primary key `a` is changed from `1` to `2`. After this transaction is executed, the records in the upstream database are `(2, 1)` and `(3, 2)`.
 
-For example, the sequence of update events that TiCDC may receive is as follows:
+However, the order of `UPDATE` events received by TiCDC might differ from the actual execution order of the upstream transaction. For example:
 
 ```sql
 UPDATE t SET a = 2 WHERE a = 1;
 UPDATE t SET a = 3 WHERE a = 2;
 ```
 
-The actual sequence of events executed in downstream after TiCDC splits the above update events is as follows:
+- Before this behavior change, TiCDC writes these `UPDATE` events to the Sorter module and then splits them into `DELETE` and `INSERT` events. After the split, the actual execution order of these events in the downstream is as follows:
 
-```sql
-BEGIN;
-DELETE FROM t WHERE a = 1;
-REPLACE INTO t VALUES (2, 1);
-DELETE FROM t WHERE a = 2;
-REPLACE INTO t VALUES (3, 2);
-COMMIT;
-```
+    ```sql
+    BEGIN;
+    DELETE FROM t WHERE a = 1;
+    REPLACE INTO t VALUES (2, 1);
+    DELETE FROM t WHERE a = 2;
+    REPLACE INTO t VALUES (3, 2);
+    COMMIT;
+    ```
+
+    After the downstream executes the transaction, the records in the database are `(3, 2)`, which are different from the records in the upstream database (`(2, 1)` and `(3, 2)`), indicating a data inconsistency issue.
+
+- After this behavior change, if the transaction `commitTS` is less than the `thresholdTs` obtained by TiCDC at startup, TiCDC splits these `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. After the sorting by the Sorter module, the actual execution order of these events in the downstream is as follows:
+
+    ```sql
+    BEGIN;
+    DELETE FROM t WHERE a = 1;
+    DELETE FROM t WHERE a = 2;
+    REPLACE INTO t VALUES (2, 1);
+    REPLACE INTO t VALUES (3, 2);
+    COMMIT;
+    ```
+
+    After the downstream executes the transaction, the records in the downstream database are the same as those in the upstream database, which are `(2, 1)` and `(3, 2)`, ensuring data consistency.
 
-After executing the transaction in the upstream, the records should be `(3, 2)` and `(2, 2)`, while the records in the downstream will be `(3, 2)` after executing the transaction, which means data inconsistency problem happens.
+As you can see from the preceding example, splitting the `UPDATE` event into `DELETE` and `INSERT` events before writing them to the Sorter module ensures that all `DELETE` events are executed before `INSERT` events after the split, thereby maintaining data consistency regardless of the order of `UPDATE` events received by TiCDC.
 
-Note that after this behavior change, TiCDC will not split update events in most cases when using MySQL Sink, so primary key or unique key conflicts may occur when changefeed is run. This problem will cause the changefeed to automatically restart. After the restart, the conflicting update events will be split into delete and insert events and written to the Sorter. At this time, it can be ensured that all events in the same transaction are in the order of the delete event before the insert event, which can guarantee data synchronization to process correctly.
+> **Notes:**
+>
+> After this behavior change, when using MySQL Sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication.
\ No newline at end of file

From ab0a899a469439c2bdcccf0fc529f195d18bd342 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Wed, 29 May 2024 10:22:47 +0800
Subject: [PATCH 04/14] MySQL Sink -> MySQL sink

---
 ticdc/ticdc-behavior-change.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 02c7d42040cbc..63c18a1c72b4c 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -53,9 +53,9 @@ In this example, by executing three SQL statements to swap the primary keys of t
 
 Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`.
 
-#### MySQL Sink
+#### MySQL sink
 
-Starting from v8.1.0, when using the MySQL Sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
+Starting from v8.1.0, when using the MySQL sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
 
 - For transactions containing multiple changes, if the primary key or non-null unique index value is modified in `UPDATE` events and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits each `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
 - For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
@@ -114,4 +114,4 @@ As you can see from the preceding example, splitting the `UPDATE` event into `DE
 
 > **Notes:**
 >
-> After this behavior change, when using MySQL Sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication.
\ No newline at end of file
+> After this behavior change, when using MySQL sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication.
\ No newline at end of file

From 699910ed6711ff1b5b3c2e41858dc986a650ddf0 Mon Sep 17 00:00:00 2001
From: lidezhu <47731263+lidezhu@users.noreply.github.com>
Date: Wed, 29 May 2024 10:38:53 +0800
Subject: [PATCH 05/14] Update ticdc/ticdc-behavior-change.md

---
 ticdc/ticdc-behavior-change.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 63c18a1c72b4c..09b5693d964bc 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -57,7 +57,7 @@ Therefore, TiCDC splits these two events into four events, that is, deleting rec
 
 Starting from v8.1.0, when using the MySQL sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
 
-- For transactions containing multiple changes, if the primary key or non-null unique index value is modified in `UPDATE` events and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits each `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
+- If the primary key or non-null unique index value is modified in `UPDATE` events and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits each `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
 - For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
 
 This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events.

From 3d73935728034f18ee741387d526776a1ba35e0a Mon Sep 17 00:00:00 2001
From: lidezhu <47731263+lidezhu@users.noreply.github.com>
Date: Wed, 29 May 2024 11:10:10 +0800
Subject: [PATCH 06/14] Update ticdc/ticdc-behavior-change.md

---
 ticdc/ticdc-behavior-change.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 09b5693d964bc..38922874a5ff6 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -53,7 +53,7 @@ In this example, by executing three SQL statements to swap the primary keys of t
 
 Therefore, TiCDC splits these two events into four events, that is, deleting records `(1, 1)` and `(2, 2)` and writing records `(2, 1)` and `(1, 2)`.
 
-#### MySQL sink
+### MySQL sink
 
 Starting from v8.1.0, when using the MySQL sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
 

From bb0baacb0bdf634f743e45ce0f2f0ceb87e96341 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Wed, 29 May 2024 11:52:58 +0800
Subject: [PATCH 07/14] Apply suggestions from code review

---
 ticdc/ticdc-behavior-change.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md
index 38922874a5ff6..c0727b9b1aaa6 100644
--- a/ticdc/ticdc-behavior-change.md
+++ b/ticdc/ticdc-behavior-change.md
@@ -7,7 +7,7 @@ summary: Introduce the behavior changes of TiCDC changefeed, including the reaso
 
 ## Split `UPDATE` events into `DELETE` and `INSERT` events
 
-### Transactions containing a single update change
+### Transactions containing a single `UPDATE` change
 
 Starting from v6.5.3, v7.1.1, and v7.2.0, when using a non-MySQL sink, for transactions that only contain a single update change, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. For more information, see GitHub issue [#9086](https://github.com/pingcap/tiflow/issues/9086).
 
@@ -29,7 +29,7 @@ In this example, the primary key `a` is updated from `1` to `2`. If the `UPDATE`
 * When using the CSV and AVRO protocols, the consumer only obtains the new value `a = 2` and cannot obtain the old value `a = 1`. This might cause the downstream consumer to only insert the new value `2` without deleting the old value `1`.
 * When using the index value dispatcher, the event for inserting `(1, 1)` might be sent to Partition 0, and the `UPDATE` event `(2, 1)` might be sent to Partition 1. If the consumption progress of Partition 1 is faster than that of Partition 0, an error might occur due to the absence of corresponding data in the downstream. Therefore, TiCDC splits the `UPDATE` event into `DELETE` and `INSERT` events. The event for deleting `(1, 1)` is sent to Partition 0, and the event for writing `(2, 1)` is sent to Partition 1, ensuring that the events are consumed successfully regardless of the progress of the consumer.
 
-### Transactions containing multiple update changes
+### Transactions containing multiple `UPDATE` changes
 
 Starting from v6.5.4, v7.1.2, and v7.4.0, for transactions containing multiple changes, if the primary key or non-null unique index value is modified in the `UPDATE` event, TiCDC splits the event into `DELETE` and `INSERT` events and ensures that all events follow the sequence of `DELETE` events preceding `INSERT` events. For more information, see GitHub issue [#9430](https://github.com/pingcap/tiflow/issues/9430).
 
@@ -57,7 +57,7 @@ Therefore, TiCDC splits these two events into four events, that is, deleting rec
 
 Starting from v8.1.0, when using the MySQL sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp:
 
-- If the primary key or non-null unique index value is modified in `UPDATE` events and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits each `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
+- For transactions containing one or multiple `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits the `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module.
 - For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
 
 This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events.

From 7b6c3bf58fc49fbcc081c9cb27b6f7ceb40a47a7 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Wed, 29 May 2024 13:52:18 +0800
Subject: [PATCH 08/14] Update release-8.1.0.md

---
 releases/release-8.1.0.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/releases/release-8.1.0.md b/releases/release-8.1.0.md
index 634663d60de26..2ec2f4d1ce36f 100644
--- a/releases/release-8.1.0.md
+++ b/releases/release-8.1.0.md
@@ -172,6 +172,7 @@ Compared with the previous LTS 7.5.0, 8.1.0 includes new features, improvements,
 
 * In earlier versions, the `tidb.tls` configuration item in TiDB Lightning treats values `"false"` and `""` the same, as well as treating the values `"preferred"` and `"skip-verify"` the same. Starting from v8.1.0, TiDB Lightning distinguishes the behavior of `"false"`, `""`, `"skip-verify"`, and `"preferred"` for `tidb.tls`. For more information, see [TiDB Lightning configuration](/tidb-lightning/tidb-lightning-configuration.md).
 * For tables with `AUTO_ID_CACHE=1`, TiDB supports a [centralized auto-increment ID allocating service](/auto-increment.md#mysql-compatibility-mode). In earlier versions, the primary TiDB node of this service automatically performs a `forceRebase` operation when the TiDB process exits (for example, during the TiDB node restart) to keep auto-assigned IDs as consecutive as possible. However, when there are too many tables with `AUTO_ID_CACHE=1`, executing `forceRebase` becomes very time-consuming, preventing TiDB from restarting promptly and even blocking data writes, thus affecting system availability. To resolve this issue, starting from v8.1.0, TiDB removes the `forceRebase` behavior, but this change will cause some auto-assigned IDs to be non-consecutive during the failover.
+* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events when the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink).
 
 ### System variables
 

From 45eaca3a1ec1359bf113347da2286df79dd3811f Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Wed, 29 May 2024 13:56:39 +0800
Subject: [PATCH 09/14] Update releases/release-8.1.0.md

---
 releases/release-8.1.0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/releases/release-8.1.0.md b/releases/release-8.1.0.md
index 2ec2f4d1ce36f..0fce9c11ce1a4 100644
--- a/releases/release-8.1.0.md
+++ b/releases/release-8.1.0.md
@@ -172,7 +172,7 @@ Compared with the previous LTS 7.5.0, 8.1.0 includes new features, improvements,
 
 * In earlier versions, the `tidb.tls` configuration item in TiDB Lightning treats values `"false"` and `""` the same, as well as treating the values `"preferred"` and `"skip-verify"` the same. Starting from v8.1.0, TiDB Lightning distinguishes the behavior of `"false"`, `""`, `"skip-verify"`, and `"preferred"` for `tidb.tls`. For more information, see [TiDB Lightning configuration](/tidb-lightning/tidb-lightning-configuration.md).
 * For tables with `AUTO_ID_CACHE=1`, TiDB supports a [centralized auto-increment ID allocating service](/auto-increment.md#mysql-compatibility-mode). In earlier versions, the primary TiDB node of this service automatically performs a `forceRebase` operation when the TiDB process exits (for example, during the TiDB node restart) to keep auto-assigned IDs as consecutive as possible. However, when there are too many tables with `AUTO_ID_CACHE=1`, executing `forceRebase` becomes very time-consuming, preventing TiDB from restarting promptly and even blocking data writes, thus affecting system availability. To resolve this issue, starting from v8.1.0, TiDB removes the `forceRebase` behavior, but this change will cause some auto-assigned IDs to be non-consecutive during the failover.
-* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events when the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink).
+* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events when the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink).
 
 ### System variables
 

From e8e3006aa003cf94cc58fa23c597105c52d270a5 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Wed, 29 May 2024 13:57:38 +0800
Subject: [PATCH 10/14] wording updates

---
 releases/release-8.1.0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/releases/release-8.1.0.md b/releases/release-8.1.0.md
index 0fce9c11ce1a4..75d8b3d88af6e 100644
--- a/releases/release-8.1.0.md
+++ b/releases/release-8.1.0.md
@@ -172,7 +172,7 @@ Compared with the previous LTS 7.5.0, 8.1.0 includes new features, improvements,
 
 * In earlier versions, the `tidb.tls` configuration item in TiDB Lightning treats values `"false"` and `""` the same, as well as treating the values `"preferred"` and `"skip-verify"` the same. Starting from v8.1.0, TiDB Lightning distinguishes the behavior of `"false"`, `""`, `"skip-verify"`, and `"preferred"` for `tidb.tls`. For more information, see [TiDB Lightning configuration](/tidb-lightning/tidb-lightning-configuration.md).
 * For tables with `AUTO_ID_CACHE=1`, TiDB supports a [centralized auto-increment ID allocating service](/auto-increment.md#mysql-compatibility-mode). In earlier versions, the primary TiDB node of this service automatically performs a `forceRebase` operation when the TiDB process exits (for example, during the TiDB node restart) to keep auto-assigned IDs as consecutive as possible. However, when there are too many tables with `AUTO_ID_CACHE=1`, executing `forceRebase` becomes very time-consuming, preventing TiDB from restarting promptly and even blocking data writes, thus affecting system availability. To resolve this issue, starting from v8.1.0, TiDB removes the `forceRebase` behavior, but this change will cause some auto-assigned IDs to be non-consecutive during the failover.
-* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events when the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink).
+* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events.
Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events if the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink). ### System variables From 5bbd94b5859b5fc3a184b8dd65bfefe707a1b8db Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Wed, 29 May 2024 13:58:39 +0800 Subject: [PATCH 11/14] Update releases/release-8.1.0.md --- releases/release-8.1.0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/releases/release-8.1.0.md b/releases/release-8.1.0.md index 75d8b3d88af6e..76a954360180e 100644 --- a/releases/release-8.1.0.md +++ b/releases/release-8.1.0.md @@ -172,7 +172,7 @@ Compared with the previous LTS 7.5.0, 8.1.0 includes new features, improvements, * In earlier versions, the `tidb.tls` configuration item in TiDB Lightning treats values `"false"` and `""` the same, as well as treating the values `"preferred"` and `"skip-verify"` the same. Starting from v8.1.0, TiDB Lightning distinguishes the behavior of `"false"`, `""`, `"skip-verify"`, and `"preferred"` for `tidb.tls`. For more information, see [TiDB Lightning configuration](/tidb-lightning/tidb-lightning-configuration.md). * For tables with `AUTO_ID_CACHE=1`, TiDB supports a [centralized auto-increment ID allocating service](/auto-increment.md#mysql-compatibility-mode). 
In earlier versions, the primary TiDB node of this service automatically performs a `forceRebase` operation when the TiDB process exits (for example, during the TiDB node restart) to keep auto-assigned IDs as consecutive as possible. However, when there are too many tables with `AUTO_ID_CACHE=1`, executing `forceRebase` becomes very time-consuming, preventing TiDB from restarting promptly and even blocking data writes, thus affecting system availability. To resolve this issue, starting from v8.1.0, TiDB removes the `forceRebase` behavior, but this change will cause some auto-assigned IDs to be non-consecutive during the failover. -* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. Starting from v8.1.0, when using MySQL Sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events if the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink). +* In earlier versions, when processing a transaction containing `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event, TiCDC splits this event into `DELETE` and `INSERT` events. 
Starting from v8.1.0, when using the MySQL sink, TiCDC only splits an `UPDATE` event into `DELETE` and `INSERT` events if the primary key or non-null unique index value is modified in the `UPDATE` event and the transaction `commitTS` is less than TiCDC `thresholdTs` (which is the current timestamp that TiCDC fetches from PD at TiCDC startup). This behavior change addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of `UPDATE` events received by TiCDC, which can lead to an incorrect order of split `DELETE` and `INSERT` events. For more information, see [documentation](/ticdc/ticdc-behavior-change.md#mysql-sink). ### System variables From cbf188eba7b43f53d0e268467c1c59ba34647f9f Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Wed, 29 May 2024 14:12:51 +0800 Subject: [PATCH 12/14] Update ticdc/ticdc-behavior-change.md --- ticdc/ticdc-behavior-change.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md index f27cb5b5ef8cb..01205001fbd2b 100644 --- a/ticdc/ticdc-behavior-change.md +++ b/ticdc/ticdc-behavior-change.md @@ -114,4 +114,4 @@ As you can see from the preceding example, splitting the `UPDATE` event into `DE > **Notes:** > -> After this behavior change, when using MySQL sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication. \ No newline at end of file +> After this behavior change, when using the MySQL sink, TiCDC does not split the `UPDATE` event in most cases. 
Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication. \ No newline at end of file From acfac776699c807799b12b92e67b186fc06023f7 Mon Sep 17 00:00:00 2001 From: lidezhu <47731263+lidezhu@users.noreply.github.com> Date: Wed, 29 May 2024 15:37:24 +0800 Subject: [PATCH 13/14] Update ticdc/ticdc-behavior-change.md Co-authored-by: Aolin --- ticdc/ticdc-behavior-change.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md index 01205001fbd2b..dbf4442be7045 100644 --- a/ticdc/ticdc-behavior-change.md +++ b/ticdc/ticdc-behavior-change.md @@ -112,6 +112,6 @@ UPDATE t SET a = 3 WHERE a = 2; As you can see from the preceding example, splitting the `UPDATE` event into `DELETE` and `INSERT` events before writing them to the Sorter module ensures that all `DELETE` events are executed before `INSERT` events after the split, thereby maintaining data consistency regardless of the order of `UPDATE` events received by TiCDC. -> **Notes:** +> **Note:** > > After this behavior change, when using the MySQL sink, TiCDC does not split the `UPDATE` event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting `UPDATE` events into `DELETE` and `INSERT` events before writing them to the Sorter module. 
This ensures that all events within the same transaction are correctly ordered, with all `DELETE` events preceding `INSERT` events, thus correctly completing data replication. \ No newline at end of file From 208ad67ab84b892793844ff28932dfbf9c9e5c49 Mon Sep 17 00:00:00 2001 From: lidezhu <47731263+lidezhu@users.noreply.github.com> Date: Wed, 29 May 2024 15:37:45 +0800 Subject: [PATCH 14/14] Update ticdc/ticdc-behavior-change.md Co-authored-by: Aolin --- ticdc/ticdc-behavior-change.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/ticdc-behavior-change.md b/ticdc/ticdc-behavior-change.md index dbf4442be7045..317b971071531 100644 --- a/ticdc/ticdc-behavior-change.md +++ b/ticdc/ticdc-behavior-change.md @@ -55,7 +55,7 @@ Therefore, TiCDC splits these two events into four events, that is, deleting rec ### MySQL sink -Starting from v8.1.0, when using the MySQL sink, TiCDC fetches a current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of the timestamp: +Starting from v8.1.0, when using the MySQL sink, TiCDC fetches the current timestamp `thresholdTs` from PD at startup and decides whether to split `UPDATE` events based on the value of this timestamp: - For transactions containing one or multiple `UPDATE` changes, if the primary key or non-null unique index value is modified in an `UPDATE` event and the transaction `commitTS` is less than `thresholdTs`, TiCDC splits the `UPDATE` event into a `DELETE` event and an `INSERT` event before writing them to the Sorter module. - For `UPDATE` events with the transaction `commitTS` greater than or equal to `thresholdTs`, TiCDC does not split them. For more information, see GitHub issue [#10918](https://github.com/pingcap/tiflow/issues/10918).
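
The splitting rule in the two bullets above, and the reason the split `DELETE` events need to precede the `INSERT` events, can be sketched in Python. This is a minimal illustration, not TiCDC's implementation. The names `UpdateEvent`, `should_split`, `split_in_place`, `split_deletes_first`, and `apply_ops` are hypothetical. The sketch models the table as a dict keyed by primary key `a`, and treats each split `INSERT` as a `REPLACE INTO`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateEvent:
    """Hypothetical model of one UPDATE change; rows are (a, b), a is the primary key."""
    old_row: tuple
    new_row: tuple
    commit_ts: int


def should_split(event, threshold_ts):
    """Split only if the key changed and commitTS is less than thresholdTs."""
    key_changed = event.old_row[0] != event.new_row[0]
    return key_changed and event.commit_ts < threshold_ts


def split_in_place(events):
    """Split each UPDATE where it stands: DELETE, INSERT, DELETE, INSERT, ..."""
    ops = []
    for e in events:
        ops.append(("DELETE", e.old_row))
        ops.append(("INSERT", e.new_row))
    return ops


def split_deletes_first(events):
    """Split all UPDATEs, then order every DELETE before every INSERT."""
    deletes = [("DELETE", e.old_row) for e in events]
    inserts = [("INSERT", e.new_row) for e in events]
    return deletes + inserts


def apply_ops(table, ops):
    """Apply ops to a copy of the table; INSERT overwrites like REPLACE INTO."""
    rows = dict(table)
    for kind, (a, b) in ops:
        if kind == "DELETE":
            rows.pop(a, None)
        else:
            rows[a] = b
    return rows
```

With a table holding `(1, 1)` and `(2, 2)` and the two `UPDATE` events received in the reverse of their upstream execution order, `split_in_place` loses a row, while `split_deletes_first` yields the same two rows for either arrival order.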