Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch-benchmark mysql cdc streaming q1 q2 result mismatch #15721

Closed
cyliu0 opened this issue Mar 17, 2024 · 7 comments
Closed

ch-benchmark mysql cdc streaming q1 q2 result mismatch #15721

cyliu0 opened this issue Mar 17, 2024 · 7 comments
Assignees
Labels
Milestone

Comments

@cyliu0
Copy link
Collaborator

cyliu0 commented Mar 17, 2024

Describe the bug

The base tables are consistent. But the query results of q1 and q2 are different from the upstream.

https://buildkite.com/risingwave-test/ch-benchmark-mysql-cdc-shared-source/builds/5#018e44f9-49a3-4f95-a881-ca6a0c7f91a8

image

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

nightly-20240314

Additional context

No response

@cyliu0 cyliu0 added the type/bug Something isn't working label Mar 17, 2024
@github-actions github-actions bot added this to the release-1.8 milestone Mar 17, 2024
@StrikeW
Copy link
Contributor

StrikeW commented Mar 18, 2024

The source tables are consistent, could you find someone to look into the problem? @fuyufjh

@fuyufjh
Copy link
Member

fuyufjh commented Mar 18, 2024

The k8s namespace tpc-20240316-015732 has been deleted. Can't check the data manually. @cyliu0 Can you help to run the pipeline again with keeping the env on failure?

Previously related issues #14998 #15057 have been fixed by #15232. This seems to be a new one. 🥹

@hzxa21
Copy link
Collaborator

hzxa21 commented Mar 18, 2024

Buildkite

I think it is a query issue related to ch_benchmark_q2. I ran the SQL defined in ch_benchmark_q2 via batch query and the result count count matches with the MV output.

dev=> select count(*) from ch_benchmark_q2;
 count 
-------
  1882
(1 row)

dev=> select count(*) from (SELECT s_suppkey, s_name, n_name, i_id, i_name, s_address, s_phone, s_comment FROM item, supplier, stock, nation, region, (SELECT s_i_id AS m_i_id, min(s_quantity) AS m_s_quantity FROM stock, supplier, nation, region WHERE mod((s_w_id * s_i_id), 10000) = s_suppkey AND s_nationkey = n_nationkey AND n
_regionkey = r_regionkey AND r_name LIKE 'EUROP%' GROUP BY s_i_id) AS m WHERE i_id = s_i_id AND mod((s_w_id * s_i_id), 10000) = s_suppkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND i_data LIKE '%b' AND r_name LIKE 'EUROP%' AND i_id = m_i_id AND s_quantity = m_s_quantity);
 count 
-------
  1882
(1 row)

@cyliu0
Copy link
Collaborator Author

cyliu0 commented Mar 18, 2024

@fuyufjh The reproduced build with kept cluster. The namespace is tpc-20240318-031107. https://buildkite.com/risingwave-test/ch-benchmark-mysql-cdc-shared-source/builds/7#018e4f89-287b-45e6-8bd4-7b47e006a7a7/8381-8591

@fuyufjh
Copy link
Member

fuyufjh commented Mar 18, 2024

I think it is a query issue related to ch_benchmark_q2. I ran the SQL defined in ch_benchmark_q2 via batch query and the result count count matches with the MV output.

Same for ch_benchmark_q1:

  • Left: Query result from RisingWave
  • Right: Query result from MySQL
Screenshot 2024-03-18 at 21 10 08

The problem might be the number of decimals in avg_amount.

Note that in the query

round(avg(ol_amount), 16) AS avg_amount

So, our results are correct.

@lmatz
Copy link
Contributor

lmatz commented Mar 18, 2024

looks like we encountered the decimal issue: https://github.com/risingwavelabs/qa-infra/pull/73 before cc: @cyliu0
maybe we didn't do the same to MySQL as we did to Postgresql: https://github.com/risingwavelabs/qa-infra/blob/main/pkg/storage/postgres.go#L62

edit:
it seems we also did to MySQL: https://github.com/risingwavelabs/qa-infra/blob/main/pkg/storage/mysql.go#L60

@fuyufjh
Copy link
Member

fuyufjh commented Mar 18, 2024

For ch_benchmark_q2, the problem is that MySQL is case-insensitive by default while RisingWave/Postgres is not

MySQL's results set is a superset of RW's, so I managed to find out a row that is only included in MySQL's result set:

mysql> SELECT * FROM item, supplier, stock, nation, region, (SELECT s_i_id AS m_i_id, min(s_quantity) AS m_s_quantity FROM stock, supplier, nation, region WHERE mod((s_w_id * s_i_id), 10000) = s_suppkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name LIKE 'EUROP%' GROUP BY s_i_id) AS m WHERE i_id = s_i_id AND mod((s_w_id * s_i_id), 10000) = s_suppkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND i_data LIKE '%b' AND r_name LIKE 'EUROP%' AND i_id = m_i_id AND s_quantity = m_s_quantity and s_suppkey = 9107 and i_id = 10193 and s_i_id = 10193;
+-------+---------+---------------------+---------+-------------------------------------------------+-----------+--------------------+-----------------------------------------+-------------+-----------------+-----------+-------------------------------------------+--------+--------+------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+-------+-------------+--------------+--------------------------------------------------+-------------+---------+-------------+-----------------------------------------------------------------------------------------------------------------+-------------+--------+-----------------------------------------------+--------+--------------+
| i_id  | i_im_id | i_name              | i_price | i_data                                          | S_SUPPKEY | S_NAME             | S_ADDRESS                               | S_NATIONKEY | S_PHONE         | S_ACCTBAL | S_COMMENT                                 | s_i_id | s_w_id | s_quantity | s_dist_01                | s_dist_02                | s_dist_03                | s_dist_04                | s_dist_05                | s_dist_06                | s_dist_07                | s_dist_08                | s_dist_09                | s_dist_10                | s_ytd | s_order_cnt | s_remote_cnt | s_data                                           | N_NATIONKEY | N_NAME  | N_REGIONKEY | N_COMMENT                                                                                                       | R_REGIONKEY | R_NAME | R_COMMENT                                     | m_i_id | m_s_quantity |
+-------+---------+---------------------+---------+-------------------------------------------------+-----------+--------------------+-----------------------------------------+-------------+-----------------+-----------+-------------------------------------------+--------+--------+------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+-------+-------------+--------------+--------------------------------------------------+-------------+---------+-------------+-----------------------------------------------------------------------------------------------------------------+-------------+--------+-----------------------------------------------+--------+--------------+
| 10193 |    5858 | 0wJbGPIO8BnT5M4paJX |   86.11 | EC8EDccPUFHGXVf3tO0ovVdYKyHa3OYWln4r2qIdO7QUYLB |      9107 | Supplier#000009107 | DT9ZkOPMlednufUM5tJeuuhp9hDmXbhq qmLdFf |          19 | 29-928-638-9549 |   6304.41 | eans. ruthless accounts haggle blithely:  |  10193 |     99 |         11 | CSRIBQXJJXRRHMLPBONFEDOT | HHPJFJFEQSIMYQLBPPMJDLEY | GYPWHRWFIHWKBZOZUAFXRHTC | YNTRLJSHZGPIDVEDIAKOSMAR | XHADTYKLEAGLLONYYWHNJYVP | BPJJQXQFTLDROVPREHGULOFJ | YITLQOUHAOIXBOXZIVTSXVLJ | THFGWOKHTCJZKVHCALAJOWIS | APDJSVAIYBRVAONXJNSVHEVA | ERHOHGUZOBMTKXGAEAWRBRGA |     0 |           0 |            0 | TrUmA1K1XSesrJPUATltI6MUx2UrACdEuk9nCxlgzAQcewfp |          19 | ROMANIA |           3 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account |           3 | EUROPE | ly final courts cajole furiously final excuse |  10193 |           11 |
+-------+---------+---------------------+---------+-------------------------------------------------+-----------+--------------------+-----------------------------------------+-------------+-----------------+-----------+-------------------------------------------+--------+--------+------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+-------+-------------+--------------+--------------------------------------------------+-------------+---------+-------------+-----------------------------------------------------------------------------------------------------------------+-------------+--------+-----------------------------------------------+--------+--------------+
1 row in set (1 min 17.46 sec)

Note that i_data is EC8EDccPUFHGXVf3tO0ovVdYKyHa3OYWln4r2qIdO7QUYLB, which can pass i_data LIKE '%b', while in RW/PG, it will be filtered out.

To overcome the problem, please set MySQL's column or global collation to be case-senstive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants