[Failing Test]: PostCommit Java SingleStoreIO IT failing #30564

Closed
Abacn opened this issue Mar 7, 2024 · 4 comments

@Abacn
Contributor

Abacn commented Mar 7, 2024

What happened?

Failing since Jan 31, 2024.

The "Install Singlestore cluster" step fails:

Run kubectl apply -f /runner/_work/beam/beam/.test-infra/kubernetes/singlestore/sdb-cluster.yaml
memsqlcluster.memsql.com/sdb-cluster created
error: timed out waiting for the condition on memsqlclusters/sdb-cluster
Error: Process completed with exit code 1.

The performance test is also failing.

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Abacn
Contributor Author

Abacn commented Mar 7, 2024

Workload logs:

Status: Pods have warnings. node-sdb-cluster-master

2024-03-07 15:19:31.496 EST
✓ Created node with node ID A40217C2599E6693E3D37C2BCB195DA378E230AA
2024-03-07 15:19:31.908 EST
memsqlctl will perform the following actions:
2024-03-07 15:19:31.908 EST
  · Update configuration setting on node with node ID A40217C2599E6693E3D37C2BCB195DA378E230AA on port 3306
2024-03-07 15:19:31.908 EST
    - Update node config file with setting minimum_core_count=0
2024-03-07 15:19:31.908 EST
{}
2024-03-07 15:19:31.908 EST
Would you like to continue? [Y/n]: 
2024-03-07 15:19:31.908 EST
Automatically selected yes, non-interactive mode enabled

...

2024-03-07 15:19:38.746 EST
2024-03-07 20:19:38.746   INFO: Thread 115121 (ntid 225, conn id -1): memsqld_main: Flavor: 'production'
2024-03-07 15:19:38.756 EST
2024-03-07 20:19:38.756  ERROR: Thread 115104 (ntid 361, conn id -1): Run: Error getting cluster database
2024-03-07 15:19:38.756 EST
2024-03-07 20:19:38.756  ERROR: Thread 115104 (ntid 361, conn id -1): Run: Error getting cluster database
2024-03-07 15:19:38.756 EST
2024-03-07 20:19:38.756  ERROR: Thread 115104 (ntid 361, conn id -1): Run: Error getting cluster database
2024-03-07 15:19:38.756 EST
2024-03-07 20:19:38.756  ERROR: Thread 115104 (ntid 361, conn id -1): Run: Error getting cluster database
2024-03-07 15:19:38.757 EST
2024-03-07 20:19:38.757   INFO: Thread 115121 (ntid 225, conn id -1): CreateDatabase: CREATE DATABASE `memsql` with sync durability / sync input durability, 0 partitions, 0 sub partitions, 0 logical partitions, log file size 16777216.

...

2024-03-07 15:19:40.691 EST
Started singlestore (199)
2024-03-07 15:19:40.694 EST
Ensuring the root password is setup
2024-03-07 15:19:40.787 EST
Error 2277: This node is not part of the cluster.
2024-03-07 15:19:40.845 EST
2024-03-07 20:19:40.845   INFO: Thread 115120 (ntid 344, conn id -1): OnAsyncCompileCompleted: Query information_schema.'SELECT 1' submitted 177 milliseconds ago, queued for 17 milliseconds, compiled asynchronously in 160 milliseconds
2024-03-07 15:19:40.847 EST
2024-03-07 20:19:40.847  ERROR: [0 messages suppressed] ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'root'@'localhost' (using password: NO)

...

2024-03-07 15:19:41.181 EST
2024-03-07 20:19:41.181   INFO: Thread 115120 (ntid 344, conn id -1): OnAsyncCompileCompleted: Query (null).'SELECT @@MEMSQL_VERSION' submitted 133 milliseconds ago, queued for 17 milliseconds, compiled asynchronously in 116 milliseconds
2024-03-07 15:19:50.497 EST
Error 2277: This node is not part of the cluster.
2024-03-07 15:20:00.494 EST
Error 2277: This node is not part of the cluster.
2024-03-07 15:20:10.569 EST
Error 2277: This node is not part of the cluster.
2024-03-07 15:20:20.496 EST
Error 2277: This node is not part of the cluster.
2024-03-07 15:20:30.496 EST
Error 2277: This node is not part of the cluster.


Status: Pods have warnings. node-sdb-cluster-leaf-ag1

2024-03-07 15:19:49.683 EST
Initializing OpenSSL 1.0.2u-fips  20 Dec 2019
2024-03-07 15:19:49.688 EST
ERROR 2277 (HY000) at line 1: This node is not part of the cluster.
2024-03-07 15:19:49.704 EST
[2024-03-07 20:19:49 startup-probe] Aborting due to query failure
2024-03-07 15:19:49.808 EST
2024-03-07 20:19:49.808   INFO: Thread 115120 (ntid 388, conn id -1): OnAsyncCompileCompleted: Query (null).'select @@version_comment limit 1' submitted 142 milliseconds ago, queued for 17 milliseconds, compiled asynchronously in 125 milliseconds
2024-03-07 15:19:54.664 EST
ERROR 2277 (HY000) at line 1: This node is not part of the cluster.
2024-03-07 15:19:54.669 EST
[2024-03-07 20:19:54 startup-probe] Aborting due to query failure

...

2024-03-07 15:24:34.665 EST
ERROR 2277 (HY000) at line 1: This node is not part of the cluster.
2024-03-07 15:24:34.671 EST
[2024-03-07 20:24:34 startup-probe] Aborting due to query failure
2024-03-07 15:24:39.661 EST
ERROR 2277 (HY000) at line 1: This node is not part of the cluster.
2024-03-07 15:24:39.667 EST
[2024-03-07 20:24:39 startup-probe] Aborting due to query failure
2024-03-07 15:24:42.829 EST
2024-03-07 20:24:42.829  ERROR: Thread 115101 (ntid 408, conn id -1): Run: Error getting cluster database
2024-03-07 15:24:42.829 EST
2024-03-07 20:24:42.829  ERROR: Thread 115103 (ntid 406, conn id -1): Run: Error getting cluster database
2024-03-07 15:24:42.829 EST
2024-03-07 20:24:42.829  ERROR: Thread 115104 (ntid 405, conn id -1): Run: Error getting cluster database

@Abacn
Contributor Author

Abacn commented Mar 7, 2024

The k8s configuration has not changed for months, but cluster creation suddenly started failing on Jan 31. CC: @AdalbertMemSQL (the author)

@AdalbertMemSQL
Contributor

Hey @Abacn
Is it possible to retrieve the full workload logs?

@Abacn
Contributor Author

Abacn commented Mar 29, 2024

Fixed by #30725

Abacn closed this as completed Mar 29, 2024
github-actions bot added this to the 2.56.0 Release milestone Mar 29, 2024