[Bug]: The Milvus Java SDK occasionally returns null from MilvusClientV2Pool.getClient("MME") even when the pool has many available connections, causing a NullPointerException. #37188

Open
1 task done
xiaojunxiang2023 opened this issue Oct 28, 2024 · 4 comments

xiaojunxiang2023 commented Oct 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

Milvus Server: 2.4.11
Milvus sdk: java-sdk 2.4.3

Current Behavior

The Milvus Java SDK occasionally returns null from MilvusClientV2Pool.getClient("MME"). This happens even when the pool has many available connections, and the caller then hits a NullPointerException.

  • Client Code:

    1. PoolConfig(): [image]

    2. getClient(): [image]

  • Log:

1. There are 6 active clients and 0 idle clients, well below TotalPerKey (50). [image]

2. While request 1 is stuck in client.createCollection, request 2 arrives and calls MilvusClientV2Pool.getClient("MME"), which returns null. [image]

  • This is just one example; sometimes, even when request 1 is not stuck, a NullPointerException still occurs. It happens roughly once every two weeks, and the root cause remains unknown.

Expected Behavior

When the connection pool is not full, getClient() should not return null.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@yanliang567
Contributor

/assign @yhmo
/unassign

@xiaofan-luan
Collaborator

I don't think it makes sense to return null to the user.

When a caller requests a connection and none is available, we should just block and wait for a valid connection.

Is anyone interested in working on this?
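
The block-and-wait behavior suggested above could look roughly like the following. This is a minimal sketch, not the SDK's implementation: `BlockingClientPool`, `borrow`, and `giveBack` are hypothetical names, the pool is not keyed, and the client type is left generic so the example does not depend on the Milvus SDK.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical pool sketch (not part of the Milvus SDK): callers wait instead of receiving null. */
final class BlockingClientPool<T> {
    private final Semaphore permits;                        // one permit per client that may be in use
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;                      // creates a new client on demand

    BlockingClientPool(int maxTotal, Supplier<T> factory) {
        this.permits = new Semaphore(maxTotal, true);       // fair: waiters are served in arrival order
        this.factory = factory;
    }

    /** Waits up to timeoutMs for a free slot; never returns null. */
    T borrow(long timeoutMs) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();             // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
        if (!acquired) {
            throw new IllegalStateException("no client became available within " + timeoutMs + " ms");
        }
        try {
            T client = idle.poll();                         // reuse an idle client if one exists
            if (client == null) {
                client = factory.get();                     // otherwise create a new one
            }
            if (client == null) {
                throw new IllegalStateException("client factory returned null");
            }
            return client;
        } catch (RuntimeException e) {
            permits.release();                              // do not leak the slot when creation fails
            throw e;
        }
    }

    /** Returns a borrowed client to the pool and wakes one waiting caller. */
    void giveBack(T client) {
        idle.offer(client);
        permits.release();
    }
}
```

With this shape, a caller either gets a usable client or a descriptive exception after the timeout. A creation failure surfaces at the point of failure rather than as a later NullPointerException.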

@yhmo
Contributor

yhmo commented Dec 2, 2024

@xiaojunxiang2023
"sometimes, even when request 1 is not stuck, a NullPointerException still occurs. It happens roughly once every two weeks, and the root cause remains unknown."

A similar root cause is described in #37613 (comment).

The root cause:
v2.4.4 added an enhancement, "Check connection when MilvusClientV2 is initialized": the MilvusClientV2 constructor calls the connect() RPC to pass client-side information to the server. The timeout for that call was hard-coded to 1 second, so under network problems MilvusClientV2 could fail to initialize.

This was fixed in v2.4.6, which uses ConnectConfig.getConnectTimeoutMs() as the timeout value. The default value of getConnectTimeoutMs() is 10 seconds.

Behavior change of MilvusClientPool.getClient():
A user pointed out that getClient() should throw an exception instead of returning null, so we changed this in v2.4.9: getClient() now throws an exception if it fails to create a MilvusClient.
milvus-io/milvus-sdk-java#1118

So you can upgrade the SDK to the latest version. If getClient() throws an exception, you can call it again.

BTW: Java SDK issues can be filed in the Java SDK repo: https://github.com/milvus-io/milvus-sdk-java/issues
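
The "call it again" advice can be wrapped in a small retry helper. The sketch below is an assumption-laden illustration, not SDK code: `ClientAcquirer` and `acquireWithRetry` are hypothetical names, and the pool call is passed in as a `Supplier` (for example `() -> pool.getClient("MME")`), which assumes the exception thrown by getClient() is unchecked. It treats both a null result (the older behavior described in this thread) and an exception (the behavior after v2.4.9) as a failed attempt.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of the Milvus SDK. */
final class ClientAcquirer {
    private ClientAcquirer() {}

    /**
     * Calls source up to maxAttempts times, sleeping backoffMs * attempt between tries.
     * Returns the first non-null result; throws IllegalStateException if every attempt fails.
     */
    static <T> T acquireWithRetry(Supplier<T> source, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T client = source.get();
                if (client != null) {
                    return client;
                }
                // A null result counts as a failed attempt (older SDK behavior per this thread).
            } catch (RuntimeException e) {
                lastError = e;                          // remember the most recent failure
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs * attempt);  // simple linear backoff
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("interrupted while retrying", e);
                }
            }
        }
        throw new IllegalStateException(
                "could not acquire a client after " + maxAttempts + " attempts", lastError);
    }
}
```

A call site might then read `MilvusClientV2 client = ClientAcquirer.acquireWithRetry(() -> pool.getClient("MME"), 3, 200L);`, where `pool` is the application's existing MilvusClientV2Pool and the attempt count and backoff are placeholder values.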

@xiaojunxiang2023
Author

Thanks for your help!
