Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Use libopenblas-openmp instead of libopenblas #35526

Closed
1 task done
alexanderguzhva opened this issue Aug 16, 2024 · 4 comments
Closed
1 task done

[Bug]: Use libopenblas-openmp instead of libopenblas #35526

alexanderguzhva opened this issue Aug 16, 2024 · 4 comments
Assignees
Labels
kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. stale indicates no udpates for 30 days

Comments

@alexanderguzhva
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

By default, libopenblas uses pthreads for multithreading. We use openmp in our search engine. Combined, this may lead to N^2 threads being spawned (each of N openmp threads spawns N pthread threads). For example, this issue zilliztech/knowhere#739.

In order to mitigate this, we need to use an openmp-compatible version of openblas in Milvus Docker containers and install_deps scripts. For ubuntu, it is libopenblas0-openmp / libopenblas-openmp-dev instead of libopenblas / libopenblas-dev. Also, make sure that libopenblas0-pthread / libopenblas-pthread-dev are not installed.

I insist that issues like zilliztech/knowhere#739 were resolved incorrectly.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@alexanderguzhva alexanderguzhva added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 16, 2024
@xiaofan-luan
Copy link
Collaborator

seems to be a good suggestion.
I would suggestion to remove any dependency on openblas if possible in the future.

@yanliang567
Copy link
Contributor

/assign @liliu-z
/unassign

@sre-ci-robot sre-ci-robot assigned liliu-z and unassigned yanliang567 Aug 19, 2024
@liliu-z
Copy link
Member

liliu-z commented Aug 20, 2024

We previously agreed on the following:
The current stopgap measure is implemented in zilliztech/knowhere#741. This PR sets all OpenMP parallelism to 1, shifting concurrency management from BLAS to Knowhere.
The problem is that Knowhere only allocates a single thread from its pool for each search request on a single segment. With this temporary fix in place, if we perform sequential searches on a small number of segments, we end up with no/small parallelism in Knowhere, leading to bad performance. That said, this setup works fine when dealing with numerous segments or handling multiple concurrent search requests.

While modifying the library would be the ideal long-term solution, we still need to address some edge cases. For instance, how to protect the non-Docker users who might have an wrong OpenBLAS library in their environment and haven't run the install script beforehand.

Copy link

stale bot commented Sep 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Sep 20, 2024
@stale stale bot closed this as completed Nov 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. stale indicates no udpates for 30 days
Projects
None yet
Development

No branches or pull requests

4 participants