enhance: avoid too many goroutines when calc distance #33770
base: master
@CocaineCong Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.
b083dfe to 3951e85
Thanks for your contribution! It would be great if you could show some performance numbers comparing the old approach with the new one under very large batch sizes, and add that comparison to the unit tests.
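For reference, a comparison of that kind could be sketched roughly as follows. This is an illustration only, not code from the PR: `l2`, `perRow`, `bounded`, and `makeVectors` are hypothetical names, the batch size and dimension are arbitrary, and the bounded variant uses a plain stdlib worker pool as a stand-in for the ants-based pool.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"testing"
)

// l2 returns the squared Euclidean distance between two equal-length vectors.
func l2(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// perRow spawns one goroutine per left vector (the "old way" in this sketch).
func perRow(left, right [][]float32) []float32 {
	out := make([]float32, len(left)*len(right))
	var wg sync.WaitGroup
	for i := range left {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := range right {
				out[i*len(right)+j] = l2(left[i], right[j])
			}
		}(i)
	}
	wg.Wait()
	return out
}

// bounded uses a fixed number of workers that pull row indices from a channel.
func bounded(left, right [][]float32, workers int) []float32 {
	if workers < 1 {
		workers = 1
	}
	out := make([]float32, len(left)*len(right))
	rows := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range rows {
				for j := range right {
					out[i*len(right)+j] = l2(left[i], right[j])
				}
			}
		}()
	}
	for i := range left {
		rows <- i
	}
	close(rows)
	wg.Wait()
	return out
}

// makeVectors builds n deterministic vectors of the given dimension.
func makeVectors(n, dim int) [][]float32 {
	vs := make([][]float32, n)
	for i := range vs {
		vs[i] = make([]float32, dim)
		for k := range vs[i] {
			vs[i][k] = float32((i*31 + k*7) % 13)
		}
	}
	return vs
}

func main() {
	// Arbitrary "large batch" chosen for illustration.
	left, right := makeVectors(20000, 16), makeVectors(8, 16)
	old := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			perRow(left, right)
		}
	})
	pooled := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			bounded(left, right, runtime.GOMAXPROCS(0))
		}
	})
	fmt.Println("goroutine per row:", old)
	fmt.Println("bounded workers:  ", pooled)
}
```

In a real unit test file the two closures would normally become `BenchmarkXxx` functions run with `go test -bench`; `testing.Benchmark` is used here only so the sketch runs as a standalone program.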
pkg/util/distance/calc_distance.go
		waitGroup.Done()
	}
	// avoid too many goroutines by ants pool
	poolSize := int(leftNum / 3)
	pool, err := ants.NewPoolWithFunc(poolSize, calcWorker)
Refer to pkg/util/conc/pool.go in Milvus, which wraps ants.Pool; you can use that pool the same way other call sites do.
Also, make this pool static instead of initializing it every time we compute distances.
OK, I got it.
Hey @chasingegg, last week I used sync.Once to initialize the pool only once, like this:
var (
	calcPool         *conc.Pool[any]
	calcPoolInitOnce = new(sync.Once)
)

func initCalcPool() {
	calcPool = conc.NewDefaultPool[any]()
}

func GetCalcPool() *conc.Pool[any] {
	calcPoolInitOnce.Do(initCalcPool)
	return calcPool
}
But I found that it gets stuck in an infinite loop when I run the unit test.
I haven't found the cause yet, so for now I have to create a new pool on every call.
How can I reproduce the infinite-loop problem? Which unit test?
> How can I reproduce the infinite-loop problem? Which unit test?
You can reproduce it with these steps:
- Copy the sync.Once code above so that the pool is initialized only once.
- Replace pool := conc.NewDefaultPool[any]() with pool := GetCalcPool().
- Run the unit test at pkg/util/distance/calc_distance_test.go:155; it gets stuck in an infinite loop.
I suspect the problem is in my own code, so I'm still debugging it. I've been busy these past few days and have put it off, but I will track down the cause this weekend.
Thanks! I will also look into it and hopefully reproduce the problem.
@CocaineCong E2e jenkins job failed, comment
6254c14 to efd1413
/run-cpu-e2e
@chasingegg PTAL
@CocaineCong ut workflow job failed, comment
Hey @chasingegg, I found the reason for the infinite loop.
727a007 to fe9877f
@chasingegg PTAL.
Signed-off-by: FanOne <[email protected]>
fe9877f to 9c93138
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: CocaineCong
This PR is a resubmission of #32776.
Sorry, I closed #32776 by mistake and can't reopen it, so I had to submit this new PR 😭
I'm not sure whether this is the right way to make the change.
Alternatively, we could use a channel to limit the number of goroutines instead of the ants pool, like the following: