
[Bug]: Rocksmq panic with (send on closed channel) at runtime #29101

Closed
1 task done
chyezh opened this issue Dec 11, 2023 · 11 comments
Labels
kind/bug Issues or changes related a bug stale indicates no udpates for 30 days triage/accepted Indicates an issue or PR is ready to be actively worked on.


@chyezh
Contributor

chyezh commented Dec 11, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3-on-dev
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): macOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Rocksmq has a data race that can cause a `send on closed channel` panic.

Expected Behavior

No panic at runtime.

Steps To Reproduce

1. Frequently create consumers and produce messages on a Milvus cluster.

Milvus Log

```
panic: send on closed channel

goroutine 2599 [running]:
panic({0x105cd9740?, 0x1062026d0?})
/opt/homebrew/Cellar/go/1.21.4/libexec/src/runtime/panic.go:1017 +0x388 fp=0x14008e12960 sp=0x14008e128b0 pc=0x1025034b8
runtime.chansend(0x140032bb140, 0x14008e12a52, 0x0, 0x140061c08d0?)
/opt/homebrew/Cellar/go/1.21.4/libexec/src/runtime/chan.go:206 +0x3d4 fp=0x14008e129d0 sp=0x14008e12960 pc=0x1024ccdf4
runtime.selectnbsend(0x1400138b3a8?, 0x105b416a0?)
/opt/homebrew/Cellar/go/1.21.4/libexec/src/runtime/chan.go:694 +0x24 fp=0x14008e12a00 sp=0x14008e129d0 pc=0x1024cd9e4
github.com/milvus-io/milvus/internal/mq/mqimpl/rocksmq/server.(*rocksmq).Produce(0x1400138b340, {0x14002eb43f0, 0x16}, {0x1400729bf00, 0x1, 0x1})
/Users/zilliz/repo/github/chyezh/milvus/internal/mq/mqimpl/rocksmq/server/rocksmq_impl.go:666 +0x161c fp=0x14008e134a0 sp=0x14008e12a00 pc=0x10446682c
github.com/milvus-io/milvus/internal/mq/mqimpl/rocksmq/client.(*producer).Send(0x14003e250c8, 0x1400729b8c0)
/Users/zilliz/repo/github/chyezh/milvus/internal/mq/mqimpl/rocksmq/client/producer_impl.go:54 +0x138 fp=0x14008e135a0 sp=0x14008e134a0 pc=0x104474cc8
github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/rmq.(*rmqProducer).Send(0x14002a10bd0, {0x106236a80, 0x140040e3ef0}, 0x1400729b880)
```
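The trace shows the panic raised through `runtime.selectnbsend`, which is what Go compiles a `select` with a single send case plus a `default` clause into. The `default` branch only covers a full channel; a send on a closed channel still panics. A minimal, self-contained illustration (not Milvus code; `trySend` is a hypothetical helper that recovers the panic so it can be inspected):

```go
package main

import "fmt"

// trySend does a non-blocking send, the same shape of code the compiler
// lowers to runtime.selectnbsend. Instead of crashing, it reports the
// recovered panic message (empty if no panic happened).
func trySend(ch chan struct{}) (sent bool, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	select {
	case ch <- struct{}{}:
		sent = true
	default:
		// Reached only when the channel is open but its buffer is full.
	}
	return sent, panicMsg
}

func main() {
	ch := make(chan struct{}, 1)
	for _, step := range []string{"empty buffer", "full buffer", "closed"} {
		if step == "closed" {
			close(ch)
		}
		sent, msg := trySend(ch)
		fmt.Printf("%-12s sent=%-5v panic=%q\n", step, sent, msg)
	}
}
```

The third call panics with `send on closed channel` even though a `default` clause is present, so a non-blocking notify cannot by itself make `Produce` safe against a consumer whose channel was closed concurrently.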

Anything else?

No response

@chyezh chyezh added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 11, 2023
@chyezh
Contributor Author

chyezh commented Dec 11, 2023

  1. `RegisterConsumer` does not take the topic lock, so it can observe `consumers` while another call is halfway through modifying it.
  2. Closed consumers can be stored back into `consumers`.
  3. Every function that reads or writes the `consumers` variable, such as `RegisterConsumer`, `DestroyConsumerGroup`, and `Produce`, has a data race.
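Points 1 and 2 together lead to a deterministic failure once a particular interleaving occurs. The sketch below is a hypothetical, heavily simplified model, not the real rocksmq code: `registry`, `consumer`, `destroy`, `produce`, and `replayRace` are illustrative names, and modelling `consumers` as a `sync.Map` from topic to a consumer slice is an assumption. It replays one interleaving by hand: a register step reads the slice without a lock, a destroy closes a consumer and shrinks the slice, then the register step writes its stale copy (still holding the closed consumer) back, and the next produce panics.

```go
package main

import (
	"fmt"
	"sync"
)

// consumer is a hypothetical stand-in for a rocksmq consumer; the
// producer wakes it through a buffered notify channel.
type consumer struct {
	group  string
	notify chan struct{}
}

// registry models topic -> []*consumer in a sync.Map. Each Load and
// Store is atomic, but a Load-modify-Store sequence is not.
type registry struct {
	consumers sync.Map
}

func (r *registry) load(topic string) []*consumer {
	v, ok := r.consumers.Load(topic)
	if !ok {
		return nil
	}
	return v.([]*consumer)
}

// destroy closes the group's channel and stores the remaining consumers.
func (r *registry) destroy(topic, group string) {
	var kept []*consumer
	for _, c := range r.load(topic) {
		if c.group == group {
			close(c.notify)
			continue
		}
		kept = append(kept, c)
	}
	r.consumers.Store(topic, kept)
}

// produce wakes every consumer with a non-blocking send and returns the
// recovered panic message, if any.
func (r *registry) produce(topic string) (panicMsg string) {
	defer func() {
		if p := recover(); p != nil {
			panicMsg = fmt.Sprint(p)
		}
	}()
	for _, c := range r.load(topic) {
		select {
		case c.notify <- struct{}{}:
		default:
		}
	}
	return ""
}

// replayRace executes, step by step, one interleaving the unlocked code allows.
func replayRace() string {
	r := &registry{}
	old := &consumer{group: "g1", notify: make(chan struct{}, 1)}
	r.consumers.Store("topic", []*consumer{old})

	// register: read the slice without holding any topic lock.
	stale := r.load("topic")
	// destroy: close g1's channel and store a slice without it.
	r.destroy("topic", "g1")
	// register: write back the stale copy plus the new consumer.
	fresh := &consumer{group: "g2", notify: make(chan struct{}, 1)}
	r.consumers.Store("topic", append(stale, fresh))

	// g1 is back in the slice with a closed channel.
	return r.produce("topic")
}

func main() {
	fmt.Println(replayRace())
}
```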

@yanliang567
Contributor

/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 13, 2023
chyezh added a commit to chyezh/milvus that referenced this issue Dec 18, 2023
Related Issue: milvus-io#29101

- Fix data race by adding a new lock and copy-on-write

- Add a new unit test to verify it

Signed-off-by: chyezh <[email protected]>

stale bot commented Jan 12, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Jan 12, 2024
@stale stale bot closed this as completed Jan 19, 2024
@chyezh chyezh reopened this Jun 21, 2024
@stale stale bot removed the stale indicates no udpates for 30 days label Jun 21, 2024
@chyezh
Contributor Author

chyezh commented Jun 21, 2024

related issue: #33285


stale bot commented Jul 21, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Jul 21, 2024
@chyezh
Contributor Author

chyezh commented Jul 21, 2024

keep it

@stale stale bot removed the stale indicates no udpates for 30 days label Jul 21, 2024

stale bot commented Aug 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Aug 24, 2024
@chyezh
Contributor Author

chyezh commented Aug 25, 2024

keep it

@stale stale bot removed the stale indicates no udpates for 30 days label Aug 25, 2024

stale bot commented Sep 29, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Sep 29, 2024
@chyezh
Contributor Author

chyezh commented Sep 30, 2024

/reopen

@stale stale bot removed the stale indicates no udpates for 30 days label Sep 30, 2024

stale bot commented Nov 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Nov 9, 2024
@stale stale bot closed this as completed Jan 3, 2025

2 participants