[Feature] RayService HA test - GCS fault tolerance + kill GCS process #2577

Open
1 of 2 tasks
Tracked by #2177
kevin85421 opened this issue Nov 26, 2024 · 2 comments

Comments

@kevin85421
Member

kevin85421 commented Nov 26, 2024

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

  • Create a RayService with GCS FT enabled. No Ray Serve replica should be deployed on the head Pod.
  • Kill the GCS process on the head Pod by running pkill gcs_server inside it.
  • Wait until the head Pod is removed from the K8s serve service.
  • Use Locust to submit requests continuously until the new Ray head has been running and ready for 30 seconds.
  • No request should be dropped.
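
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual KubeRay e2e test: the helper names (kill_gcs_command, send_until_head_stable) and the send_request / head_is_ready callbacks are hypothetical, and a plain loop stands in for Locust so the pass/fail logic is easy to see. The only real commands assumed are kubectl exec and pkill gcs_server.

```python
import time
from typing import Callable, Dict, List, Optional


def kill_gcs_command(head_pod: str, namespace: str = "default") -> List[str]:
    """Build the kubectl invocation that runs `pkill gcs_server` in the head Pod.

    The Pod name and namespace are supplied by the caller (hypothetical values).
    """
    return ["kubectl", "exec", head_pod, "-n", namespace, "--", "pkill", "gcs_server"]


def send_until_head_stable(
    send_request: Callable[[], bool],
    head_is_ready: Callable[[], bool],
    stable_seconds: float = 30.0,
    timeout_seconds: float = 600.0,
    interval_seconds: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Keep sending requests until the head has been ready for
    `stable_seconds` in a row, then report how many were sent and failed.

    `send_request` returns True on success; `head_is_ready` reports whether
    the (new) head Pod is currently running and ready. Both are placeholders
    for whatever the real test uses (for example Locust and a K8s client).
    """
    start = clock()
    ready_since: Optional[float] = None
    sent = 0
    failed = 0
    while True:
        now = clock()
        if now - start > timeout_seconds:
            raise TimeoutError("head did not stay ready long enough")
        if head_is_ready():
            if ready_since is None:
                ready_since = now  # start of the current ready streak
            if now - ready_since >= stable_seconds:
                break
        else:
            ready_since = None  # streak broken, start over
        sent += 1
        if not send_request():
            failed += 1
        sleep(interval_seconds)
    return {"sent": sent, "failed": failed}
```

A test built on this sketch would run the command from kill_gcs_command (for instance via subprocess.run), call send_until_head_stable, and then assert that the returned "failed" count is 0, matching the "no request should be dropped" requirement.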

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421
Member Author

/assign @CheyuWu

@CheyuWu
Contributor

CheyuWu commented Nov 27, 2024

@kevin85421 I'd like to help with this
