Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][fn] Add API endpoint for function-worker for liveness check with configurable flag #358

Open
wants to merge 6 commits into
base: 3.1_ds
Choose a base branch
from

Conversation

mukesh-ctds
Copy link
Collaborator

Motivation

This pull request implements new health check functionality for Kubernetes deployment, focusing on the liveness probe for function worker. This liveliness API is required for Kubernetes-based applications, enabling automated pod restarts in case of failure (via liveness checks) whenever ProducerFencedException occurs.

Modifications

This update introduces an API endpoint to perform a liveliness check on the function worker pod. The API returns an HTTP status of 200 (OK) when the isLive flag within FunctionImpl is true. If the flag is false, typically after a ProducerFencedException occurs, the API will return a status of 503 (Service Unavailable).

The Kubernetes deployment configuration has been updated to use this new API endpoint in the liveness probe along with existing metrics endpoint, allowing the system to monitor the health and availability of the function worker.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required
  • no-need-doc
  • doc

@mukesh-ctds mukesh-ctds changed the title [improve][function] Add API endpoints for function-worker for liveness checks with configurable flags [improve][fn] Add API endpoints for function-worker for liveness checks with configurable flags Dec 31, 2024
@mukesh-ctds mukesh-ctds changed the title [improve][fn] Add API endpoints for function-worker for liveness checks with configurable flags [improve][fn] Add API endpoint for function-worker for liveness check with configurable flag Dec 31, 2024
@mukesh-ctds mukesh-ctds requested a review from pgier January 2, 2025 04:15
@@ -429,4 +430,18 @@ public void updateFunctionOnWorkerLeader(final @PathParam("tenant") String tenan
functions().updateFunctionOnWorkerLeader(tenant, namespace, functionName, uploadedInputStream,
delete, uri.getRequestUri(), authParams());
}

@GET
@Path("/live")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think /healthz is the common convention for this endpoint name.
Also, I think it would make more sense to put it directly under the root path instead of under /admin/v3/functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API has been updated to include a /healthz endpoint.
We cannot place this directly under the root path, as the root path does not have access to the necessary flag. The flag is available at the function level instead. Additionally, v2 does not contain any logic that could trigger a ProducerFencedException; this issue is specific to v3.

@@ -693,6 +695,8 @@ public void updateFunctionOnWorkerLeader(final String tenant,
try {
functionMetaDataManager.updateFunctionOnLeader(functionMetaData, delete);
} catch (IllegalStateException e) {
log.error("Function worker status has been set to false due to ProducerFencedException.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if we get an IllegalStateException which is not a ProducerFencedException? Maybe it would be better to catch the specific exception type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block of code has been moved to a location where we can check if the cause is a ProducerFencedException.

@mukesh-ctds mukesh-ctds requested a review from pgier January 3, 2025 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants