Show specific error messages in the UI related to connection errors #3808
Comments
We're using NodeJS/OpenSSL under the hood. To my knowledge, their errors are as unspecific as stated in the case of a broken SSL handshake. If there are additional details in the response we get, we're happy to present them to the user. |
Hi @nikku 👋 Yeah, that'd be great. Ideally, it'd be really convenient to be able to see the more specific reasons as to why the Desktop Modeler is unable to connect to the Zeebe Gateway without having to look at the logs. So, any additional details that can be parsed out of a response, or out of an exception trace, and show in the UI could help a lot. Or, I had another idea ... maybe it could be possible to show the more detailed error messages inside the |
Parsing the log for special character streams does not seem to me like a satisfying (and robust) solution.
Maybe, if you have the chance, you could give it a debugging session yourself and figure out whether there are pragmatic improvements we can make. I've tagged this as spring cleaning. |
Could we perhaps add additional checks prior to making a request to ensure that the TLS certificate is valid and display an error message if it's not? This form and lack of error response is a big frustration on my end, causing about 3-5 hours of debugging effort every time I have to deploy a model, which I often have to do for testing. (I have this issue both in web modeler and desktop modeler) Additionally, I would like to offer my support. If there's a change that can be made in the helm chart (I suppose this would be for the web modeler) such that a user does not need to configure their oauth url / zeebe gateway url, like environment variables I can add for these to be auto-filled, I would be more than happy to write a helm-chart patch for that. From my perspective, this form has shown the red box for:
There might be more reasons. In my debugging I've also messed around with the Audience form element and found that whatever you put in that field is completely ignored by the application. Also, I do know that there is a troubleshoot link that goes to the documentation, and while I am grateful for this, it is not good enough in my opinion. |
@jessesimpson36 Thanks for chiming in, and sorry to hear that you are frustrated about SSL configuration issues in Camunda 8. Before we can try to improve the situation, let me better understand some of your feedback:
Could you elaborate on what you deploy, and why it always takes such a long time to deploy and debug? Is what you do something ordinary users commonly do, and if so, how frequently do they do it? Which documentation / guidelines do you follow as you do it? |
Hi @nikku , I am a developer on the Distribution team who works on the helm charts. For me, it's pretty common to deploy the helm chart locally and do basic testing, especially as it relates to support tickets, new features, helping others internally, and the occasional customer calls where customers struggle to do similar things.
What I deploy is often a values.yaml for the https://github.com/camunda/camunda-platform-helm/tree/main/charts/camunda-platform repo, and many times I take a customer's values.yaml and modify it so that I can test things locally with their configuration. It takes a long time for me because I don't know why the error happens, and because there are many possible causes for the same error message (we basically just get a red box and something like "Unknown error. Please check Zeebe cluster status. Troubleshoot"). So what am I doing that it takes so long for me to debug? Once I get this error, I have to wonder whether the issue has to do with the networking, the deployment configuration, or the application code.
Usually, I go through those steps, they don't always help, and then I just make panic changes because there are no logs or error messages. I have gone to the troubleshoot link before, and sometimes it helps. Most of the time it does not. It did help with my most recent frustration when I was trying to configure a read-only root filesystem though. That was when I learned about the magic file
Every user who installs C8 will need to verify that their installation is correct, and the only way to do that is to deploy a model and access each of the web components. Users will only have to debug this once, but I have to deploy the helm charts many times. So ordinary users will not be as frustrated as internal devs who are testing their installation. |
@jessesimpson36 Thanks for your feedback.
I see two things we want to address rather soonish (check if properly documented, and/or can be validated):
What I wonder is if we can provide a CLI utility that verifies the proper configuration of a C8 (self-managed) instance, using tools equipped to do the job? I.e. if I verify proper configuration of my mail server I turn to detailed diagnostics utilities (i.e. this) when just sending or receiving an email proves inconclusive. |
I'm not sure this is a good way to handle it. We have zbctl, but zbctl will often work regardless of whether the web modeler / desktop modeler can work, or vice versa. To me, a better solution would simply be to have environment variables that would pre-fill that form, and for us to set them as part of the helm chart. So for example:
Then the user only puts in the client id and secret. The helm chart can then properly set those environment variables. That's more of a Web Modeler suggestion. I'm not sure if that's a good idea or not, or if that idea could be translated to the desktop modeler. I also still think better error messages make the most sense directly inside the modeler / web modeler. |
What we'd accomplish with a test kit is to verify the remote end(s) are configured correctly, independent of |
What sort of test would make sense here? Some openssl command to verify the SSL cert? SSL isn't the only problem that can trigger this error. Perhaps accessing an OIDC endpoint to ensure the Keycloak URL is properly set (https://developer.okta.com/docs/reference/api/oidc/#well-known-openid-configuration). Perhaps something that tests whether a port is open on the Zeebe gateway and whether we can get a certain response out of it. |
If you ask me, there are a couple of steps involved:
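One way such a certificate check could look, as a minimal sketch rather than an agreed-upon test: the helper names `url_to_host_port` and `check_tls` are hypothetical, and the check assumes an `openssl` binary that trusts the same root certificates as the client being debugged. A successful handshake also implies that the port is open.

```shell
# Hypothetical helper: reduce "scheme://host[:port]/path" to "host:port",
# defaulting to port 443 when the URL does not name one.
# (IPv6 literals are not handled in this sketch.)
url_to_host_port() {
  hp=$(printf '%s\n' "$1" | sed -e 's#^[a-zA-Z][a-zA-Z0-9+.-]*://##' -e 's#/.*$##')
  case "$hp" in
    *:*) printf '%s\n' "$hp" ;;
    *)   printf '%s:443\n' "$hp" ;;
  esac
}

# Hypothetical check: complete a TLS handshake and let openssl verify the
# certificate chain against its trust store. -verify_return_error makes
# the handshake fail on a verification error instead of continuing.
check_tls() {
  hp=$(url_to_host_port "$1")
  openssl s_client -connect "$hp" -servername "${hp%%:*}" \
    -verify_return_error </dev/null >/dev/null 2>&1
}
```

Calling `check_tls "https://example.com/zeebe" && echo "TLS ok"` would then report whether the handshake and chain verification succeeded; `example.com` is a placeholder host.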
The second step is fully supported by |
@nikku can we also apply the same changes within the scope of Web Modeler where possible? |
Ok, so we have the openssl command,
And the openid-configuration endpoint which can be queried like so:
I'd say this is a solid start. Are there ways to verify a GRPC connection? Like some sort of curl / healthcheck endpoint over GRPC to verify the gateway url connection works? |
With those two things passing, I still struggle to deploy a model. |
You want to run the Once Zeebe is connected you'd want to try to query the cluster topology using If |
To verify the full OIDC endpoint you want to assert (in a shell script) that the token endpoint is reachable and that it provides you with a token, cf. this stackoverflow answer. Adapted to use client id and secret, of course. |
I'm realizing now that the desktop modeler can deploy a model more easily for me than the web modeler can. With the same configuration for both, the desktop modeler deploys successfully but the web modeler does not. zbctl status succeeds for me, and so does the openssl command against the gateway. I need to look up what ALPN is, but the output from openssl has
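A sketch of that assertion, under stated assumptions: the issuer exposes the standard `/.well-known/openid-configuration` discovery document, the client is allowed to use the OAuth 2.0 client-credentials grant, and `curl` and `jq` are installed. The function names and the argument order are hypothetical.

```shell
# Read an OIDC discovery document on stdin and print its token endpoint
# (prints nothing if the field is missing).
token_endpoint_from_discovery() {
  jq -r '.token_endpoint // empty'
}

# Hypothetical check: discover the token endpoint, then request a token
# using the client-credentials grant. Succeeds only if the response
# contains an access_token field.
check_token() {
  issuer=$1
  client_id=$2
  client_secret=$3
  endpoint=$(curl -fsS "$issuer/.well-known/openid-configuration" | token_endpoint_from_discovery)
  [ -n "$endpoint" ] || { echo "no token_endpoint advertised by $issuer" >&2; return 1; }
  curl -fsS -X POST "$endpoint" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$client_id" \
    --data-urlencode "client_secret=$client_secret" \
    | jq -e '.access_token' >/dev/null
}
```

Because `curl -f` fails on HTTP error statuses, a wrong secret or an unreachable endpoint both make `check_token` return non-zero; the error curl prints on stderr helps tell the two apart.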
so I think that part is fine. |
Just saw this message. Yeah, I'll give this a try and post here if I find something useful. |
ALPN stands for Application-Layer-Protocol-Negotiation. It is being used by GRPC (Zeebe) to negotiate the binary GRPC protocol on top of It must be appropriately configured for Zeebe only. Other endpoints use standard REST, where the protocol ( |
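To check ALPN from the command line, something along these lines may help. It is a sketch, not a verified diagnostic: it assumes the gateway is reachable as `host:port`, that gRPC clients expect `h2` (HTTP/2) to be selected, and that the installed `openssl s_client` reports the result on a line starting with `ALPN protocol:`. The function names are hypothetical.

```shell
# Read `openssl s_client` output on stdin and print the negotiated ALPN
# protocol; prints nothing when no protocol was negotiated.
negotiated_alpn() {
  sed -n 's/^ALPN protocol: //p'
}

# Hypothetical check: offer h2 during the TLS handshake and require the
# remote end to select it. $1 is "host:port" of the gateway.
check_alpn_h2() {
  out=$(openssl s_client -connect "$1" -servername "${1%%:*}" -alpn h2 </dev/null 2>/dev/null)
  [ "$(printf '%s\n' "$out" | negotiated_alpn)" = "h2" ]
}
```

If `check_alpn_h2` fails while a plain TLS handshake succeeds, a proxy or ingress in front of the gateway that does not advertise `h2` is one plausible cause.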
To test deployment to Zeebe I'd suggest using the desktop modeler with DEBUG logging enabled. This gives you detailed output (even if hidden via the UI) on things that go awry. |
Last point: Desktop modeler trusts your OS root certificates. In web modeler you may need to double check the behavior. |
Based on my investigation in Modeler land I followed up on three things:
|
ChatGPT-produced script for connection validation:
#!/bin/bash
# Define the URLs of the remote endpoints
ZEEBE_ENDPOINT="https://example.com/zeebe"
OAUTH_ENDPOINT="https://example.com/oauth"
CLIENT_ID="your_client_id"
CLIENT_SECRET="your_client_secret"
SSL_ENABLED="true" # Set to "true" to enable SSL validation
# Function to check if an endpoint is reachable
check_endpoint_reachability() {
local endpoint="$1"
if curl -Is --connect-timeout 5 "$endpoint" >/dev/null; then
echo "Endpoint $endpoint is reachable."
else
echo "Endpoint $endpoint is not reachable."
exit 1
fi
}
# Function to validate SSL for an endpoint if SSL_ENABLED is set to "true"
validate_ssl() {
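local endpoint="$1"
local ssl_enabled="$SSL_ENABLED"
# Extract host[:port] from the URL; openssl s_client needs an explicit port, so default to 443
local host_port
host_port="$(echo "$endpoint" | sed -e 's/https:\/\/\([^/]*\).*/\1/')"
[[ "$host_port" == *:* ]] || host_port="$host_port:443"
if [[ "$ssl_enabled" == "true" ]]; then
# -servername sends SNI; -checkend 0 only checks that the certificate has not expired
if openssl s_client -connect "$host_port" -servername "${host_port%%:*}" < /dev/null 2>/dev/null | openssl x509 -noout -checkend 0; then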
echo "SSL for endpoint $endpoint is valid."
else
echo "SSL for endpoint $endpoint is not valid or expired."
exit 1
fi
else
echo "SSL validation is disabled."
fi
}
# Function to check if the OAuth callback URL is correct and obtain a token
check_oauth_callback() {
local oauth_callback_url="$1"
local oauth_token_url="$oauth_callback_url/token"
# Request a token via the OAuth 2.0 client-credentials grant; -w appends the HTTP status code on its own line
local response http_status body
response=$(curl -s -w '\n%{http_code}' -X POST "$oauth_token_url" -d "grant_type=client_credentials" -d "client_id=$CLIENT_ID" -d "client_secret=$CLIENT_SECRET")
# Split the status code (last line) from the response body (everything before it)
http_status=$(echo "$response" | tail -n 1)
body=$(echo "$response" | sed '$d')
if [[ "$http_status" == "200" ]]; then
if echo "$body" | jq -e '.access_token' >/dev/null; then
echo "OAuth callback URL is correct: received a 200 status code and an access_token field in the response."
else
echo "OAuth callback URL is correct and received a 200 status code, but the response does not contain an access_token field."
exit 1
fi
else
echo "OAuth callback URL is incorrect, or the request returned a non-200 status code."
exit 1
fi
}
# Check reachability of remote endpoints
check_endpoint_reachability "$ZEEBE_ENDPOINT" || exit 1
check_endpoint_reachability "$OAUTH_ENDPOINT" || exit 1
# Validate SSL for remote endpoints (if enabled)
validate_ssl "$ZEEBE_ENDPOINT"
validate_ssl "$OAUTH_ENDPOINT"
# Check if the OAuth callback URL is correct and obtain a token
check_oauth_callback "$OAUTH_ENDPOINT" || exit 1 |
Based on ChatGPT prior art camunda/zeebe-connection-test#10 adds a basic C8 connection checker, validating reachability, SSL and oauth. Can be extended in a fairly simple manner. |
Another review of our deploy flow (against C8 SaaS) uncovered that the C8 SaaS An update to C8 SaaS fixes the issue. In the future it will indicate wrong credentials as |
Thanks for all the feedback folks, especially @jessesimpson36! I think we were able to move the "deploy to C8 experience" forward substantially. We'll ship with the next release of the Desktop Modeler:
We've otherwise worked on / improved:
We collected the following potential follow-ups:
Closing this issue. Let's handle further improvements in individual follow-ups. |
Problem you would like to solve
Currently, the Desktop Modeler shows a generic error message of "Unknown error: Please Check Zeebe cluster status".
Tailing the logs shows the more specific connection error messages such as:
Proposed solution
Display the specific error messages in the UI to make troubleshooting easier.
Alternatives considered
It's possible to tail the Desktop Modeler logs. However this is tedious and not intuitive for most customers.
Additional context
No response