Show specific error messages in the UI related to connection errors #3808

upgradingdave · 2023-08-28T18:01:23Z

Problem you would like to solve

Currently, the Desktop Modeler shows a generic error message of "Unknown error: Please Check Zeebe cluster status".

Tailing the logs shows the more specific connection error messages such as:

RequestError: certificate has expired

Proposed solution

Display the specific error messages in the UI to make troubleshooting easier.

Alternatives considered

It's possible to tail the Desktop Modeler logs. However, this is tedious and not intuitive for most customers.

Additional context

No response

nikku · 2023-08-28T18:15:15Z

We're using NodeJS/OpenSSL under the hood. To my knowledge they are as unspecific as stated in the case of a broken SSL handshake.

If there are additional details in the response we get, we're happy to present them to the user.

upgradingdave · 2023-08-28T21:25:37Z

Hi @nikku 👋

Yeah, that'd be great. Ideally, it'd be really convenient to be able to see the more specific reasons as to why the Desktop Modeler is unable to connect to the Zeebe Gateway without having to look at the logs.

So, any additional details that can be parsed out of a response, or out of an exception trace, and shown in the UI could help a lot.

Or, I had another idea ... maybe it could be possible to show the more detailed error messages inside the Log panel? Perhaps the human readable portion of the stack trace (such as RequestError: certificate has expired) could be shown in the Log panel?
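As a rough illustration of what surfacing the specific reason could look like, here is a minimal sketch that maps a raw error message to a user-facing hint. The function name `explain_connection_error`, the matched patterns, and the hint texts are all assumptions made for illustration; this is not how the Modeler or zeebe-node actually classify errors.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a raw connection error message to a user-facing hint.
# Patterns and hint texts are illustrative assumptions, not the Modeler's mapping.
explain_connection_error() {
  local message="$1"
  case "$message" in
    *"certificate has expired"*)
      echo "The server's TLS certificate has expired." ;;
    *"self signed certificate"*|*"self-signed certificate"*|*"unable to verify the first certificate"*)
      echo "The server's TLS certificate is not trusted." ;;
    *"ECONNREFUSED"*|*"ENOTFOUND"*)
      echo "The endpoint could not be reached. Check the host name and port." ;;
    *)
      # Fall back to showing the raw message rather than hiding it
      echo "Unknown error: $message" ;;
  esac
}

explain_connection_error "RequestError: certificate has expired"
```

Matching on message text is fragile, so a mapping like this would only be a stopgap where the underlying client does not expose a proper error code.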

nikku · 2023-08-29T06:26:57Z

Parsing the log for special character streams does not seem to me like a satisfying (and robust) solution.

zeebe-node error handling is what we'd need to plug into.

Maybe, if you have the chance, you could give it a debugging session yourself, and figure out whether there are pragmatic improvements we can make.

I've tagged this as spring cleaning.

jessesimpson36 · 2023-08-30T15:49:09Z

Could we perhaps add additional checks prior to making a request to ensure that the TLS certificate is valid and display an error message if it's not?

This form and lack of error response is a big frustration on my end, causing about 3-5 hours of debugging effort every time I have to deploy a model, which I often have to do for testing. (I have this issue both in web modeler and desktop modeler)

Additionally, I would like to offer my support. If there's a change that can be made in the helm chart (I suppose this would be for the web modeler) such that a user does not need to configure their oauth url / zeebe gateway url, like environment variables I can add for these to be auto-filled, I would be more than happy to write a helm-chart patch for that.

From my perspective, this form has shown the red box for:

Untrusted TLS certificate in the gateway
Untrusted TLS certificate in keycloak
Failing to write to a cache file that I never even knew existed
Invalid client credentials

There might be more reasons. In my debugging I've also messed around with the Audience form element and found that whatever you put in that field is completely ignored by the application.

Also, I do know that there is a troubleshoot link that goes to the documentation, and while I am grateful for this, it is not good enough in my opinion.

nikku · 2023-08-31T08:00:29Z

@jessesimpson36 Thanks for chiming in, and sorry to hear that you are frustrated about SSL configuration issues in Camunda 8.

Before we can try to improve the situation, let me better understand some of your feedback:

This form and lack of error response is a big frustration on my end, causing about 3-5 hours of debugging effort every time I have to deploy a model, which I often have to do for testing. (I have this issue both in web modeler and desktop modeler)

Could you elaborate on what you deploy, and why it always takes such a long time to deploy + debug? Is what you do a common thing ordinary users do, and if so, how frequently do they do it? Which documentation / guidelines do you follow as you do it?

jessesimpson36 · 2023-09-11T19:51:24Z

Hi @nikku ,

I am a developer on the Distribution team who works on the helm charts. For me, it's pretty common to deploy the helm chart locally and do basic testing, especially as it relates to support tickets, new features, helping others internally, and the occasional customer calls where customers struggle to do similar things.

Could you elaborate on what you deploy, and why it always takes such a long time to deploy + debug?

What I deploy is often a values.yaml for the https://github.com/camunda/camunda-platform-helm/tree/main/charts/camunda-platform repo, and many times, I take a customer's values.yaml and modify it so that I can test things locally with their configuration. The reason it takes a long time for me is that I don't know why it happens and there are many possible causes for the same error message (we basically just get a red box and something like "Unknown error. Please check Zeebe cluster status. Troubleshoot").

So what am I doing that it takes so long for me to debug?

Once I get this error, I have to wonder whether the issue has to do with the networking, the deployment configuration, or the application code.

I check to see if the cluster endpoint matches the external url in the ingress configuration
I check that the OAuth2 url host name matches the keycloak host name designated in the ingress configuration
I modify the ending of the oauth2 url: /auth/realms/camunda-platform/protocol/openid-connect/token. I often try removing the auth part, or playing around with different urls because I have no idea how I'm supposed to get this magic url.
I verify the client ID and client secret in identity. Sometimes I will make a new Application in identity with all privileges, and sometimes I will just use the Zeebe client.
I test all my TLS certs to ensure they are all valid
I check the logs for Zeebe, Zeebe-gateway, and the web modeler restapi. The logs have always been worthless for me in debugging this, but I check them anyway.
I modify the Keycloak url to use the kubernetes service name as the hostname instead of the external-facing url
I modify the OAuth url to use the kubernetes service name as the hostname instead of the external-facing url
I try the desktop modeler with previous steps to see if that's any different
I refer to Dave's message here: https://camunda.slack.com/archives/C05764N4VNZ/p1690906641310499

Usually, I go through those steps, they don't always help, and then I just make panic changes because there are no logs or error messages. I have gone to the troubleshoot link before; sometimes it helps, but most of the time it does not. It did help with my most recent frustration when I was trying to configure a read-only root filesystem though. That was when I learned about the magic file ZEEBE_CLIENT_CONFIG_PATH=/path/to/credentials/cache.txt using the docs link.

Is what you do a common thing ordinary users do, and if so, how frequently do they do it?

Every user who installs C8 will need to verify that their installation is correct, and the only way to do that is to deploy a model and access each of the web components. Users will only have to debug this once, but I have to deploy the helm charts many times. So ordinary users will not be as frustrated as internal devs who are testing their installation.
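The "magic" OAuth URL from the checklist above follows a fixed pattern once the Keycloak base URL and realm are known. A minimal sketch, assuming the `camunda-platform` realm and `/auth` context path quoted above; the helper name `keycloak_token_url` and the example host are hypothetical. Keycloak setups that do not use the `/auth` prefix would pass an empty context path.

```bash
# Hypothetical helper: derive the Keycloak token URL from a base URL and realm.
# Assumes the "/auth" context path and "camunda-platform" realm quoted above.
keycloak_token_url() {
  local base_url="${1%/}"              # strip a single trailing slash, if any
  local realm="${2:-camunda-platform}" # realm name, defaulting to the one above
  local context_path="${3-/auth}"      # pass "" for setups without the /auth prefix
  echo "${base_url}${context_path}/realms/${realm}/protocol/openid-connect/token"
}

# With the /auth prefix
keycloak_token_url "https://keycloak.example.com"
# Without the /auth prefix
keycloak_token_url "https://keycloak.example.com/" camunda-platform ""
```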

nikku · 2023-09-12T13:17:57Z

@jessesimpson36 Thanks for your feedback.

I modify the ending of the oauth2 url: /auth/realms/camunda-platform/protocol/openid-connect/token. I often try removing the auth part, or playing around with different urls because I have no idea how I'm supposed to get this magic url.

I see two things we want to address rather soonish (check if properly documented, and/or can be validated):

Partially in scope of the modeler: What OAuth URL should be entered, how do I learn about it, how can we validate it?
In scope of the (Desktop) modeler: Can we return more meaningful error messages?

Every user who installs C8 will need to verify that their installation is correct, and the only way to do that is to deploy a model and access each of the web components.

What I wonder is if we can provide a CLI utility that verifies the proper configuration of a C8 (self-managed) instance, using tools equipped to do the job? I.e. if I verify proper configuration of my mail server I turn to detailed diagnostics utilities (i.e. this) when just sending or receiving an email proves inconclusive.

jessesimpson36 · 2023-09-14T13:04:16Z

What I wonder is if we can provide a CLI utility that verifies the proper configuration of a C8 (self-managed) instance, using tools equipped to do the job?

I'm not sure this is a good way to handle it. We have zbctl, but zbctl will often work regardless of whether the web modeler / desktop modeler can work. Or vice versa.

To me, a better solution would simply be to have environment variables that would pre-fill that form, and for us to set them as part of the helm chart. so for example:

CLUSTER_ENDPOINT=http://<RELEASE>-zeebe-gateway:26500
IDENTITY_OAUTH_URL=http://<RELEASE>-keycloak/auth/realms/camunda-platform/protocol/openid-connect/token
DEFAULT_AUDIENCE=test

Then the user only puts in the client id and secret. The helm chart can then properly set those environment variables.

That's more of a Web Modeler suggestion. I'm not sure if that's a good idea or not, or if that idea could be translated to the desktop modeler.

I still also think better error messages make the most sense directly inside the modeler / web modeler.
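To make the environment-variable idea above concrete, here is a minimal sketch of how a consumer could resolve the three proposed variables, falling back to placeholder values when they are unset. The variable names come from the suggestion above; the function name `load_prefill_defaults`, the `RELEASE` variable, and the `my-release` placeholder are assumptions, and neither modeler is known to read these variables today.

```bash
# Sketch: resolve the proposed pre-fill variables, falling back to placeholders.
# Nothing here reflects an existing Web Modeler or helm chart feature.
load_prefill_defaults() {
  local release="${RELEASE:-my-release}"   # placeholder release name
  : "${CLUSTER_ENDPOINT:=http://${release}-zeebe-gateway:26500}"
  : "${IDENTITY_OAUTH_URL:=http://${release}-keycloak/auth/realms/camunda-platform/protocol/openid-connect/token}"
  : "${DEFAULT_AUDIENCE:=test}"
}

load_prefill_defaults
printf 'endpoint=%s\noauth=%s\naudience=%s\n' \
  "$CLUSTER_ENDPOINT" "$IDENTITY_OAUTH_URL" "$DEFAULT_AUDIENCE"
```

Values already present in the environment win over the fallbacks, which is the behavior a chart-provided default would need.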

nikku · 2023-09-14T13:10:07Z

What we'd accomplish with a test kit is to verify the remote end(s) are configured correctly, independent of zbctl (being extremely forgiving with SSL certificates) and the modelers (being fairly strict).

jessesimpson36 · 2023-09-18T15:24:17Z

What sort of test would make sense here? Some openssl command to verify the SSL cert? SSL isn't the only problem that can trigger this error. Perhaps accessing an OIDC endpoint to ensure the Keycloak URL is properly set (https://developer.okta.com/docs/reference/api/oidc/#well-known-openid-configuration). Perhaps something that tests whether a port is open on the Zeebe gateway and whether we can get a certain response out of it.

nikku · 2023-09-19T08:53:58Z

If you ask me, there are a couple of steps involved:

1. Ensure remote endpoints are reachable
2. SSL: Ensure remote endpoints are trusted / properly configured
3. Ensure remote endpoints are correct (right OAuth callback url)

The second step is fully supported by openssl connection diagnosis.
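Both the reachability and the SSL step operate on a host and port rather than a full URL. A minimal sketch of that conversion, assuming plain host names (IPv6 literals are not handled); the helper name `endpoint_host_port` is hypothetical.

```bash
# Hypothetical helper: reduce an endpoint URL to host:port, the form that
# `openssl s_client -connect` and plain TCP checks expect.
endpoint_host_port() {
  local url="$1" scheme="" rest host_port
  if [[ "$url" == *"://"* ]]; then
    scheme="${url%%://*}"   # text before "://"
    rest="${url#*://}"      # text after "://"
  else
    rest="$url"
  fi
  host_port="${rest%%/*}"   # drop any path
  if [[ "$host_port" != *:* ]]; then
    # No explicit port: infer it from the scheme, defaulting to 443
    case "$scheme" in
      http) host_port+=":80" ;;
      *)    host_port+=":443" ;;
    esac
  fi
  echo "$host_port"
}

endpoint_host_port "https://keycloak.example.com/auth/realms/camunda-platform"
endpoint_host_port "http://zeebe-gateway:26500"
```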

CatalinaMoisuc · 2023-09-19T09:44:46Z

@nikku can we also apply same changes within the scope of Web Modeler where possible?

jessesimpson36 · 2023-09-19T14:27:32Z

Ok, so we have the openssl command,

timeout 1 openssl s_client -alpn h2 -connect modeler.dev.jlscode.com:443 -servername modeler.dev.jlscode.com -brief 
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
Peer certificate: CN = modeler.dev.jlscode.com
Hash used: SHA256
Signature type: RSA-PSS
Verification: OK
Server Temp Key: X25519, 253 bits

And the openid-configuration endpoint which can be queried like so:

> curl --silent \
         -X GET \
        https://keycloak.dev.jlscode.com/auth/realms/camunda-platform/.well-known/openid-configuration  \
        | jq .authorization_endpoint

"https://keycloak.dev.jlscode.com/auth/realms/camunda-platform/protocol/openid-connect/auth"

I'd say this is a solid start. Are there ways to verify a GRPC connection? Like some sort of curl / healthcheck endpoint over GRPC to verify the gateway url connection works?
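Note that the deploy form's OAuth URL ends in `/token`, so the field of interest in the discovery document is `token_endpoint` rather than `authorization_endpoint`. A minimal sketch that reads a discovery document from stdin and prints that field; the helper name `token_endpoint_from_discovery` is hypothetical, and the `sed` fallback is a rough approximation that assumes the value contains no escaped quotes.

```bash
# Hypothetical helper: read an OpenID discovery document on stdin and print
# its token_endpoint. Uses jq when available, otherwise a crude sed fallback.
token_endpoint_from_discovery() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.token_endpoint'
  else
    # Join lines, then capture the quoted value following "token_endpoint":
    tr -d '\n' | sed -n 's/.*"token_endpoint"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
  fi
}

# Example with an inline document; example.com is a placeholder host
echo '{"issuer":"https://kc.example.com/auth/realms/camunda-platform","token_endpoint":"https://kc.example.com/auth/realms/camunda-platform/protocol/openid-connect/token"}' \
  | token_endpoint_from_discovery
```

Against a live installation, the output of the `curl --silent` call above would be piped into the helper instead of the inline document.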

jessesimpson36 · 2023-09-19T14:47:12Z

With those two things passing, I still struggle to deploy a model.

nikku · 2023-09-19T14:49:08Z

You want to run the openssl command against all endpoints, keycloak, modeler, and zeebe gateway. alpn is, to my knowledge, only required for Zeebe.

Once Zeebe is connected you'd want to try to query the cluster topology using zbctl status.

If zbctl status succeeds then you may give the Desktop Modeler a try.
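The order described above can be written down as a dry-run plan that only prints the commands. This is a sketch under assumptions: the function name `print_check_plan` and the host names are placeholders, and `-alpn h2` is added for the Zeebe gateway only, as suggested.

```bash
# Dry-run sketch: print the checks in the suggested order instead of running them.
# Arguments are host:port pairs for the Zeebe gateway, Keycloak, and the modeler.
print_check_plan() {
  local zeebe="$1" keycloak="$2" modeler="$3"
  # TLS checks against every endpoint; ALPN (h2) only for the gRPC gateway
  echo "openssl s_client -alpn h2 -connect $zeebe -servername ${zeebe%%:*} -brief"
  echo "openssl s_client -connect $keycloak -servername ${keycloak%%:*} -brief"
  echo "openssl s_client -connect $modeler -servername ${modeler%%:*} -brief"
  # Once TLS looks fine, query the cluster topology
  echo "zbctl status --address $zeebe"
}

print_check_plan zeebe.example.com:443 keycloak.example.com:443 modeler.example.com:443
```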

nikku · 2023-09-19T15:06:38Z

To verify the full OID endpoint you want to assert (in a shell script) that the token-endpoint is reachable + that it provides you with a token, cf. this stackoverflow answer. Adapted to use client id and secret, of course.

jessesimpson36 · 2023-09-19T15:12:58Z

I'm realizing now that the desktop modeler can deploy a model more easily for me than the web modeler can. With the same configuration for both, the desktop modeler deploys successfully but the web modeler does not.

zbctl status succeeds for me, as does the openssl command against the gateway.

I need to look up what ALPN is, but the output from openssl has

ALPN protocol: h2

so I think that part is fine.

jessesimpson36 · 2023-09-19T16:10:59Z

Parsing the log for special character streams does not seem to me like a satisfying (and robust) solution.

zeebe-node error handling is what we'd need to plug into.

Maybe, if you have the chance, you could give it a debugging session yourself, and figure out whether there are pragmatic improvements we can make.

I've tagged this as spring cleaning.

Just saw this message. yeah I'll give this a try and post here if I find something useful.

nikku · 2023-09-20T06:05:13Z

ALPN stands for Application-Layer Protocol Negotiation. It is a TLS extension that gRPC (Zeebe) uses to negotiate HTTP/2 (h2), the transport gRPC runs on, during the TLS handshake with the server.

It must be appropriately configured for Zeebe only. Other endpoints use standard REST, where the protocol (HTTP(S)) is settled on the protocol layer, before initiating communication, after TLS.
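A minimal sketch of checking that negotiation result automatically: the helper reads `openssl s_client` output on stdin and succeeds only if it contains the `ALPN protocol: h2` line quoted earlier in the thread. The helper name `alpn_is_h2` and the host in the usage note are assumptions.

```bash
# Hypothetical helper: succeed only if the s_client output on stdin reports
# that HTTP/2 ("h2") was negotiated via ALPN.
alpn_is_h2() {
  grep -q 'ALPN protocol: h2'
}

# Example with canned output instead of a live connection
if printf 'CONNECTION ESTABLISHED\nALPN protocol: h2\n' | alpn_is_h2; then
  echo "HTTP/2 negotiated"
fi
```

Against a real gateway this could be fed from something like `openssl s_client -alpn h2 -connect zeebe.example.com:443 </dev/null 2>/dev/null | alpn_is_h2`.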

nikku · 2023-09-20T06:07:35Z

To test deployment to Zeebe I'd suggest using the desktop modeler, with DEBUG logging enabled.

This gives you detailed output (even if hidden via the UI) on things that go awry.

nikku · 2023-09-20T06:09:19Z

Last point: Desktop modeler trusts your OS root certificates. In web modeler you may need to double check the behavior.

nikku · 2023-09-20T12:31:47Z

Based on my investigation in Modeler land I followed up on three things:

Drop miss-leading audience hint in deploy overlay (C8 self-managed) #3864
showcase self-managed OAauth URL + audience in use (https://github.com/camunda/camunda-docs-modeler-screenshots/pull/62)
A basic connection checker (feat: add test-connection.sh script zeebe-connection-test#10)

barmac · 2023-09-20T13:58:09Z

ChatGPT-produced script for connection validation:

#!/bin/bash

# Define the URLs of the remote endpoints
ZEEBE_ENDPOINT="https://example.com/zeebe"
OAUTH_ENDPOINT="https://example.com/oauth"
CLIENT_ID="your_client_id"
CLIENT_SECRET="your_client_secret"
SSL_ENABLED="true" # Set to "true" to enable SSL validation

# Function to check if an endpoint is reachable
check_endpoint_reachability() {
    local endpoint="$1"
    if curl -Is --connect-timeout 5 "$endpoint" >/dev/null; then
        echo "Endpoint $endpoint is reachable."
    else
        echo "Endpoint $endpoint is not reachable."
        exit 1
    fi
}

# Function to validate SSL for an endpoint if SSL_ENABLED is set to "true"
validate_ssl() {
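    local endpoint="$1"
    local ssl_enabled="$SSL_ENABLED"
    # openssl s_client needs host:port, so default to 443 when the URL names no port
    local host_port
    host_port=$(echo "$endpoint" | sed -e 's/https:\/\/\([^/]*\).*/\1/')
    [[ "$host_port" == *:* ]] || host_port="$host_port:443"

    if [[ "$ssl_enabled" == "true" ]]; then
        if openssl s_client -connect "$host_port" -servername "${host_port%%:*}" < /dev/null 2>/dev/null | openssl x509 -noout -checkend 0; then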
            echo "SSL for endpoint $endpoint is valid."
        else
            echo "SSL for endpoint $endpoint is not valid or expired."
            exit 1
        fi
    else
        echo "SSL validation is disabled."
    fi
}

# Function to check if the OAuth callback URL is correct and obtain a token
check_oauth_callback() {
    local oauth_callback_url="$1"
    local oauth_token_url="$oauth_callback_url/token"
    
    # Request a token (client credentials grant); curl appends the HTTP status on its own last line
    local response http_status
    response=$(curl -s -X POST "$oauth_token_url" -d "grant_type=client_credentials" -d "client_id=$CLIENT_ID" -d "client_secret=$CLIENT_SECRET" -w '\n%{http_code}')
    http_status=$(echo "$response" | tail -n 1)
    response=$(echo "$response" | sed '$d')
    
    if [[ "$http_status" == "200" ]]; then
        if echo "$response" | jq -e '.access_token' >/dev/null; then
            echo "OAuth callback URL is correct, received a 200 status code, and contains an access_token field in the JWT token."
        else
            echo "OAuth callback URL is correct and received a 200 status code, but the response does not contain an access_token field in the JWT token."
            exit 1
        fi
    else
        echo "OAuth callback URL is incorrect, or the request returned a non-200 status code."
        exit 1
    fi
}

# Check reachability of remote endpoints
check_endpoint_reachability "$ZEEBE_ENDPOINT" || exit 1
check_endpoint_reachability "$OAUTH_ENDPOINT" || exit 1

# Validate SSL for remote endpoints (if enabled)
validate_ssl "$ZEEBE_ENDPOINT"
validate_ssl "$OAUTH_ENDPOINT"

# Check if the OAuth callback URL is correct and obtain a token
check_oauth_callback "$OAUTH_ENDPOINT" || exit 1

nikku · 2023-09-20T15:58:43Z

Based on the ChatGPT prior art, camunda/zeebe-connection-test#10 adds a basic C8 connection checker, validating reachability, SSL, and OAuth. It can be extended in a fairly simple manner.

nikku · 2023-09-21T12:43:19Z

Another review of our deploy flow (against C8 SaaS) uncovered that the C8 SaaS /token endpoint indicates "client not found for CLIENT_ID" with HTTP 404, making it indistinguishable for us from "endpoint not found" (ref).

An update to C8 SaaS fixes the issue. In the future it will indicate wrong credentials as HTTP 401 as suggested by the OpenID specification (Page 45).
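Once wrong credentials and a wrong URL produce different status codes, a client can tell the two apart. A minimal sketch of that distinction; the helper name `explain_token_status` and the hint wording are assumptions for illustration.

```bash
# Hypothetical helper: turn the HTTP status of a token request into a hint.
# The 401 vs. 404 distinction follows the comment above; wording is illustrative.
explain_token_status() {
  case "$1" in
    200)     echo "Token endpoint reachable and credentials accepted." ;;
    400|401) echo "Token endpoint reachable, but the credentials or request were rejected." ;;
    404)     echo "Token endpoint not found. Check the OAuth URL." ;;
    *)       echo "Unexpected HTTP status $1 from the token endpoint." ;;
  esac
}

explain_token_status 401
explain_token_status 404
```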

nikku · 2023-09-22T14:04:26Z

Thanks for all the feedback folks, especially @jessesimpson36! I think we were able to move the "deploy to C8 experience" forward substantially.

We'll ship with the next release of the Desktop Modeler:

We've otherwise worked on / improved:

We collected the following potential follow-ups:

Add "validation" guide to verify self-managed installation works camunda-docs#2623
Verify web modeler provides similarily (useful) UX (https://github.com/camunda/web-modeler/issues/6233)

Closing this issue. Let's handle further improvements in individual follow-ups.

upgradingdave added the enhancement New feature or request label Aug 28, 2023

nikku added the spring cleaning Could be cleaned up one day label Aug 29, 2023

nikku added ux backlog Queued in backlog labels Aug 29, 2023

nikku added the ready Ready to be worked on label Aug 29, 2023 — with bpmn-io-tasks

nikku removed the backlog Queued in backlog label Aug 29, 2023

CatalinaMoisuc assigned nikku Sep 19, 2023

nikku mentioned this issue Sep 20, 2023

Drop miss-leading audience hint in deploy overlay (C8 self-managed) #3864

Closed

nikku mentioned this issue Sep 20, 2023

feat: add test-connection.sh script camunda/zeebe-connection-test#10

Merged

nikku mentioned this issue Sep 20, 2023

chore: make OAuth URL more explict #3868

Merged

This was referenced Sep 21, 2023

Add "validation" guide to verify self-managed installation works camunda/camunda-docs#2623

Open

feat(zeebe-api): infer error codes (where unavailable) from message #3873

Merged

nikku added this to the M69 milestone Sep 22, 2023

nikku added in progress Currently worked on and removed ready Ready to be worked on labels Sep 22, 2023

nikku closed this as completed Sep 22, 2023

bpmn-io-tasks bot removed the in progress Currently worked on label Sep 22, 2023

nikku mentioned this issue Sep 25, 2023

Release camunda-modeler v5.16.0 #3846

Closed

36 tasks
