This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

Add troubleshooting step: logs --all-containers #123

Closed
1 task
christian-kreuzberger-dtx opened this issue Dec 17, 2021 · 2 comments
Labels
type:feature New feature or request that provides value to the stakeholders/end-users

Comments

@christian-kreuzberger-dtx
Contributor

While trying out job-executor, I got this error in Bridge:

Job job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1 failed: job job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1 failed. Reason: BackoffLimitExceeded, Message: Job has reached the specified backoff limit

Unfortunately this does not tell me why the job has failed, so I had to start troubleshooting on my own.
A very handy troubleshooting step is to look at Kubernetes logs, especially the log of the job or the pod(s).

To do this, first list the pods in the keptn namespace:

$ kubectl -n keptn get pods
...
job-executor-service-799fc875c7-hjbjw                           2/2     Running      0          19h
job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1-qdswm   0/1     Init:Error   0          18m
job-executor-service-job-f5f44383-054d-48d0-b219-e2d6-1-p7bbl   0/1     Error        0          19h
job-executor-service-job-ff152cae-1238-4249-b5e3-67fd-1-h2vmc   0/1     Init:Error   0          24m
...

Select the pod that you want to inspect, e.g. the most recent one: job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1-qdswm

$ kubectl -n keptn logs job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1-qdswm --all-containers

And look at the output, e.g.:

$ kubectl -n keptn logs job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1-qdswm -c init-job-executor-service-job-855e19ed-baed-48b3-973e-0e62-1
2021/12/17 10:31:18 Error while copying files: could not find file or directory jmeter/basiccheck.jmx for task Run jmeter smoke tests
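
The manual steps above could be wrapped in a small helper. This is only a sketch, not part of job-executor-service: the function names `pick_latest_job_pod` and `show_job_logs` are hypothetical, the `keptn` namespace and the `job-executor-service-job-` pod name prefix are taken from the listing above, and it assumes `kubectl` is already configured for the cluster.

```shell
#!/usr/bin/env bash

# Reads pod names (oldest first, one per line) on stdin and prints the last
# one that looks like a job-executor job pod. The name prefix is an
# assumption based on the pod listing in this issue.
pick_latest_job_pod() {
  grep -E '^(pod/)?job-executor-service-job-' | tail -n 1
}

# Prints the logs of all containers (including init containers) of the most
# recently created job pod. The namespace defaults to "keptn".
show_job_logs() {
  local ns="${1:-keptn}"
  local pod
  # --sort-by sorts ascending, so the newest pod is listed last;
  # -o name prints entries as "pod/<name>".
  pod="$(kubectl -n "$ns" get pods \
           --sort-by=.metadata.creationTimestamp -o name \
         | pick_latest_job_pod)"
  if [ -z "$pod" ]; then
    echo "no job-executor job pods found in namespace $ns" >&2
    return 1
  fi
  kubectl -n "$ns" logs "$pod" --all-containers
}
```

After sourcing the file, `show_job_logs` (or `show_job_logs my-namespace`) would print the same kind of output as the manual `kubectl logs ... --all-containers` call, without copying the pod name by hand.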

Definition of Done

  • Add a troubleshooting section with this as the first troubleshooting step
@christian-kreuzberger-dtx christian-kreuzberger-dtx added the type:feature New feature or request that provides value to the stakeholders/end-users label Dec 17, 2021
@grabnerandi

I just ran into the same issue, and it took me a while to troubleshoot.
It would be great to surface these kinds of messages to the end user: the fix is simple, but getting to the actual reason is not.

@christian-kreuzberger-dtx
Contributor Author

This is also addressed by #214, so we don't need to add it to the troubleshooting section.
