Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KFLUXBUGS-863: introduce results fixes with requisite performance related tuning (dev/stage) #3884

Conversation

gabemontero
Copy link
Contributor

@gabemontero gabemontero commented Jun 14, 2024

changes from #3883 plus desired tuning to go with this results change

Replacement from attempt back with #3616

On the tekton side this is https://issues.redhat.com/browse/SRVKP-4347

@enarha @khrm @savitaashture FYI

Copy link

openshift-ci bot commented Jun 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gabemontero gabemontero force-pushed the retry-results-watcher-mem-leak-fix branch from 3f58397 to c3bffb8 Compare June 14, 2024 21:16
@gabemontero
Copy link
Contributor Author

pipeline service is not coming up ... will need to run this branch on a local cluster

/hold

…ated tuning (dev/stage)

rh-pre-commit.version: 2.3.0
rh-pre-commit.check-secrets: ENABLED
@gabemontero gabemontero force-pushed the retry-results-watcher-mem-leak-fix branch from c3bffb8 to 134f999 Compare June 15, 2024 12:08
@gabemontero
Copy link
Contributor Author

/hold cancel

needed to undo image pinning on infra deployments side

Copy link

openshift-ci bot commented Jun 15, 2024

@gabemontero: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/appstudio-hac-e2e-tests 134f999 link false /test appstudio-hac-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@gabemontero gabemontero requested review from enarha and removed request for xinredhat and ramessesii2 June 15, 2024 14:52
@gabemontero
Copy link
Contributor Author

The hac test failures were unrelated to results.

In fact, the results based query worked:

         cy:fetch ➟  GET https://ee-kqy6rsl7.apps.crc-eph.r9lp.p1.openshiftapps.com/api/k8s/plugins/tekton-results/workspaces/user1/apis/results.tekton.dev/v1alpha2/parents/user1-tenant/results/-/records?order_by=create_time+desc&page_size=30&filter=data_type+%3D%3D+%22tekton.dev%2Fv1beta1.PipelineRun%22+%26%26+data.metadata.labels%5B%22appstudio.openshift.io%2Fapplication%22%5D+%3D%3D+%22test-app-171845493%22+%26%26+data.metadata.labels%5B%22pipelines.appstudio.openshift.io%2Ftype%22%5D+%3D%3D+%22build%22+%26%26+data.metadata.labels%5B%22appstudio.openshift.io%2Fcomponent%22%5D+in+%5B%22java-quarkus-171845493%22%5D
                    Status: 200

Also, all the calls in the api server looked clean. https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/redhat-appstudio_infra-deployments/3884/pull-ci-redhat-appstudio-infra-deployments-main-appstudio-hac-e2e-tests/1801949874850107392/artifacts/appstudio-hac-e2e-tests/gather-extra/artifacts/pods/tekton-results_tekton-results-api-5598d57bb7-84z24_api.log

The test did not run long enough for pruning to kick in.

I think this is ready to merge into stage so I can run some performance testing, but I'll first wait to see if @enarha has cycles to review during his Sunday.

value: 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

been backing off of multiple replicas since some odd behavior I saw during the March outage.

production is already back to 1 replica

- behavior: merge
name: config-observability
literals:
- profiling.enable="true"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we can get thread dumps if need be .... safe to run in prod per https://go.dev/doc/diagnostics

will not be turning on cpu profiling etc.

I see us turning on profiling for as many tekton controllers as possible in konflux as part of app sre reqs wrt thread dumps

value: -update_log_timeout
- op: add
path: /spec/template/spec/containers/0/args/-
value: "9m"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on performance testing with @pmacik

value: -dynamic_reconcile_timeout
- op: add
path: /spec/template/spec/containers/0/args/-
value: "9m"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on performance testing with @pmacik

@enarha
Copy link
Contributor

enarha commented Jun 17, 2024

lgtm

@openshift-merge-bot openshift-merge-bot bot merged commit 4e63f61 into redhat-appstudio:main Jun 17, 2024
6 of 7 checks passed
@gabemontero gabemontero deleted the retry-results-watcher-mem-leak-fix branch June 17, 2024 13:58
pmacik pushed a commit to pmacik-testing/infra-deployments that referenced this pull request Jun 19, 2024
…ated tuning (dev/stage) (redhat-appstudio#3884)

* update components/monitoring/grafana/base/dashboards/pipeline-service/kustomization.yaml

* update components/pipeline-service/development/kustomization.yaml

* update components/pipeline-service/staging/base/kustomization.yaml

* update components/pipeline-service/staging/stone-stage-p01/deploy.yaml

* update components/pipeline-service/staging/stone-stg-m01/deploy.yaml

* update components/pipeline-service/staging/stone-stg-rh01/deploy.yaml

* KFLUXBUGS-863: introduce results fixes with requisite performance related tuning (dev/stage)

rh-pre-commit.version: 2.3.0
rh-pre-commit.check-secrets: ENABLED

---------

Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
pmacik pushed a commit to pmacik-testing/infra-deployments that referenced this pull request Jun 20, 2024
…ated tuning (dev/stage) (redhat-appstudio#3884)

* update components/monitoring/grafana/base/dashboards/pipeline-service/kustomization.yaml

* update components/pipeline-service/development/kustomization.yaml

* update components/pipeline-service/staging/base/kustomization.yaml

* update components/pipeline-service/staging/stone-stage-p01/deploy.yaml

* update components/pipeline-service/staging/stone-stg-m01/deploy.yaml

* update components/pipeline-service/staging/stone-stg-rh01/deploy.yaml

* KFLUXBUGS-863: introduce results fixes with requisite performance related tuning (dev/stage)

rh-pre-commit.version: 2.3.0
rh-pre-commit.check-secrets: ENABLED

---------

Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants