v2.4.0

@cliffcolvin cliffcolvin released this 17 Sep 01:13
· 80 commits to develop since this release
b98b245

Overview

Version 2.4 is an ‘edge’ release focused on GPU efficiency, with many bug fixes and several quality-of-life improvements.

Important Notices

  1. Upgrading to 2.4 adds new fields to the Kubecost ETL to support the GPU monitoring features. The new ETL files are not backward compatible with previous versions. Multi-cluster users MUST upgrade the primary before upgrading the secondaries (agents).

     The current 2.3.x release is considered stable and will continue to be maintained. Kubecost will release a new 2.3.x version that is compatible with the ETL changes in 2.4.x, which will allow downgrading from 2.4.x to that version. That said, the 2.4.0 release has been extensively tested, and we recommend upgrading to take advantage of the new features and the significant number of bug and CVE fixes.

  2. An agent upgrade to version 2.4+ is required to gather the additional metrics for NVIDIA GPU workloads. If NVIDIA GPUs are not used, the agent upgrade is not required.
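
The ordering rule in the first notice can be sketched in a few lines of Python. This is only an illustration: the `Cluster` type, the cluster names, and both helper functions are hypothetical and not part of Kubecost, and the sketch assumes each cluster reports a plain `major.minor.patch` version string.

```python
from typing import List, NamedTuple, Tuple


class Cluster(NamedTuple):
    name: str                      # hypothetical cluster name
    role: str                      # "primary" or "secondary" (agent)
    version: Tuple[int, int, int]  # parsed major.minor.patch


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a version such as '2.4.0' or 'v2.3.5' into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return (major, minor, patch)


def upgrade_order(clusters: List[Cluster]) -> List[str]:
    """Return cluster names with the primary first, then the secondaries."""
    # sorted() is stable, so secondaries keep their original relative order.
    return [c.name for c in sorted(clusters, key=lambda c: c.role != "primary")]


def secondaries_ahead_of_primary(clusters: List[Cluster]) -> List[str]:
    """List secondaries whose major.minor is newer than the primary's."""
    primary = next(c for c in clusters if c.role == "primary")
    return [
        c.name
        for c in clusters
        if c.role == "secondary" and c.version[:2] > primary.version[:2]
    ]


clusters = [
    Cluster("agent-east", "secondary", parse_version("2.4.0")),
    Cluster("primary", "primary", parse_version("2.3.5")),
    Cluster("agent-west", "secondary", parse_version("2.3.5")),
]
print(upgrade_order(clusters))
print(secondaries_ahead_of_primary(clusters))
```

In this made-up fleet, `agent-east` would be flagged because it was moved to 2.4 while the primary is still on 2.3.x, which is the situation the notice warns against.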

Major Features

  • [Feature] Incorporate GPU Efficiency into efficiency metrics displayed around the application.
  • [Feature] Ability to rightsize node groups in cluster-sizing. Note that this requires the agents (secondaries) to be at version 1.100 or later, which added support for node labels.
  • [Feature] Add options to the Allocations page to see Idle costs broken down per-node and per-cluster.
  • [Feature] Add support for Collections Budgets.
  • [Feature] Add support for Idle Costs to Collections.

Minor Features

  • [Feature] Add support for a new Helm setting that enables a standard discount, applied in the Kubecost primary cluster installation, to data coming from secondary clusters.
  • [Feature] Add support for certificates when using a custom SMTP server with Kubecost.
  • [Feature] Add new FOCUS spec fields to Cloud Cost to support Account Name, Invoice Entity Name, Region ID, and Availability Zone.
  • [Feature] Add the ability to support BYO certificates for SMTP integration.
  • [Feature] Add a check in the Settings page which alerts users when their Helm Chart, UI image, and API image versions are not in sync.
  • [Feature] Add four new fields from the FOCUS spec to Cloud Costs.
  • [Feature] Add four new fields from the FOCUS spec to Cloud Budgets.
  • [Feature] Add limited support for feature-flagging via the Helm chart.
  • [Feature] Agent diagnostics is now enabled by default.
  • [Enhancement] Substantial application-wide improvements to WCAG 2.1 AA accessibility.
  • [Enhancement] Add a loading indicator when downloading request sizing CSVs to show that the download has, in fact, been initiated.
  • [Enhancement] Add a loading indicator when request sizing data is refreshing.
  • [Enhancement] Remove the “New” badges from pages that were introduced in 2.0.
  • [Enhancement] Default Request Sizing window to 3d instead of 48h. Using 48 data points was causing the page to hang or crash for some larger data sets.
  • [Enhancement] When an array of empty data is returned from the custom costs API, show an informative message rather than an empty graph/table.
  • [Enhancement] Show an informative message when the Request Sizing API returns a response with an empty set of Recommendations.
  • [Enhancement] Show an informative message when attempting to create a Budget fails.
  • [Enhancement] More information in bug reports.
  • [Enhancement] Show a more informative error response when cluster sizing recommendations cannot be generated due to not finding cloud provider information for a cluster.
  • [Enhancement] Show friendlier Cloud Account Names in Overview / Cloud Cost tables instead of Cloud Account IDs, when names are available.
  • [Enhancement] Add the ability to see aggregator PV usage in /diagnostics page.
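
The version-sync check mentioned above (Helm chart, UI image, and API image) amounts to comparing normalized version strings. The notes do not describe how Kubecost implements it, so the following is a minimal sketch under assumptions: the function names are hypothetical, and the `prod-` tag prefix is taken from the image tags listed later in these notes.

```python
from typing import Dict


def normalize(tag: str) -> str:
    """Strip a leading 'prod-' or 'v' so chart and image versions compare equal."""
    for prefix in ("prod-", "v"):
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return tag


def out_of_sync(chart_version: str, image_tags: Dict[str, str]) -> Dict[str, str]:
    """Return the images whose tag does not match the Helm chart version."""
    expected = normalize(chart_version)
    return {
        name: tag
        for name, tag in image_tags.items()
        if normalize(tag) != expected
    }


# Hypothetical installation where the API image was left on the old tag.
mismatched = out_of_sync(
    "2.4.0",
    {"frontend": "prod-2.4.0", "cost-model": "prod-2.3.5"},
)
print(mismatched)
```

An empty result would mean all three components agree; anything else is the kind of mismatch the Settings page is meant to surface.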

Fixes

  • [Fix] Add a new script for copying alerts to the aggregator pod from cost-model as we moved this endpoint over. If you have alerts configured prior to 2.4, you’ll need to run this script upon upgrading.
  • [Fix] Fix an issue where overview cluster efficiency shows usage as 0.
  • [Fix] Fix an issue where resource hourly cost is incorrectly calculated on drill down.
  • [Fix] Fix an issue when changing from separate idle by node to another idle configuration.
  • [Fix] Fix an issue with GPU idle calculations in allocation.
  • [Fix] Fix assets that appear to be missing account ID.
  • [Fix] Fix an issue causing discrepancies in collections cost in the k8s domain for query windows that yield relative date boundaries.
  • [Fix] Fix CSV pricing for GPUs not being correctly reflected in Kubecost.
  • [Fix] Fix an issue with the Allocation API not matching Allocation Summary API on costs.
  • [Fix] Fix an HTTP 500 error in cluster right-sizing.
  • [Fix] Fix an issue with Allocation API calculation on PV costs.
  • [Fix] Fix an issue with Allocation API and Allocation Summary API cost accuracy when cost metric is not set to cumulative cost.
  • [Fix] Fix an issue with AKS reconciliation of BRL currency costs.
  • [Fix] Fix an issue with Asset budgets using the ‘Project’ workload type.
  • [Fix] Fix several issues with the /clusters page, including issues causing inaccurate provider selection and inaccurate costs.
  • [Fix] Fix aws:eks:cluster-name tag not being picked up.
  • [Fix] Fix an issue causing inflated network costs for Azure clusters.
  • [Fix] Fix an issue where HA and DR icons are not working properly on /settings page.
  • [Fix] Fix an issue with Carbon Costs and Trends getting HTTP 500 in allocations.
  • [Fix] Fix an issue in the orphaned resources API that caused a 500 error when a single resource lookup from the provider failed.
  • [Fix] Fix an issue in allocations that presented non-zero shared costs when sharing is disabled.
  • [Fix] Fix the scalability of the clusters API for accuracy and speed.
  • [Fix] Better error handling in some cases where the app fails to start. Allow users to enter a license key or start/extend an Enterprise trial when blocked on license violations.
  • [Fix] Update math in the Overview’s efficiency graph card so as not to show negative allocation, which is impossible.
  • [Fix] Remove the Category filter from Asset Budget filter options, as it is unsupported.
  • [Fix] Prevent drilling into Pod items in the Efficiency page. Previously, this would set the aggregation to Namespace and remove all filters.
  • [Fix] Request Sizing had two separate UI elements for setting Filters. The one in the Customize menu has been removed.
  • [Fix] Remove an unnecessary check for the presence of the Network Cost daemonset on the primary cluster before rendering the Network Costs page. Secondary clusters may be reporting network costs that can be viewed from this page, regardless of the state of the daemonset on the primary.
  • [Fix] Prevent querying for data older than the 15 day retention period for Free tier in the Collections and Efficiency pages.
  • [Fix] Correctly generate links from the Allocations page to the Request Right Sizing page when filtering and/or aggregating by custom label.
  • [Fix] Correct an error that resulted from saving Cloud Cost reports with custom labels.
  • [Fix] Correct a broken link to the Efficiency Report documentation.
  • [Fix] Fix a bug in Assets where updating the Cost Metric field would remove any applied filters.
  • [Fix] Fix an issue where step size was not honored in Efficiency Reports.
  • [Fix] Fix a variety of issues in the Allocation Detail Modal (shown when clicking on a Pod row). This modal would issue an incorrect and expensive Assets query to try to derive the Pod’s Node. When it failed, it would show a cryptic message about credentials.
  • [Fix] Fix a bug that caused the Clusters list to filter incorrectly.
  • [Fix] Remove the unallocated item from the Overview’s Namespace Breakdown table.
  • [Fix] Fix an issue where applying a license would sometimes hide the current active Free Enterprise Trial status, and vice versa. The settings page now always shows both the active license and the state of an installation’s free trial.
  • [Fix] Fix an issue where custom SMTP tests/updates from the UI could fail.
  • [Fix] Fix Alerts only alerting on data from the Primary cluster. All alerts except Cluster/Application Health alerts will leverage data from secondary clusters.
  • [Fix] Don’t try to show all per-day cluster costs in the Overview page. Show top 10 like we do in other graphs.
  • [Fix] Fix an issue where UI-created Budgets that reset on Sunday did not create correctly.
  • [Fix] Fix an issue where the UI could send an incorrect parameter to the Cluster Sizing API.
  • [Fix] Fix an issue with Assets monthly totals not appropriately lining up.
  • [Fix] Fix category options in asset autocomplete.
  • [Fix] Fix an issue where namespace turndown always shows the next run as ‘coming soon’.
  • [Fix] Fix alerts to be multi-cluster aware.
  • [Fix] Fix missing claim names in persistent volume sizing.
  • [Fix] Fix the default experience for cluster right sizing when current daily data isn’t yet available.
  • [Fix] Fix an inaccuracy in pod costs on abandoned workloads savings page.
  • [Fix] Fix an issue where the cluster provider name could be incorrect on the clusters page.
  • [Fix] Fix an issue where total and page count on container right-sizing page had values when no recommendations were available.
  • [Fix] Fix an issue where database timestamps weren’t being correctly set for some data, defaulting to Jan 1st 1970.
  • [Fix] Fix an issue with PV discrepancy between allocation and allocation summary API.
  • [Fix] Fix an issue with saving SMTP configuration after edits.
  • [Fix] Fix an issue where the aggregator can run out of PV space without any warning being available in the frontend.
  • [Fix] Fix an issue where shared costs do not show correctly in the top level allocations view.
  • [Fix] Fix an issue where node counts don’t match across allocation, assets, and cluster inspect.
  • [Fix] Fix an issue where allocation API does not match the allocation summary API.
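
The "Jan 1st 1970" symptom in the timestamp fix above is what a Unix timestamp of zero looks like once formatted. The notes do not say what caused the unset values in Kubecost, so this is only a sketch of the general effect; `to_datetime` and `looks_unset` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Optional

# The Unix epoch: what a zero timestamp renders as.
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def to_datetime(unix_seconds: Optional[int]) -> datetime:
    """Convert stored Unix seconds to UTC; a missing value falls back to 0."""
    return datetime.fromtimestamp(unix_seconds or 0, tz=timezone.utc)


def looks_unset(ts: datetime) -> bool:
    """Flag timestamps that are exactly the epoch and so were probably never set."""
    return ts == EPOCH


print(to_datetime(None).isoformat())
print(looks_unset(to_datetime(None)))
```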

Helm Changes

  • #3500 No duplicate labels
  • #3459 Create checksum for configmaps and secrets
  • #3510 Add EKS 1.30
  • #3511 Set StorageClass on Prom PV
  • #3449 Document ASSET_INCLUDE_LOCAL_DISK_COST
  • #3543 Update the grafana dashboard to reduce the uid below the max limit of 40 characters.
  • #3539 Add support for collections idle costs.
  • #3538 Add extraScrapeConfig to scrape DCGM Exporter for gpu efficiency information.
  • #3551 Add comments in values.yaml for Aggregator and ETLUtils.
  • #3516 Add options to hide ui elements.
  • #3561 Add comments to grafana ingress to avoid confusion.
  • #3570 Add nginx routing for alerts to aggregator pod.
  • #3581 Add node group right sizing endpoints to aggregator.
  • #3411 Add support for supplying federated storage via YAML values.
  • #3589 Add scheduled reports test endpoint routes.
  • #3605 Setup InstanceAllowLists ConfigMap.
  • #3635 Add routes for savings recommendations allow lists.
  • #3648 Add temporary directory mount for new base image on frontend.
  • #3661 Enhance Pod Utilization dashboard with GPU utilization widget.
  • #3647 Added additional tmp dir to frontend container
  • #3670 Add custom label template for aggregator service
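
For context on #3459, a common Helm pattern is to hash the contents of a ConfigMap or Secret and store the hash as a pod annotation, so that a content change alters the pod template and triggers a rollout. Whether #3459 does exactly this is an assumption; the Python below only sketches the hashing idea, and the annotation keys and sample data are hypothetical.

```python
import hashlib
import json
from typing import Dict


def config_checksum(data: Dict[str, str]) -> str:
    """Return a SHA-256 hex digest of the data, independent of key order."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pod_annotations(configmap: Dict[str, str], secret: Dict[str, str]) -> Dict[str, str]:
    """Build annotations whose values change whenever the inputs change."""
    return {
        "checksum/configmap": config_checksum(configmap),
        "checksum/secret": config_checksum(secret),
    }


before = pod_annotations({"window": "3d"}, {"smtp-password": "example"})
after = pod_annotations({"window": "48h"}, {"smtp-password": "example"})

# Only the ConfigMap changed, so only its checksum differs.
print(before["checksum/configmap"] != after["checksum/configmap"])
print(before["checksum/secret"] == after["checksum/secret"])
```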

Helm Fixes

  • #3490 Fix carbonEstimates typo
  • #3492 Fix links in comments
  • #3505 Fix units used in duckdb memory limits
  • #3575 Fix issue with SMTP secret causing aggregator pod to fail to start.
  • #3569 Fix oidc redirect loop.

Dependency Updates

  • #3625 Move from quay.io/prom/prometheus:v2.52.0 to cgr.dev/chainguard/prometheus:latest
  • #3627 Move from grafana/grafana:10.4.3 to cgr.dev/chainguard/grafana:latest
  • #3629 Bump kubecost-network-costs from v0.17.3 to v0.17.6
  • #3606 Bump kubecost-modeling from v0.1.12 to v0.1.16
  • #3461 Bump prom/pushgateway from v1.8.0 to v1.9.0
  • #3545 Bump prom/node-exporter from v1.8.0 to v1.8.2
  • #3544 Bump kiwigrid/k8s-sidecar from 1.27.2 to 1.27.5
  • #3487 Bump cluster-controller from 0.16.1 to 0.16.9

Helm Chart Comparison Report

kubecost/[email protected] to kubecost/[email protected]

CVE by Severity

| Severity | Count | Prev Count | Difference |
|----------|-------|------------|------------|
| critical | 0 | 8 | -8 |
| high | 0 | 7 | -7 |
| medium | 21 | 109 | -88 |
| low | 81 | 435 | -354 |
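
The Difference column in the CVE by Severity table is Count minus Prev Count. As a quick check of that arithmetic, the sketch below recomputes it from the numbers in the table; the variable and function names are illustrative only.

```python
from typing import Dict, Tuple

# (count, previous count) per severity, copied from the table above.
cve_counts: Dict[str, Tuple[int, int]] = {
    "critical": (0, 8),
    "high": (0, 7),
    "medium": (21, 109),
    "low": (81, 435),
}


def differences(counts: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return count minus previous count for each severity."""
    return {severity: now - prev for severity, (now, prev) in counts.items()}


diffs = differences(cve_counts)
total_now = sum(now for now, _ in cve_counts.values())
total_prev = sum(prev for _, prev in cve_counts.values())

print(diffs)
print(total_now, total_prev, total_now - total_prev)
```

Summed across severities, the reported count drops from 559 to 102, a net change of -457.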

Images

| Image Name | Status | Before Repo | After Repo | Before Tag | After Tag |
|------------|--------|-------------|------------|------------|-----------|
| cost-model | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | prod-2.3.5 | prod-2.4.0 |
| frontend | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | prod-2.3.5 | prod-2.4.0 |
| kubecost-modeling | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | v0.1.15 | v0.1.16 |
| k8s-sidecar | Changed | kiwigrid | cgr.dev/chainguard | 1.27.2 | latest |
| grafana | Changed | grafana | cgr.dev/chainguard | 11.1.4 | latest |
| prometheus | Changed | quay.io/prometheus | cgr.dev/chainguard | v2.52.0 | latest |
| k8s | Unchanged | alpine | alpine | 1.26.9 | 1.26.9 |

Unchanged CVEs

Medium

CVE ID Severity Affected Images
CVE-2021-3997 medium cost-model
CVE-2023-30571 medium cost-model
CVE-2024-2236 medium cost-model
CVE-2024-26462 medium cost-model
CVE-2024-34397 medium cost-model
CVE-2024-35325 medium cost-model
CVE-2024-6119 medium cost-model

Low

CVE ID Severity Affected Images
CVE-2022-27943 low cost-model
CVE-2022-29458 low cost-model
CVE-2022-3219 low cost-model
CVE-2022-41409 low cost-model
CVE-2022-4899 low cost-model
CVE-2023-2953 low cost-model
CVE-2023-32636 low cost-model
CVE-2023-3446 low cost-model
CVE-2023-36191 low cost-model
CVE-2023-37920 low cost-model
CVE-2023-3817 low cost-model
CVE-2023-4156 low cost-model
CVE-2023-45322 low cost-model
CVE-2023-45918 low cost-model
CVE-2023-50495 low cost-model
CVE-2023-5678 low cost-model
CVE-2023-6129 low cost-model
CVE-2023-6237 low cost-model
CVE-2024-0232 low cost-model
CVE-2024-2511 low cost-model
CVE-2024-26458 low cost-model
CVE-2024-26461 low cost-model
CVE-2024-34459 low cost-model
CVE-2024-4603 low cost-model
CVE-2024-4741 low cost-model
CVE-2024-5535 low cost-model
CVE-2024-7264 low cost-model

Added CVEs

CVE ID Severity Affected Images

Removed CVEs

Critical

CVE ID Severity Affected Images
CVE-2024-24790 critical prometheus
CVE-2024-41110 critical prometheus
CVE-2024-45490 critical cost-model, frontend, k8s-sidecar
CVE-2024-45491 critical cost-model, frontend, k8s-sidecar
CVE-2024-45492 critical cost-model, frontend, k8s-sidecar

High

CVE ID Severity Affected Images
CVE-2024-34156 high grafana, prometheus, cost-model
CVE-2024-6232 high cost-model, kubecost-modeling
CVE-2024-6345 high k8s-sidecar

Medium

CVE ID Severity Affected Images
CVE-2005-2541 medium cost-model
CVE-2021-23336 medium cost-model
CVE-2021-45940 medium cost-model
CVE-2021-45941 medium cost-model
CVE-2023-36632 medium cost-model
CVE-2023-42363 medium k8s-sidecar, grafana
CVE-2023-42364 medium k8s-sidecar, grafana
CVE-2023-42365 medium grafana, k8s-sidecar
CVE-2023-42366 medium k8s-sidecar, grafana
CVE-2024-24789 medium prometheus
CVE-2024-24791 medium grafana, prometheus
CVE-2024-29040 medium cost-model
CVE-2024-32473 medium prometheus
CVE-2024-34155 medium cost-model, grafana, prometheus
CVE-2024-34158 medium cost-model, grafana, prometheus
CVE-2024-35195 medium cost-model
CVE-2024-35255 medium prometheus, grafana
CVE-2024-37370 medium cost-model
CVE-2024-37371 medium cost-model
CVE-2024-37891 medium cost-model, k8s-sidecar
CVE-2024-43790 medium cost-model
CVE-2024-4603 medium k8s-sidecar, grafana
CVE-2024-4741 medium k8s-sidecar, grafana
CVE-2024-5535 medium k8s-sidecar, grafana
CVE-2024-6104 medium prometheus
CVE-2024-6119 medium k8s-sidecar, grafana, frontend, kubecost-modeling
CVE-2024-6923 medium cost-model
CVE-2024-8088 medium cost-model
GHSA-mh55-gqvf-xfwm medium grafana

Low

CVE ID Severity Affected Images
CVE-2020-20703 low cost-model
CVE-2021-3572 low cost-model
CVE-2021-3903 low cost-model
CVE-2021-3927 low cost-model
CVE-2021-3928 low cost-model
CVE-2021-3968 low cost-model
CVE-2021-3973 low cost-model
CVE-2021-3974 low cost-model
CVE-2021-4136 low cost-model
CVE-2021-4166 low cost-model
CVE-2021-4173 low cost-model
CVE-2021-4187 low cost-model
CVE-2022-0213 low cost-model
CVE-2022-0351 low cost-model
CVE-2022-1616 low cost-model
CVE-2022-1619 low cost-model
CVE-2022-1620 low cost-model
CVE-2022-1674 low cost-model
CVE-2022-1720 low cost-model
CVE-2022-1725 low cost-model
CVE-2022-2042 low cost-model
CVE-2022-2124 low cost-model
CVE-2022-2125 low cost-model
CVE-2022-2126 low cost-model
CVE-2022-2129 low cost-model
CVE-2022-2175 low cost-model
CVE-2022-2182 low cost-model
CVE-2022-2183 low cost-model
CVE-2022-2206 low cost-model
CVE-2022-2207 low cost-model
CVE-2022-2208 low cost-model
CVE-2022-2210 low cost-model
CVE-2022-2257 low cost-model
CVE-2022-2284 low cost-model
CVE-2022-2285 low cost-model
CVE-2022-2286 low cost-model
CVE-2022-2287 low cost-model
CVE-2022-2304 low cost-model
CVE-2022-2343 low cost-model
CVE-2022-2344 low cost-model
CVE-2022-2345 low cost-model
CVE-2022-2522 low cost-model
CVE-2022-2817 low cost-model
CVE-2022-2819 low cost-model
CVE-2022-2845 low cost-model
CVE-2022-2849 low cost-model
CVE-2022-2862 low cost-model
CVE-2022-2874 low cost-model
CVE-2022-2889 low cost-model
CVE-2022-2923 low cost-model
CVE-2022-2946 low cost-model
CVE-2022-2980 low cost-model
CVE-2022-2982 low cost-model
CVE-2022-3016 low cost-model
CVE-2022-3037 low cost-model
CVE-2022-3099 low cost-model
CVE-2022-3134 low cost-model
CVE-2022-3153 low cost-model
CVE-2022-3234 low cost-model
CVE-2022-3235 low cost-model
CVE-2022-3256 low cost-model
CVE-2022-3278 low cost-model
CVE-2022-3296 low cost-model
CVE-2022-3297 low cost-model
CVE-2022-3324 low cost-model
CVE-2022-3352 low cost-model
CVE-2022-3606 low cost-model
CVE-2022-3705 low cost-model
CVE-2022-4141 low cost-model
CVE-2022-4292 low cost-model
CVE-2022-4293 low cost-model
CVE-2022-47007 low cost-model
CVE-2022-47010 low cost-model
CVE-2022-47011 low cost-model
CVE-2023-0049 low cost-model
CVE-2023-0051 low cost-model
CVE-2023-0054 low cost-model
CVE-2023-0288 low cost-model
CVE-2023-0433 low cost-model
CVE-2023-0512 low cost-model
CVE-2023-1127 low cost-model
CVE-2023-1170 low cost-model
CVE-2023-1175 low cost-model
CVE-2023-1264 low cost-model
CVE-2023-2609 low cost-model
CVE-2023-2610 low cost-model
CVE-2023-39804 low cost-model
CVE-2023-46246 low cost-model
CVE-2023-4733 low cost-model
CVE-2023-4734 low cost-model
CVE-2023-4735 low cost-model
CVE-2023-4738 low cost-model
CVE-2023-4750 low cost-model
CVE-2023-4751 low cost-model
CVE-2023-4752 low cost-model
CVE-2023-4781 low cost-model
CVE-2023-48231 low cost-model
CVE-2023-48232 low cost-model
CVE-2023-48233 low cost-model
CVE-2023-48234 low cost-model
CVE-2023-48235 low cost-model
CVE-2023-48236 low cost-model
CVE-2023-48237 low cost-model
CVE-2023-48706 low cost-model
CVE-2023-5344 low cost-model
CVE-2023-5441 low cost-model
CVE-2023-5535 low cost-model
CVE-2024-0397 low cost-model
CVE-2024-22667 low cost-model
CVE-2024-2511 low grafana
CVE-2024-25260 low cost-model
CVE-2024-39689 low k8s-sidecar
CVE-2024-41957 low cost-model
CVE-2024-41965 low cost-model
CVE-2024-43374 low cost-model
CVE-2024-43802 low cost-model
CVE-2024-45306 low cost-model
CVE-2024-7592 low cost-model
GHSA-xr7q-jx4m-x55m low cost-model, grafana