Overview
Version 2.4 is an ‘edge’ release focused on GPU efficiency, with many bug fixes and several quality-of-life improvements.
Important Notices
- Upgrading to 2.4 adds new fields to the Kubecost ETL that support the GPU monitoring features. The new ETL files are not backward compatible with previous versions. Multi-cluster users MUST upgrade the primary cluster before upgrading the secondary clusters (agents).
The current 2.3.x release is considered stable and will continue to be maintained. Kubecost will release a new 2.3.x version that is compatible with the 2.4.x ETL changes, allowing a downgrade from 2.4.x to that version. That said, the 2.4.0 release has been extensively tested, and we recommend upgrading to take advantage of the new features and the significant number of bug and CVE fixes.
- An agent upgrade to version 2.4+ is required to gather the additional metrics for NVIDIA GPU workloads. If NVIDIA GPUs are not used, the agent upgrade is not required.
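The upgrade rules above (primary before secondaries; agents need 2.4+ only for NVIDIA GPU metrics) can be expressed as a simple pre-flight check. This is an illustrative sketch only: the `Cluster` type, the `upgrade_plan`, `check_state`, and `needs_agent_upgrade` helpers, the single-primary assumption, and tracking versions as integer tuples are all hypothetical choices for this example, not part of Kubecost or its Helm chart.

```python
from dataclasses import dataclass


# Hypothetical model of a multi-cluster deployment; not a Kubecost API.
@dataclass
class Cluster:
    name: str
    role: str                      # "primary" or "secondary" (agent)
    version: tuple[int, int, int]  # e.g. (2, 3, 5)
    uses_nvidia_gpus: bool = False


# First version whose ETL files are not backward compatible with older versions.
ETL_BREAKING_VERSION = (2, 4, 0)
# Agents need this version to collect the additional NVIDIA GPU metrics.
GPU_METRICS_AGENT_VERSION = (2, 4, 0)


def upgrade_plan(clusters: list[Cluster]) -> list[str]:
    """Order cluster names so the primary is upgraded before any secondary."""
    primaries = [c.name for c in clusters if c.role == "primary"]
    secondaries = [c.name for c in clusters if c.role == "secondary"]
    return primaries + secondaries


def check_state(clusters: list[Cluster]) -> list[str]:
    """Report secondaries already on 2.4+ while the primary is still older."""
    primary = next(c for c in clusters if c.role == "primary")
    return [
        f"{c.name}: secondary is on 2.4+ but primary is not"
        for c in clusters
        if c.role == "secondary"
        and c.version >= ETL_BREAKING_VERSION > primary.version
    ]


def needs_agent_upgrade(cluster: Cluster) -> bool:
    """An agent only needs 2.4+ if it must report NVIDIA GPU metrics."""
    return (
        cluster.role == "secondary"
        and cluster.uses_nvidia_gpus
        and cluster.version < GPU_METRICS_AGENT_VERSION
    )


fleet = [
    Cluster("agent-eu", "secondary", (2, 3, 5), uses_nvidia_gpus=True),
    Cluster("hub", "primary", (2, 3, 5)),
]
print(upgrade_plan(fleet))  # ['hub', 'agent-eu']
```

Comparing versions as integer tuples keeps the ordering numeric; real deployments would read versions from each cluster rather than hard-coding them.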
Major Features
- [Feature] Incorporate GPU Efficiency into the efficiency metrics displayed throughout the application.
- [Feature] Add the ability to right-size node groups in Cluster Sizing. This requires agents (secondaries) to be at version 1.100 or later, which added support for node labels.
- [Feature] Add options to the Allocations page to see Idle costs broken down per-node and per-cluster.
- [Feature] Add support for Collections Budgets.
- [Feature] Add support for Idle Costs to Collections.
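The node group right-sizing requirement (agents at 1.100 or later) needs a numeric version comparison: 1.100 is newer than 1.99, even though a plain string comparison says the opposite. A minimal sketch, assuming versions are dotted integers optionally prefixed with `v`; `parse_version` and `supports_node_group_sizing` are hypothetical helpers, and tags such as `prod-2.4.0` or pre-release suffixes are not handled.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Convert '1.100' or 'v2.4.0' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


# Minimum agent version for node group right-sizing, per the release note.
MIN_AGENT_VERSION = parse_version("1.100")


def supports_node_group_sizing(agent_version: str) -> bool:
    """True if the agent reports the node labels that node group sizing needs."""
    return parse_version(agent_version) >= MIN_AGENT_VERSION


# A plain string comparison gets this wrong: "1.99" sorts after "1.100".
print("1.99" > "1.100")                      # True
print(supports_node_group_sizing("1.99"))    # False
print(supports_node_group_sizing("1.100"))   # True
print(supports_node_group_sizing("v2.3.5"))  # True
```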
Minor Features
- [Feature] Add a new Helm setting that lets a standard discount configured in the Kubecost primary cluster installation be applied to data coming from secondary clusters.
- [Feature] Add support for certificates when using a custom SMTP server with Kubecost.
- [Feature] Add new FOCUS spec fields to Cloud Cost to support Account Name, Invoice Entity Name, Region ID, and Availability Zone.
- [Feature] Add support for bring-your-own (BYO) certificates for SMTP integration.
- [Feature] Add a check in the Settings page which alerts users when their Helm Chart, UI image, and API image versions are not in sync.
- [Feature] Add four new fields from the FOCUS spec to Cloud Costs.
- [Feature] Add four new fields from the FOCUS spec to Cloud Budgets.
- [Feature] Add limited support for feature-flagging via the Helm chart.
- [Feature] Agent diagnostics is now enabled by default
- [Enhancement] Substantial application-wide improvements to WCAG 2.1 AA accessibility.
- [Enhancement] Add a loading indicator when downloading request sizing CSVs to confirm that the download has started.
- [Enhancement] Add a loading indicator when request sizing data is refreshing.
- [Enhancement] Remove the “New” badges from pages that were introduced in 2.0.
- [Enhancement] Change the default Request Sizing window from 48h to 3d. Querying 48 data points caused the page to hang or crash for some larger data sets.
- [Enhancement] When the custom costs API returns an array of empty data, show an informative message rather than an empty graph/table.
- [Enhancement] Show an informative message when the Request Sizing API returns a response with an empty set of Recommendations.
- [Enhancement] Show an informative message when attempting to create a Budget fails.
- [Enhancement] Include more information in bug reports.
- [Enhancement] Show a more informative error response when cluster sizing recommendations cannot be generated due to not finding cloud provider information for a cluster.
- [Enhancement] Show friendlier Cloud Account Names in Overview / Cloud Cost tables instead of Cloud Account IDs, when names are available.
- [Enhancement] Add the ability to see aggregator PV usage on the /diagnostics page.
Fixes
- [Fix] Add a new script for copying alerts from cost-model to the aggregator pod, since the alerts endpoint has moved to the aggregator. If you configured alerts before 2.4, you need to run this script after upgrading.
- [Fix] Fix an issue where cluster efficiency on the Overview page shows usage as 0.
- [Fix] Fix an issue where resource hourly cost is incorrectly calculated on drill down.
- [Fix] Fix an issue when switching from separating idle costs by node to another idle configuration.
- [Fix] Fix an issue with GPU idle calculations in allocation.
- [Fix] Fix assets that appear to be missing an account ID.
- [Fix] Fix an issue causing discrepancies in collections cost in the k8s domain for query windows that yield relative date boundaries.
- [Fix] Fix CSV pricing for GPUs not being correctly reflected in Kubecost.
- [Fix] Fix an issue with the Allocation API not matching Allocation Summary API on costs.
- [Fix] Fix an HTTP 500 error in cluster right-sizing.
- [Fix] Fix an issue with the Allocation API's calculation of PV costs.
- [Fix] Fix an issue with Allocation API and Allocation Summary API cost accuracy when the cost metric is not set to cumulative cost.
- [Fix] Fix an issue with AKS reconciliation of BRL currency costs.
- [Fix] Fix an issue with Asset budgets using the ‘Project’ workload type.
- [Fix] Fix several issues on the /clusters page that caused inaccurate provider selection and costs.
- [Fix] Fix the aws:eks:cluster-name tag not being picked up.
- [Fix] Fix an issue causing inflated network costs for Azure clusters.
- [Fix] Fix an issue where HA and DR icons are not working properly on the /settings page.
- [Fix] Fix an issue where Carbon Costs and Trends returned HTTP 500 errors in Allocations.
- [Fix] Fix an issue in the orphaned resources API that caused a 500 error when a single resource lookup from the provider failed.
- [Fix] Fix an issue where Allocations showed non-zero shared costs when sharing is disabled.
- [Fix] Fix the scalability of the clusters API for accuracy and speed.
- [Fix] Better error handling in some cases where the app fails to start. Allow users to enter a license key or start/extend an Enterprise trial when blocked on license violations.
- [Fix] Update math in the Overview’s efficiency graph card so as not to show negative allocation, which is impossible.
- [Fix] Remove the Category filter from Asset Budget filter options, as it is unsupported.
- [Fix] Prevent drilling into Pod items in the Efficiency page. Previously, this would set the aggregation to Namespace and remove all filters.
- [Fix] Request Sizing had two separate UI elements for setting Filters. The one in the Customize menu has been removed.
- [Fix] Remove an unnecessary check for the presence of the Network Cost daemonset on the primary cluster before rendering the Network Costs page. Secondary clusters may be reporting network costs that can be viewed from this page, regardless of the state of the daemonset on the primary.
- [Fix] Prevent querying for data older than the 15-day retention period for the Free tier in the Collections and Efficiency pages.
- [Fix] Correctly generate links from the Allocations page to the Request Right Sizing page when filtering and/or aggregating by custom label.
- [Fix] Correct an error that occurred in savings Cloud Cost reports with custom labels.
- [Fix] Correct a broken link to the Efficiency Report documentation.
- [Fix] Fix a bug in Assets where updating the Cost Metric field would remove any applied filters.
- [Fix] Fix an issue where step size was not honored in Efficiency Reports.
- [Fix] Fix a variety of issues in the Allocation Detail Modal (shown when clicking on a Pod row). This modal would issue an incorrect and expensive Assets query to try to derive the Pod’s Node. When it failed, it would show a cryptic message about credentials.
- [Fix] Fix a bug that caused the Clusters list to filter incorrectly.
- [Fix] Remove the unallocated item from the Overview’s Namespace Breakdown table.
- [Fix] Fix an issue where applying a license would sometimes hide the current active Free Enterprise Trial status, and vice versa. The Settings page now always shows both the active license and the state of an installation's free trial.
- [Fix] Fix an issue where custom SMTP tests/updates from the UI could fail.
- [Fix] Fix Alerts only alerting on data from the primary cluster. All alerts except Cluster/Application Health alerts now use data from secondary clusters.
- [Fix] Show only the top 10 per-day cluster costs on the Overview page, as other graphs do, instead of trying to show all of them.
- [Fix] Fix an issue where UI-created Budgets that reset on Sunday did not create correctly.
- [Fix] Fix an issue where the UI could send an incorrect parameter to the Cluster Sizing API.
- [Fix] Fix an issue where Assets monthly totals did not line up correctly.
- [Fix] Fix category options in asset autocomplete.
- [Fix] Fix an issue where namespace turndown always shows the next run as ‘coming soon’.
- [Fix] Fix alerts to be multi-cluster aware.
- [Fix] Fix missing claim names in persistent volume sizing.
- [Fix] Fix the default experience for cluster right sizing when current daily data isn’t yet available.
- [Fix] Fix an inaccuracy in pod costs on the abandoned workloads savings page.
- [Fix] Fix an issue where the cluster provider name could be incorrect on the clusters page.
- [Fix] Fix an issue where the total and page counts on the container right-sizing page showed values when no recommendations were available.
- [Fix] Fix an issue where database timestamps weren’t being correctly set for some data, defaulting to Jan 1st 1970.
- [Fix] Fix a PV discrepancy between the Allocation API and the Allocation Summary API.
- [Fix] Fix an issue with saving SMTP configuration after edits.
- [Fix] Fix an issue where the aggregator could run out of PV space without any warning reaching the frontend.
- [Fix] Fix an issue where shared costs do not show correctly in the top level allocations view.
- [Fix] Fix an issue where node counts don’t match across allocation, assets, and cluster inspect.
- [Fix] Fix an issue where allocation API does not match the allocation summary API.
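The Free tier retention fix above amounts to keeping a query window inside the last 15 days. A minimal sketch, assuming a simple [start, end) window in UTC; `clamp_to_retention` is a hypothetical helper, not Kubecost's implementation, which may instead reject such queries or restrict the date picker.

```python
from datetime import datetime, timedelta, timezone

# Free tier retention period from the release note.
FREE_TIER_RETENTION = timedelta(days=15)


def clamp_to_retention(start, end, now=None):
    """Trim a [start, end) query window so it never reaches past retention.

    Returns None when the whole window falls outside the retention period.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - FREE_TIER_RETENTION
    start = max(start, cutoff)  # move the start forward to the cutoff if needed
    if start >= end:
        return None
    return start, end


now = datetime(2024, 9, 30, tzinfo=timezone.utc)
# A 30-day request is trimmed so it starts at 2024-09-15 00:00 UTC.
print(clamp_to_retention(now - timedelta(days=30), now, now=now))
```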
Helm Changes
- #3500 No duplicate labels
- #3459 Create checksum for configmaps and secrets
- #3510 Add EKS 1.30
- #3511 Set StorageClass on Prom PV
- #3449 Document ASSET_INCLUDE_LOCAL_DISK_COST
- #3543 Update the Grafana dashboard to shorten its UID below the 40-character limit.
- #3539 Add support for collections idle costs.
- #3538 Add extraScrapeConfig to scrape DCGM Exporter for GPU efficiency information.
- #3551 Add comments in values.yaml for Aggregator and ETLUtils.
- #3516 Add options to hide UI elements.
- #3561 Add comments to grafana ingress to avoid confusion.
- #3570 Add nginx routing for alerts to aggregator pod.
- #3581 Add node group right sizing endpoints to aggregator.
- #3411 Add support for supplying federated storage via YAML values.
- #3589 Add scheduled reports test endpoint routes.
- #3605 Set up InstanceAllowLists ConfigMap.
- #3635 Add routes for savings recommendations allow lists.
- #3648 Add temporary directory mount for new base image on frontend.
- #3661 Enhance Pod Utilization dashboard with GPU utilization widget.
- #3647 Add an additional tmp dir to the frontend container
- #3670 Add custom label template for aggregator service
Helm Fixes
- #3490 Fix carbonEstimates typo
- #3492 Fix links in comments
- #3505 Fix units used in DuckDB memory limits
- #3575 Fix issue with SMTP secret causing aggregator pod to fail to start.
- #3569 Fix oidc redirect loop.
Dependency Updates
- #3625 Move from quay.io/prom/prometheus:v2.52.0 to cgr.dev/chainguard/prometheus:latest
- #3627 Move from grafana/grafana:10.4.3 to cgr.dev/chainguard/grafana:latest
- #3629 Bump kubecost-network-costs from v0.17.3 to v0.17.6
- #3606 Bump kubecost-modeling from v0.1.12 to v0.1.16
- #3461 Bump prom/pushgateway from v1.8.0 to v1.9.0
- #3545 Bump prom/node-exporter from v1.8.0 to v1.8.2
- #3544 Bump kiwigrid/k8s-sidecar from 1.27.2 to 1.27.5
- #3487 Bump cluster-controller from 0.16.1 to 0.16.9
Helm Chart Comparison Report: Kubecost Helm chart 2.3.5 to 2.4.0
### CVE by Severity
| Severity | Count | Prev Count | Difference |
|----------|-------|------------|------------|
| critical | 0 | 8 | -8 |
| high | 0 | 7 | -7 |
| medium | 21 | 109 | -88 |
| low | 81 | 435 | -354 |
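The Difference column is the current count minus the previous count. As a quick cross-check, the sketch below recomputes each row and the overall totals using only the numbers in the table above; the variable names are illustrative.

```python
# Counts copied from the "CVE by Severity" table: (current, previous).
cve_counts = {
    "critical": (0, 8),
    "high": (0, 7),
    "medium": (21, 109),
    "low": (81, 435),
}

# Difference = current count - previous count, per severity.
differences = {sev: cur - prev for sev, (cur, prev) in cve_counts.items()}

total_current = sum(cur for cur, _ in cve_counts.values())
total_previous = sum(prev for _, prev in cve_counts.values())

print(differences)  # {'critical': -8, 'high': -7, 'medium': -88, 'low': -354}
print(total_current, total_previous, total_current - total_previous)  # 102 559 -457
```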
Images
| Image Name | Status | Before Repo | After Repo | Before Tag | After Tag |
|------------|--------|-------------|------------|------------|-----------|
| cost-model | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | prod-2.3.5 | prod-2.4.0 |
| frontend | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | prod-2.3.5 | prod-2.4.0 |
| kubecost-modeling | Changed | gcr.io/kubecost1 | gcr.io/kubecost1 | v0.1.15 | v0.1.16 |
| k8s-sidecar | Changed | kiwigrid | cgr.dev/chainguard | 1.27.2 | latest |
| grafana | Changed | grafana | cgr.dev/chainguard | 11.1.4 | latest |
| prometheus | Changed | quay.io/prometheus | cgr.dev/chainguard | v2.52.0 | latest |
| k8s | Unchanged | alpine | alpine | 1.26.9 | 1.26.9 |
Unchanged CVEs
Medium
Low
Added CVEs
| CVE ID | Severity | Affected Images |
|--------|----------|-----------------|
Removed CVEs
Critical
High
Medium
| CVE ID | Severity | Affected Images |
|--------|----------|-----------------|
| CVE-2005-2541 | medium | cost-model |
| CVE-2021-23336 | medium | cost-model |
| CVE-2021-45940 | medium | cost-model |
| CVE-2021-45941 | medium | cost-model |
| CVE-2023-36632 | medium | cost-model |
| CVE-2023-42363 | medium | k8s-sidecar, grafana |
| CVE-2023-42364 | medium | k8s-sidecar, grafana |
| CVE-2023-42365 | medium | grafana, k8s-sidecar |
| CVE-2023-42366 | medium | k8s-sidecar, grafana |
| CVE-2024-24789 | medium | prometheus |
| CVE-2024-24791 | medium | grafana, prometheus |
| CVE-2024-29040 | medium | cost-model |
| CVE-2024-32473 | medium | prometheus |
| CVE-2024-34155 | medium | cost-model, grafana, prometheus |
| CVE-2024-34158 | medium | cost-model, grafana, prometheus |
| CVE-2024-35195 | medium | cost-model |
| CVE-2024-35255 | medium | prometheus, grafana |
| CVE-2024-37370 | medium | cost-model |
| CVE-2024-37371 | medium | cost-model |
| CVE-2024-37891 | medium | cost-model, k8s-sidecar |
| CVE-2024-43790 | medium | cost-model |
| CVE-2024-4603 | medium | k8s-sidecar, grafana |
| CVE-2024-4741 | medium | k8s-sidecar, grafana |
| CVE-2024-5535 | medium | k8s-sidecar, grafana |
| CVE-2024-6104 | medium | prometheus |
| CVE-2024-6119 | medium | k8s-sidecar, grafana, frontend, kubecost-modeling |
| CVE-2024-6923 | medium | cost-model |
| CVE-2024-8088 | medium | cost-model |
| GHSA-mh55-gqvf-xfwm | medium | grafana |
Low
| CVE ID | Severity | Affected Images |
|--------|----------|-----------------|
| CVE-2020-20703 | low | cost-model |
| CVE-2021-3572 | low | cost-model |
| CVE-2021-3903 | low | cost-model |
| CVE-2021-3927 | low | cost-model |
| CVE-2021-3928 | low | cost-model |
| CVE-2021-3968 | low | cost-model |
| CVE-2021-3973 | low | cost-model |
| CVE-2021-3974 | low | cost-model |
| CVE-2021-4136 | low | cost-model |
| CVE-2021-4166 | low | cost-model |
| CVE-2021-4173 | low | cost-model |
| CVE-2021-4187 | low | cost-model |
| CVE-2022-0213 | low | cost-model |
| CVE-2022-0351 | low | cost-model |
| CVE-2022-1616 | low | cost-model |
| CVE-2022-1619 | low | cost-model |
| CVE-2022-1620 | low | cost-model |
| CVE-2022-1674 | low | cost-model |
| CVE-2022-1720 | low | cost-model |
| CVE-2022-1725 | low | cost-model |
| CVE-2022-2042 | low | cost-model |
| CVE-2022-2124 | low | cost-model |
| CVE-2022-2125 | low | cost-model |
| CVE-2022-2126 | low | cost-model |
| CVE-2022-2129 | low | cost-model |
| CVE-2022-2175 | low | cost-model |
| CVE-2022-2182 | low | cost-model |
| CVE-2022-2183 | low | cost-model |
| CVE-2022-2206 | low | cost-model |
| CVE-2022-2207 | low | cost-model |
| CVE-2022-2208 | low | cost-model |
| CVE-2022-2210 | low | cost-model |
| CVE-2022-2257 | low | cost-model |
| CVE-2022-2284 | low | cost-model |
| CVE-2022-2285 | low | cost-model |
| CVE-2022-2286 | low | cost-model |
| CVE-2022-2287 | low | cost-model |
| CVE-2022-2304 | low | cost-model |
| CVE-2022-2343 | low | cost-model |
| CVE-2022-2344 | low | cost-model |
| CVE-2022-2345 | low | cost-model |
| CVE-2022-2522 | low | cost-model |
| CVE-2022-2817 | low | cost-model |
| CVE-2022-2819 | low | cost-model |
| CVE-2022-2845 | low | cost-model |
| CVE-2022-2849 | low | cost-model |
| CVE-2022-2862 | low | cost-model |
| CVE-2022-2874 | low | cost-model |
| CVE-2022-2889 | low | cost-model |
| CVE-2022-2923 | low | cost-model |
| CVE-2022-2946 | low | cost-model |
| CVE-2022-2980 | low | cost-model |
| CVE-2022-2982 | low | cost-model |
| CVE-2022-3016 | low | cost-model |
| CVE-2022-3037 | low | cost-model |
| CVE-2022-3099 | low | cost-model |
| CVE-2022-3134 | low | cost-model |
| CVE-2022-3153 | low | cost-model |
| CVE-2022-3234 | low | cost-model |
| CVE-2022-3235 | low | cost-model |
| CVE-2022-3256 | low | cost-model |
| CVE-2022-3278 | low | cost-model |
| CVE-2022-3296 | low | cost-model |
| CVE-2022-3297 | low | cost-model |
| CVE-2022-3324 | low | cost-model |
| CVE-2022-3352 | low | cost-model |
| CVE-2022-3606 | low | cost-model |
| CVE-2022-3705 | low | cost-model |
| CVE-2022-4141 | low | cost-model |
| CVE-2022-4292 | low | cost-model |
| CVE-2022-4293 | low | cost-model |
| CVE-2022-47007 | low | cost-model |
| CVE-2022-47010 | low | cost-model |
| CVE-2022-47011 | low | cost-model |
| CVE-2023-0049 | low | cost-model |
| CVE-2023-0051 | low | cost-model |
| CVE-2023-0054 | low | cost-model |
| CVE-2023-0288 | low | cost-model |
| CVE-2023-0433 | low | cost-model |
| CVE-2023-0512 | low | cost-model |
| CVE-2023-1127 | low | cost-model |
| CVE-2023-1170 | low | cost-model |
| CVE-2023-1175 | low | cost-model |
| CVE-2023-1264 | low | cost-model |
| CVE-2023-2609 | low | cost-model |
| CVE-2023-2610 | low | cost-model |
| CVE-2023-39804 | low | cost-model |
| CVE-2023-46246 | low | cost-model |
| CVE-2023-4733 | low | cost-model |
| CVE-2023-4734 | low | cost-model |
| CVE-2023-4735 | low | cost-model |
| CVE-2023-4738 | low | cost-model |
| CVE-2023-4750 | low | cost-model |
| CVE-2023-4751 | low | cost-model |
| CVE-2023-4752 | low | cost-model |
| CVE-2023-4781 | low | cost-model |
| CVE-2023-48231 | low | cost-model |
| CVE-2023-48232 | low | cost-model |
| CVE-2023-48233 | low | cost-model |
| CVE-2023-48234 | low | cost-model |
| CVE-2023-48235 | low | cost-model |
| CVE-2023-48236 | low | cost-model |
| CVE-2023-48237 | low | cost-model |
| CVE-2023-48706 | low | cost-model |
| CVE-2023-5344 | low | cost-model |
| CVE-2023-5441 | low | cost-model |
| CVE-2023-5535 | low | cost-model |
| CVE-2024-0397 | low | cost-model |
| CVE-2024-22667 | low | cost-model |
| CVE-2024-2511 | low | grafana |
| CVE-2024-25260 | low | cost-model |
| CVE-2024-39689 | low | k8s-sidecar |
| CVE-2024-41957 | low | cost-model |
| CVE-2024-41965 | low | cost-model |
| CVE-2024-43374 | low | cost-model |
| CVE-2024-43802 | low | cost-model |
| CVE-2024-45306 | low | cost-model |
| CVE-2024-7592 | low | cost-model |
| GHSA-xr7q-jx4m-x55m | low | cost-model, grafana |