[Spike] [max 6h] Decide on cost allocation strategy - Athena vs. Cost Explorer API #4648

Closed · 17 tasks done · Tracked by #4453

@consideRatio (Contributor) commented Aug 21, 2024

This task blocks further work towards attributing costs using Athena, because Yuvi has learned about another approach that should be evaluated first. This is described in #4453 (comment):

Regardless, I think it's early enough that we should investigate this alternative to Athena.

It would involve:

  1. https://docs.aws.amazon.com/cost-management/latest/userguide/ce-api.html as the source of data.
  2. An intermediate Python web server that talks to the Cost Explorer API
  3. https://grafana.com/grafana/plugins/yesoreyeram-infinity-datasource/ for connecting to this from Grafana. Grafana recommends this as the replacement for https://github.com/grafana/grafana-json-datasource

There are a few major advantages over using Athena:

  1. Much easier to validate, as we aren't writing complex SQL queries but translating what we can visually do in the cost explorer into API calls.
  2. Athena is not per AWS account but at the AWS organization level, so we would have needed an intermediate layer anyway for cases when we use the 2i2c AWS organization. We wouldn't have needed this for Openscapes, but trying to use it for any of our other AWS accounts would've required an intermediate python layer for access control (so different communities can't see each other's data).

So if possible, we should prefer this method.
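
As a rough sketch of how steps 2 and 3 above could fit together, here is a minimal version of the intermediate web server, assuming Flask and boto3 (neither is a settled choice); the /costs route and its query parameter names are made up for illustration:

```python
import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
ce = boto3.client("ce")  # Cost Explorer client; needs ce:GetCostAndUsage permissions

@app.route("/costs")
def costs():
    # Grafana's Infinity datasource would call this URL, passing the
    # dashboard's time range as query parameters.
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": request.args["from"], "End": request.args["to"]},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    return jsonify(response["ResultsByTime"])
```

The Infinity datasource would then be configured with this server's URL as a JSON source.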

We can reuse all the work we had done, except for some parts of #4546.

Next step here is to design a spike to validate this (instead of #4544). The Athena-specific issues that are subtasks of this can be closed if we are going to take this approach.

Practical spike steps

I think this has to be updated continuously as part of the spike, but the goal is to clarify and verify that it's reasonable to move towards using the Cost Explorer API.

  • Read up about the Cost Explorer API, starting from the docs Yuvi linked

  • Read up about the Grafana Infinity plugin as a datasource, from the docs Yuvi linked

    • Evaluate if the plugin is installed and/or enabled by default in our Grafana deployments
      It is a third-party plugin not installed by default, but an installed copy persists in the Grafana persistent directory we mount, so restarting the Grafana pod doesn't delete it. It needs to be updated manually.
  • Understand and clarify details of Yuvi's step 2, an intermediate Python web server that talks to the Cost Explorer API.
    My preliminary understanding is that we would opt in to deploying something from the support chart for this, and that it may need credentials set up via Terraform to access the Cost Explorer API.

    • Use of https://github.com/boto/boto3 directly seems like a given, but possibly also https://github.com/aws/aws-sdk-pandas as a higher-level helper. Could aws/aws-sdk-pandas, aka awswrangler, be worth using?
      It's not clear if we should use awswrangler, but I think the path is to assume we don't until we have a known need to manipulate the response from the Cost Explorer API before serving it back to Grafana.

    • What transformations, if any, are required by this intermediate Python web server?

      Probably not much data transformation, but it needs to expose a bridge API so that Grafana requests to populate the dashboards we want can be translated into Cost Explorer API calls and responses.

      • Will it be a live passthrough, fetching and possibly transforming relevant info on demand, or will it scrape the Cost Explorer API and then serve responses to Grafana entirely based on the scraped data?
        I think it must be a live passthrough, but it could include caching of requests. Each request to the Cost Explorer API costs 0.01 USD according to the official docs.

      • What kind of queries are expected to come from Grafana, and what kind of response is expected?
        We need to consider time ranges etc., right?

        I think the responses from the Python server should be JSON, respect time ranges passed as query parameters, and support filtering on the things relevant to filter on.

      • What kind of queries can I make to the Cost API through Python SDKs?
        I think it boils down to those listed in the CostExplorer boto3 client.
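
For concreteness, a sketch of what such a query could look like with the boto3 CostExplorer client: daily unblended cost grouped by service. The time range and grouping dimension here are illustrative choices, not decisions:

```python
import boto3

ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-08-01", "End": "2024-08-22"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
# Walk the nested response: one entry per day, one group per service.
for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        print(
            result["TimePeriod"]["Start"],
            group["Keys"][0],
            group["Metrics"]["UnblendedCost"]["Amount"],
        )
```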

  • Exploration of existing things

    • https://github.com/electrolux-oss/aws-cost-exporter creates a Prometheus exporter, providing metrics for Prometheus to scrape. This isn't great for us: it can't look back in time, and it makes it hard to adjust queries in Grafana, because queries can only work against the already-scraped metrics. If we didn't have a metric representing cumulative monthly net spend, for example, we could end up needing to add together spend over time - which would very likely be inaccurate, and we want accuracy in anything here.

      Python dependencies: relies on boto3 and botocore, not awswrangler
      Cost API used: get_cost_and_usage as seen here

    • https://github.com/intuit/costBuddy is a big, clunky project that has been stale for ~4 years.
      It includes too much machinery: Terraform-managed infrastructure (buckets, CloudWatch, VMs, ...), config in an Excel file, management in a centralized parent account for multiple child AWS accounts, etc.

    • https://github.com/dnavarrom/grafana-aws-cost-explorer-backend is a stale old project with a Node backend for Grafana to interact with; it responded with JSON that Grafana parsed using the Grafana JSON plugin, which is now deprecated in favor of the Infinity plugin.

    • Add AWS Cost Explorer API grafana/grafana#73444 requests a datasource that works against the Cost Explorer API. I think in practice this is perhaps exactly what we need ourselves, and the creation of a Python intermediary is a way of doing that.

  • Should we create the Python intermediary as a Grafana datasource plugin, or should we let Grafana treat our Python intermediary as a JSON-providing API via the Infinity datasource?
    I don't think we should create a Grafana datasource plugin to install in Grafana; we would end up using NodeJS etc. for that, and handling a NodeJS project would be a lot of overhead for us as a team.

  • Should the Python web server mostly pass through requests to the Cost Explorer API, or should it map requests in a more hardcoded way?
    Ideally, we can avoid hardcoding a mapping between a Python web API returning JSON and the Cost Explorer API, and instead pass requests through to relevant endpoints of the Cost Explorer API.

    It seems that we can do quite a lot with raw JSON data by crafting queries against the Infinity datasource, which then post-processes the JSON response. Due to this, I think the key thing we should ensure is that the Python intermediary provides relevant JSON responses for post-processing by the Infinity datasource query.
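
A sketch of the minimal reshaping the intermediary might do: flattening Cost Explorer's nested ResultsByTime structure into flat rows that an Infinity datasource query can then filter and post-process. The row field names here are hypothetical:

```python
def flatten_results(results_by_time: list[dict]) -> list[dict]:
    """Flatten Cost Explorer's nested response into flat rows for Grafana."""
    rows = []
    for result in results_by_time:
        for group in result.get("Groups", []):
            rows.append({
                "date": result["TimePeriod"]["Start"],
                "name": group["Keys"][0],
                "cost": float(group["Metrics"]["UnblendedCost"]["Amount"]),
            })
    return rows
```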

  • Verification of feasibility

    • Determine if the Cost Explorer API seems to provide sufficient data
      I'm quite sure we aren't limited here; I consider it verified enough to proceed at this point.
    • Determine if we can reasonably get data accessed by Grafana Infinity via an intermediary Python web server
      I'm quite sure the pieces will fit together.

Definition of done

  • A decision is made with motivation on either:
    • a) moving onwards with a Cost Explorer API approach
    • b) moving onwards with an Athena approach
    • c) followup in some other way

Potential followup work not part of spike

  • If we go for the Cost Explorer API, work is needed to define/refine further tasks to be worked on.
consideRatio changed the title from "[Spike] PLACEHOLDER - decide on athena path or another strategy" to "[Spike] Decide on cost allocation strategy - Athena or new strategy" on Aug 21, 2024
consideRatio changed the title to "[Spike] [max 8h] Decide on cost allocation strategy - Athena or new strategy" on Aug 21, 2024
consideRatio changed the title to "[Spike] [max 4h] Decide on cost allocation strategy - Athena or new strategy" on Aug 21, 2024
consideRatio changed the title to "[Spike] [max 6h] Decide on cost allocation strategy - Athena or new strategy" on Aug 21, 2024
consideRatio changed the title to "[Spike] [max 6h] Decide on cost allocation strategy - Athena vs. Cost Explorer API" on Aug 22, 2024
@yuvipanda (Member)

The definition of done looks good to me, @consideRatio.

If we go for the Cost Explorer API, work is needed to define/refine further tasks to be worked on.

If this isn't part of the spike, once the spike is done can you create another issue to track this? Thanks.

@consideRatio (Contributor, Author)

Picking it up now with some initial reading at the end of my day, to be continued tomorrow.

@consideRatio (Contributor, Author)

Notes sketching a possible future implementation

  • I think defining Grafana dashboards with panels and queries can be done in isolation as long as we have a dummy JSON blob of data to work against; we can then tweak the dummy JSON blob to become live.
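
For example, a stub serving such a dummy blob, assuming the flat row shape sketched earlier; the Infinity datasource can be pointed at this stub's URL while the real bridge is built:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical dummy rows for developing dashboards before the live bridge exists.
DUMMY_COSTS = [
    {"date": "2024-08-01", "name": "Amazon Elastic Compute Cloud - Compute", "cost": 12.34},
    {"date": "2024-08-01", "name": "Amazon Simple Storage Service", "cost": 1.05},
    {"date": "2024-08-02", "name": "Amazon Elastic Compute Cloud - Compute", "cost": 11.87},
]

@app.route("/costs")
def costs():
    # Serve canned data now; swap in live Cost Explorer calls later.
    return jsonify(DUMMY_COSTS)
```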

@consideRatio (Contributor, Author) commented Aug 23, 2024

Conclusion - moving forward with Cost Explorer API

I've arrived at what I consider sufficient grounds for a decision to move ahead with the Cost Explorer API.

It seems technically very viable, and the motivation by Yuvi for using the Cost Explorer API over Athena is sufficient in my mind.

There are a few major advantages over using Athena:

  1. Much easier to validate, as we aren't writing complex SQL queries but translating what we can visually do in the cost explorer into API calls.
  2. Athena is not per AWS account but at the AWS organization level, so we would have needed an intermediate layer anyway for cases when we use the 2i2c AWS organization. We wouldn't have needed this for Openscapes, but trying to use it for any of our other AWS accounts would've required an intermediate python layer for access control (so different communities can't see each other's data).

Another positive conclusion is that it seems that we can avoid needing much complexity within the Python intermediary, and can put that complexity in the Grafana queries instead. This is because the Infinity plugin seems to allow for notable post-processing of the JSON responses. Due to this, we can probably iterate on the cost dashboards more responsively and quickly, letting the Python intermediary be a quite slim project with relatively low complexity, making it more viable for re-use by others as well.

@yuvipanda (Member)

Another positive conclusion is that it seems that we can avoid needing much complexity within the Python intermediary, and can put that complexity in the Grafana queries instead.

Given that we'll be working on https://2i2c.productboard.com/roadmap/7803626-product-delivery-flow/features/27195081 in the future, as well as possibly needing to extend this work onto GCP, and the recommendations in https://docs.aws.amazon.com/cost-management/latest/userguide/ce-api-best-practices.html#ce-api-best-practices-optimize-costs, I'd like most of the complexity to actually be in the python layer, and not in the grafana layer. Fixing issues in Python code is also far more accessible to more team members and other open source contributors than fixing it in jsonnet + the filtering languages that the grafana plugin uses. So let's use the grafana plugin as primarily a visual display layer, and keep most of the complexity in the python code.
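
To make that concrete, one way some of this complexity could live in the Python layer is a small time-based cache in front of the Cost Explorer call, in line with the linked best practices on limiting billable API requests. This is a sketch; the one-hour TTL is an assumed value, not a decided one:

```python
import time
import boto3

ce = boto3.client("ce")
_cache: dict[tuple, tuple[float, dict]] = {}

def get_costs(start: str, end: str, ttl_seconds: float = 3600.0) -> dict:
    """Return cost data, reusing a cached response while it is fresh."""
    key = (start, end)
    now = time.monotonic()
    if key in _cache and now - _cache[key][0] < ttl_seconds:
        return _cache[key][1]
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    _cache[key] = (now, response)
    return response
```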
