
feat(rfc): new artifacts service #1

Draft · wants to merge 4 commits into master

Conversation

nemesisOsorio

No description provided.


xibz commented Jun 23, 2023

@nemesisOsorio - I actually plan on submitting a PR for this very change in Spinnaker soon. We (Apple) implemented this very thing about a year ago, and only recently got around to looking at upstreaming the changes.

The goal is sometime Monday or Tuesday, since prepping this PR has been a bit of a journey 😆

@nemesisOsorio (Author)

@xibz that's pretty awesome. This PR was intended as a pre-discussion before it goes to OSS. Let me know if you have any thoughts, and I will discuss with Armory whether we want to proceed with this PR.
Thank you.

Users with large-context pipelines do not have a good experience with pipeline executions. <br/>
Context and outputs in pipeline executions can contain large artifacts/objects such as Kubernetes manifests (YAML/JSON files). <br/>
Spinnaker passes these artifacts/objects in request and response bodies across some services. <br/>
The idea is to create a new service to store these artifacts/objects


Do we want a service initially? Or should we just mask this in the APIs returning pipeline data to Gate?

@nemesisOsorio (Author)

If we create a new service, we can reuse the produced artifact across different services.
For example: Bake and Deploy a Kustomize template:

  1. Rosco produces and saves the artifact and returns the id to Orca
  2. Orca sends the deploy operation to Clouddriver using the artifact id
  3. Clouddriver gets the artifact by id
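The three-step flow above can be sketched as a store-by-id round trip. This is a minimal in-memory stand-in, not the RFC's actual API; the class and method names here are hypothetical:

```python
import uuid

# Hypothetical in-memory stand-in for the proposed artifacts service.
# Per the flow above: Rosco saves the artifact, Orca passes only the id,
# and Clouddriver resolves the id back to the full content.
class ArtifactStore:
    def __init__(self):
        self._artifacts = {}

    def save(self, name, type_, content):
        """Store a large artifact and return its id (what Rosco would do)."""
        artifact_id = str(uuid.uuid4())
        self._artifacts[artifact_id] = {"name": name, "type": type_, "content": content}
        return artifact_id

    def get(self, artifact_id):
        """Resolve an id back to the full artifact (what Clouddriver would do)."""
        return self._artifacts[artifact_id]

store = ArtifactStore()

# 1. Rosco bakes the manifest, saves it, and returns only the id to Orca.
manifest = "apiVersion: apps/v1\nkind: Deployment\n..."
artifact_id = store.save("my-app-manifest", "json", manifest)

# 2. Orca's deploy request to Clouddriver carries the id, not the manifest.
deploy_request = {"operation": "deployManifest", "artifactId": artifact_id}

# 3. Clouddriver fetches the full manifest by id before deploying.
resolved = store.get(deploy_request["artifactId"])
```

The point of the sketch is that only the small `artifact_id` ever travels in request/response bodies; the large content crosses the wire only on the save and the final fetch.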

I believe your second question is covered in the Alternatives section.

and only pass the `id`s of those artifacts/objects in the request/response bodies and context/outputs.

Deck users do not open the produced artifact/object in every single pipeline execution, so there is no need to ship it eagerly.
With this approach, artifacts will be loaded on demand (lazy loading).


One thought: should this be "selectable" by artifact type? I think not, but I had to toss it out there - is there a state where we NEED artifact data, like Git information, for the UI to know how a pipeline was triggered?

Also, would this impact "Trigger" artifacts?

@nemesisOsorio (Author)

I do not think we need a filter by artifact type.
For trigger artifacts, we've added a question to the Known Unknowns section, I'll keep digging in the code and hopefully add an answer.

@nemesisOsorio (Author)

I added an answer for trigger artifacts.

The proposed plan is to implement this by pipeline stages:

* Milestone 1 (targeting 1.32): Address stages with the most acute performance issues
  * Introduce the new Artifact service


A big one not answered here is the installation of the new service - its config and how that'd be done.

@nemesisOsorio (Author)

We prefer to avoid changes in Halyard - do you think raw YAML manifests might be the way?

We will definitely add a way to deploy using Operator.

    varchar   name        "artifact name"
    varchar   type        "base64, json"
    json      content     "the large artifact"
    timestamp created_at  "creation date time"
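As a rough illustration, the schema snippet above maps to DDL along these lines. The table name `artifacts`, the `id` primary key, and the JSON-as-TEXT mapping are assumptions (SQLite, used here for a runnable sketch, has no native json type):

```python
import sqlite3

# Hypothetical DDL mirroring the schema snippet above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artifacts (
        id         TEXT PRIMARY KEY,
        name       TEXT NOT NULL,       -- artifact name
        type       TEXT NOT NULL,       -- e.g. 'base64' or 'json'
        content    TEXT NOT NULL,       -- the large artifact
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

# Round trip: store a large manifest, fetch it back by id.
conn.execute(
    "INSERT INTO artifacts (id, name, type, content) VALUES (?, ?, ?, ?)",
    ("abc-123", "my-manifest", "json", '{"kind": "Deployment"}'),
)
row = conn.execute("SELECT content FROM artifacts WHERE id = ?", ("abc-123",)).fetchone()
```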


We need permissions as well, please. Going forward, I'd like NOT to have any system that can return data without them - and artifacts are definitely sensitive. Ideally permissions should be localized and NOT need to query Fiat.


Add a scheduler to automatically delete artifacts. The following options will be configurable:
* Retention time: How long the record will be kept in the database.
* Artifact cleanup job: A configurable cron job that deletes artifacts periodically in accordance with the retention-time property.
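The cleanup job described above boils down to deleting rows older than the retention cutoff. A minimal sketch, assuming the `artifacts` table and epoch-seconds timestamps (both hypothetical; the RFC does not fix a storage layout):

```python
import sqlite3
import time

# Hypothetical retention setting: keep artifacts for 7 days.
RETENTION_SECONDS = 7 * 24 * 3600

def cleanup_artifacts(conn, now=None):
    """Delete rows whose created_at falls outside the retention window.

    This is the body the configurable cron job would run periodically.
    Returns the number of artifacts purged.
    """
    now = now if now is not None else time.time()
    cutoff = now - RETENTION_SECONDS
    cur = conn.execute("DELETE FROM artifacts WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Demo: one 30-day-old artifact (expired) and one fresh artifact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id TEXT, created_at REAL)")
now = time.time()
conn.execute("INSERT INTO artifacts VALUES ('old', ?)", (now - 30 * 24 * 3600,))
conn.execute("INSERT INTO artifacts VALUES ('fresh', ?)", (now,))
purged = cleanup_artifacts(conn, now=now)  # removes only the 30-day-old row
```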


Good to keep cleanup in mind, thank you!

Make changes in Deck components that use the context/output to render artifacts/objects,
and only load those artifacts/objects when the user clicks the button/link (lazy loading).
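The lazy-loading idea above can be sketched as a proxy that holds only the artifact id and fetches content on first access. This is an illustrative sketch, not Deck code (Deck is TypeScript); `LazyArtifact` and `fetch_fn` are made-up names standing in for the new artifacts-service call:

```python
# Hypothetical sketch of lazy loading: the execution view holds only an
# artifact id; content is fetched the first time the user asks for it.
class LazyArtifact:
    def __init__(self, artifact_id, fetch_fn):
        self.artifact_id = artifact_id
        self._fetch_fn = fetch_fn      # stand-in for the artifacts-service call
        self._content = None
        self.fetch_count = 0           # tracks backend round trips

    def content(self):
        """Fetch the artifact on first access, then serve it from cache."""
        if self._content is None:
            self._content = self._fetch_fn(self.artifact_id)
            self.fetch_count += 1
        return self._content

# Simulated artifacts-service lookup.
backend = {"abc-123": "apiVersion: v1\nkind: ConfigMap"}
artifact = LazyArtifact("abc-123", backend.get)

# Rendering the execution view costs no fetch; content loads on click.
first = artifact.content()   # simulates the user clicking the link
second = artifact.content()  # served from cache, no second fetch
```

The design point is that executions render without ever touching the large payload unless the user explicitly asks for it.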

Enable Fiat autoconfiguration and use application permissions for the new artifacts endpoints.


Possibly - but ideally we're depending less upon Fiat to define permissions. Instead, maybe "artifact accounts" with permissions as a new type?

@nemesisOsorio (Author)

I added ACLs to the Future Possibilities section.


xibz commented Jun 26, 2023

@nemesisOsorio How this came about: David and Apple had a meeting regarding context size, we explained how we solved it, and once the solution was heard we decided to upstream the changes. So I can go ahead and create the PR, and you guys can duke it out with David lol, but if we want to make improvements to the PR to fit this RFC, we can always do that as well.

It doesn't use a separate service. It introduces a concept called ArtifactStore, which is just artifact storage, and it utilizes custom serializers and deserializers in Clouddriver and Rosco to expand and compress the artifacts. We originally tried it with Jackson's converters, but there's a known bug which causes a stack overflow, lul. Orca does use the artifact store directly, however. We have an ArtifactStore implementation specifically for S3, so we just use that.

Also, it uses the application as security, and it does query Fiat, because manual judgment stages could potentially allow some rando to click okay, so it needs to figure out the permissions at that point. We could definitely expand on that to make it even more secure, but it should be fine for now at least.

We also introduced a new API in Clouddriver that allows users to fetch the artifact, e.g. in Deck when viewing the YAML.

Further, this is feature-flagged, so you can turn it on and enable it for certain applications as well.

This doesn't change anything from past behavior; even SpEL works correctly :) so if you have users using #fromBase64, things will still work appropriately.

@nemesisOsorio (Author)

@xibz we want to send the PR to OSS, do you mind if we put your comments in the Alternatives section?
Another question: Can we add Apple to the list of Early customers?


xibz commented Jun 29, 2023

@nemesisOsorio I think there are a couple of things here.

> Another question: Can we add Apple to the list of Early customers?

This doesn't make too much sense - are we customers of our own implementation? So I would opt to word it as something like "Apple uses spinnaker/kork#1069" or similar.

> we want to send the PR to OSS, do you mind if we put your comments in the Alternatives section?

I think we are jumping the gun a little bit. The PR that I have is much smaller in scope than this particular change. I think we should instead make the RFC a two-step process:

  1. Does the PR alleviate Spinnaker users' need to reduce context size?
     a. We can have a sub-RFC for the artifact storage PR if needed, but I think we should first see whether the PR alleviates the size issues for you guys.
  2. If not, then we go with a full-fledged artifact service.

The last thing I think we should be doing is introducing a new service without trying simpler approaches first.
