3026: Add support for s3 data importer inheriting service account credentials #3433

Open

wants to merge 6 commits into base: main

Changes from 4 commits
4 changes: 4 additions & 0 deletions api/openapi-spec/swagger.json
@@ -4891,6 +4891,10 @@
"description": "SecretRef provides the secret reference needed to access the S3 source",
"type": "string"
},
"serviceAccountName": {
"description": "ServiceAccountName provides the SAN needed if we want to use chain creds for S3 access (optional, if SecretRef supplied)",
"type": "string"
},
"url": {
"description": "URL is the url of the S3 source",
"type": "string",
24 changes: 18 additions & 6 deletions cmd/cdi-importer/importer.go
@@ -255,6 +255,7 @@ func newDataSource(source string, contentType string, volumeMode v1.PersistentVo
ep, _ := util.ParseEnvVar(common.ImporterEndpoint, false)
acc, _ := util.ParseEnvVar(common.ImporterAccessKeyID, false)
sec, _ := util.ParseEnvVar(common.ImporterSecretKey, false)
serviceAccount, _ := util.ParseEnvVar(common.ImporterServiceAccountName, false)
Author: Ah rats! Thanks for closing this gap in my understanding -- let me take a stab at that right now.

Author: Okay, so fingers crossed that I understood how this flows in 04d29c3.

I don't need to add anything here, right?
https://github.com/russellcain/containerized-data-importer/blob/41ba02d03ce0205b38fdfc9cbae4a1dc17b3abff/tools/cdi-source-update-poller/main.go#L48
It looks like this parses env vars (which the SA doesn't fall into) and args passed at invocation (also not applicable)?

Should I be adding a check about the source here, and then set it to any value if the PodSpec has a SA set?
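
(For context, a rough sketch of the controller-side wiring being asked about here, using hypothetical names -- dvSpec and the exact call site are not taken from this diff:)

    // Hypothetical wiring in the import controller: copy the service account
    // name from the DataVolume S3 source into the importer pod env vars so the
    // importer binary can branch on it.
    if dvSpec.Source.S3 != nil {
        podEnvVar.serviceAccountName = dvSpec.Source.S3.ServiceAccountName
    }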

keyf, _ := util.ParseEnvVar(common.ImporterGoogleCredentialFileVar, false)
diskID, _ := util.ParseEnvVar(common.ImporterDiskID, false)
uuid, _ := util.ParseEnvVar(common.ImporterUUID, false)
@@ -271,34 +272,45 @@ func newDataSource(source string, contentType string, volumeMode v1.PersistentVo
case cc.SourceHTTP:
ds, err := importer.NewHTTPDataSource(getHTTPEp(ep), acc, sec, certDir, cdiv1.DataVolumeContentType(contentType))
if err != nil {
errorCannotConnectDataSource(err, "http")
errorCannotConnectDataSource(err, cc.SourceHTTP)
}
return ds
case cc.SourceImageio:
ds, err := importer.NewImageioDataSource(ep, acc, sec, certDir, diskID, currentCheckpoint, previousCheckpoint)
if err != nil {
errorCannotConnectDataSource(err, "imageio")
errorCannotConnectDataSource(err, cc.SourceImageio)
}
return ds
case cc.SourceRegistry:
ds := importer.NewRegistryDataSource(ep, acc, sec, certDir, insecureTLS)
return ds
case cc.SourceS3:
ds, err := importer.NewS3DataSource(ep, acc, sec, certDir)
var (
ds *importer.S3DataSource
err error
)
if serviceAccount != "" {
Member: Is the value of serviceAccount ever used in the aws API? Or does the aws lib assume that the pod is running as the privileged service account?

Author: The latter! Hopefully I'm not too far off base, but the approach was that this would be used as a "flag", given that the service account should have been properly set up by the time it is invoked here (or fail on referencing an un-instantiated SA).

// use this as a flag to say the user has a SAN set up with creds that IRSA will read
klog.Infof("Attempting to create your S3 Data Source with cloud provider creds.\n")
ds, err = importer.NewChainCredentialsS3DataSource(ep, certDir)
} else {
// default behaviour of using supplied access key and secret key to configure S3 client
ds, err = importer.NewS3DataSource(ep, acc, sec, certDir)
}
if err != nil {
errorCannotConnectDataSource(err, "s3")
errorCannotConnectDataSource(err, cc.SourceS3)
}
return ds
case cc.SourceGCS:
ds, err := importer.NewGCSDataSource(ep, keyf)
if err != nil {
errorCannotConnectDataSource(err, "gcs")
errorCannotConnectDataSource(err, cc.SourceGCS)
}
return ds
case cc.SourceVDDK:
ds, err := importer.NewVDDKDataSource(ep, acc, sec, thumbprint, uuid, backingFile, currentCheckpoint, previousCheckpoint, finalCheckpoint, volumeMode)
if err != nil {
errorCannotConnectDataSource(err, "vddk")
errorCannotConnectDataSource(err, cc.SourceVDDK)
}
return ds
default:
7 changes: 7 additions & 0 deletions pkg/apis/core/v1beta1/openapi_generated.go

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions pkg/common/common.go
@@ -111,6 +111,8 @@ const (
ImporterAccessKeyID = "IMPORTER_ACCESS_KEY_ID"
// ImporterSecretKey provides a constant to capture our env variable "IMPORTER_SECRET_KEY"
ImporterSecretKey = "IMPORTER_SECRET_KEY"
// ImporterServiceAccountName provides a constant to capture our env variable "IMPORTER_SERVICE_ACCOUNT_NAME"
ImporterServiceAccountName = "IMPORTER_SERVICE_ACCOUNT_NAME"
// ImporterImageSize provides a constant to capture our env variable "IMPORTER_IMAGE_SIZE"
ImporterImageSize = "IMPORTER_IMAGE_SIZE"
// ImporterCertDirVar provides a constant to capture our env variable "IMPORTER_CERT_DIR"
5 changes: 5 additions & 0 deletions pkg/controller/import-controller.go
@@ -102,6 +102,7 @@ type importPodEnvVar struct {
extraHeaders []string
secretExtraHeaders []string
cacheMode string
serviceAccountName string
}

type importerPodArgs struct {
@@ -1263,6 +1264,10 @@ func makeImportEnv(podEnvVar *importPodEnvVar, uid types.UID) []corev1.EnvVar {
Name: common.CacheMode,
Value: podEnvVar.cacheMode,
},
{
Name: common.ImporterServiceAccountName,
Value: podEnvVar.serviceAccountName,
Member: A couple of things:

  1. If the worker pod does not need to know the SA name, and this param is just a flag for setting sessionConfig.CredentialsChainVerboseErrors = aws.Bool(true), then why not make this a boolean value with a key something like S3_CREDENTIALS_CHAIN_AUTH?

  2. You have to set the service account name in the importer pod spec somewhere in here: https://github.com/russellcain/containerized-data-importer/blob/3026/IRSA-integration-for-s3-service-accounts/pkg/controller/import-controller.go#L884-L949
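
(A minimal sketch of the boolean-flag alternative described in point 1, assuming a hypothetical S3_CREDENTIALS_CHAIN_AUTH env var that is not part of this PR; ep, acc, sec and certDir are as parsed earlier in newDataSource:)

    // Hypothetical alternative: branch on an explicit boolean env var instead
    // of on the presence of a service account name.
    chainAuth, _ := util.ParseEnvVar("S3_CREDENTIALS_CHAIN_AUTH", false)
    if chainAuth == "true" {
        ds, err = importer.NewChainCredentialsS3DataSource(ep, certDir)
    } else {
        ds, err = importer.NewS3DataSource(ep, acc, sec, certDir)
    }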

Author: Hi -- thanks for explaining this; I think I am almost on the same page, so I appreciate the patience on your end.

  • For 1, does this handle my concern posted here? I agree that, to the best of my knowledge, the SA name itself will not be meaningfully used by the worker pod and instead is just a flag to not set sessionConfig.Credentials. I was hoping to keep a smaller footprint and mimic other patterns where referencing a SA name in the pod spec communicates that the values of that account are consumed by the pod (since that is happening in this case, just not through an explicit invocation). If I'm talking myself into circles, please let me know, but does this make sense?

    • I also see the merit in an explicit boolean flag -- if what I'm explaining above comes across as hand-wavy, then I'd be alright with a user knowing what they are signing up for by setting the S3_CREDENTIALS_CHAIN_AUTH value. I'd just be a bit worried that a hypothetical user would set this value without having set up a SA, and the approach of specifying the SA name slightly safeguards against that. I might be over-worrying here.

  • For 2, happy to make this change, but I wanted to confirm my understanding: are you saying we need the value set at the top-level spec definition for how I'm attempting to consume it later on in cmd/cdi-importer/importer.go? That makes sense to me, but I would still want to make sure I got eyes on https://github.com/kubevirt/containerized-data-importer/pull/3433/files#r1767598831

Thanks again for taking your time to help me with this, @mhenriks!

Member:

    Spec: corev1.PodSpec{
        Containers:        makeImporterContainerSpec(args),
        InitContainers:    makeImporterInitContainersSpec(args),
        Volumes:           makeImporterVolumeSpec(args),
        RestartPolicy:     corev1.RestartPolicyOnFailure,
        NodeSelector:      args.workloadNodePlacement.NodeSelector,
        Tolerations:       args.workloadNodePlacement.Tolerations,
        Affinity:          args.workloadNodePlacement.Affinity,
        PriorityClassName: args.priorityClassName,
        ImagePullSecrets:  args.imagePullSecrets,
        ServiceAccountName: args.serviceAccountName, // <------------- add this
    },

Review comment: some documentation here explaining what happens when the serviceAccountName is added to the pod, and what the pod-identity-webhook is mutating in the pod spec.
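
(For background, and as an assumption not verified against this PR: on EKS the pod-identity-webhook reacts to a pod running under an IRSA-annotated service account by injecting the web-identity env vars and a projected token volume, roughly equivalent to the following additions to the importer container spec; the role ARN is a placeholder:)

    // Rough illustration of what the pod-identity-webhook injects; the AWS SDK
    // default credential chain later reads these env vars. "container" stands
    // for the importer container spec.
    container.Env = append(container.Env,
        corev1.EnvVar{Name: "AWS_ROLE_ARN", Value: "arn:aws:iam::123456789012:role/cdi-s3-importer"},
        corev1.EnvVar{Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"},
    )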

},
}
if podEnvVar.secretName != "" && podEnvVar.source != cc.SourceGCS {
env = append(env, corev1.EnvVar{
7 changes: 6 additions & 1 deletion pkg/controller/import-controller_test.go
@@ -962,7 +962,8 @@ var _ = Describe("Import test env", func() {
currentCheckpoint: "",
previousCheckpoint: "",
finalCheckpoint: "",
preallocation: false}
preallocation: false,
serviceAccountName: ""}
Expect(reflect.DeepEqual(makeImportEnv(testEnvVar, mockUID), createImportTestEnv(testEnvVar, mockUID))).To(BeTrue())
})
})
@@ -1245,6 +1246,10 @@ func createImportTestEnv(podEnvVar *importPodEnvVar, uid string) []corev1.EnvVar
Name: common.CacheMode,
Value: podEnvVar.cacheMode,
},
{
Name: common.ImporterServiceAccountName,
Value: podEnvVar.serviceAccountName,
},
}

if podEnvVar.secretName != "" {
48 changes: 37 additions & 11 deletions pkg/importer/s3-datasource.go
@@ -21,8 +21,10 @@ import (
)

const (
s3FolderSep = "/"
httpScheme = "http"
s3FolderSep = "/"
httpScheme = "http"
emptyAccessKey = ""
emptySecretKey = ""
)

// S3Client is the interface to the used S3 client.
@@ -53,7 +55,7 @@ type S3DataSource struct {
}

// NewS3DataSource creates a new instance of the S3DataSource
func NewS3DataSource(endpoint, accessKey, secKey string, certDir string) (*S3DataSource, error) {
func NewS3DataSource(endpoint string, accessKey string, secKey string, certDir string) (*S3DataSource, error) {
ep, err := ParseEndpoint(endpoint)
if err != nil {
return nil, errors.Wrapf(err, fmt.Sprintf("unable to parse endpoint %q", endpoint))
@@ -70,6 +72,23 @@ func NewS3DataSource(endpoint, accessKey, secKey string, certDir string) (*S3Dat
}, nil
}

// NewChainCredentialsS3DataSource creates a new instance of the S3DataSource using chain credentials (wraps NewS3DataSource)
func NewChainCredentialsS3DataSource(endpoint, certDir string) (*S3DataSource, error) {
/*
Quick Note on IRSA credential chain:
When you initialize a new service client without providing any credential arguments, the SDK uses the default credential provider chain to find AWS credentials. The SDK uses the first provider in the chain that returns credentials without an error. The default provider chain looks for credentials in the following order:

- Environment variables. (* set when a `serviceAccountName` is supplied)

- Shared credentials file.

- If your application uses an ECS task definition or RunTask API operation, IAM role for tasks.

- If your application is running on an Amazon EC2 instance, IAM role for Amazon EC2.
*/
return NewS3DataSource(endpoint, emptyAccessKey, emptySecretKey, certDir)
}
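
(Illustration only, not taken from this PR: with no static credentials on aws.Config, the SDK v1 session falls back to the default provider chain, which is where the env vars injected for IRSA are typically picked up. A minimal sketch using the aws, session and klog imports this file already relies on; the function name is made up:)

    // Sketch: build a session without static credentials and log which
    // provider in the default chain ended up supplying them.
    func logResolvedS3CredentialProvider() error {
        sess, err := session.NewSession(&aws.Config{
            CredentialsChainVerboseErrors: aws.Bool(true),
        })
        if err != nil {
            return err
        }
        val, err := sess.Config.Credentials.Get() // env vars -> shared file -> role providers
        if err != nil {
            return err
        }
        klog.Infof("S3 credentials resolved by provider %q", val.ProviderName)
        return nil
    }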

// Info is called to get initial information about the data.
func (sd *S3DataSource) Info() (ProcessingPhase, error) {
var err error
@@ -140,7 +159,7 @@ func (sd *S3DataSource) Close() error {
return err
}

func createS3Reader(ep *url.URL, accessKey, secKey string, certDir string) (io.ReadCloser, error) {
func createS3Reader(ep *url.URL, accessKey string, secKey string, certDir string) (io.ReadCloser, error) {
klog.V(3).Infoln("Using S3 client to get data")

endpoint := ep.Host
@@ -168,31 +187,38 @@ func createS3Reader(ep *url.URL, accessKey, secKey string, certDir string) (io.R
return objectReader, nil
}

func getS3Client(endpoint, accessKey, secKey string, certDir string, urlScheme string) (S3Client, error) {
func getS3Client(endpoint string, accessKey string, secKey string, certDir string, urlScheme string) (S3Client, error) {
// Adding certs using CustomCABundle will overwrite the SystemCerts, so we opt by creating a custom HTTPClient
httpClient, err := createHTTPClient(certDir)

if err != nil {
return nil, errors.Wrap(err, "Error creating http client for s3")
}
var creds *credentials.Credentials
if accessKey != emptyAccessKey && secKey != emptySecretKey {
creds = credentials.NewStaticCredentials(accessKey, secKey, "")
}

creds := credentials.NewStaticCredentials(accessKey, secKey, "")
region := extractRegion(endpoint)
disableSSL := false
// Disable SSL for http endpoint. This should cause the s3 client to create http requests.
if urlScheme == httpScheme {
disableSSL = true
}

sess, err := session.NewSession(&aws.Config{
sessionConfig := &aws.Config{
Region: aws.String(region),
Endpoint: aws.String(endpoint),
Credentials: creds,
S3ForcePathStyle: aws.Bool(true),
HTTPClient: httpClient,
DisableSSL: &disableSSL,
},
)
}
if creds != nil {
sessionConfig.Credentials = creds
} else {
// recommended value to set when relying on credential chains
sessionConfig.CredentialsChainVerboseErrors = aws.Bool(true)
}
sess, err := session.NewSession(sessionConfig)
if err != nil {
return nil, err
}
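
(On the CustomCABundle comment at the top of getS3Client: a minimal sketch of the custom-HTTP-client idea, assuming the usual crypto/tls, crypto/x509 and net/http pattern; the real createHTTPClient is not shown in this diff and the function name here is made up:)

    // Start from the system cert pool and append the custom CA, so the custom
    // CA does not replace the system certs.
    func newHTTPClientWithCA(caPEM []byte) (*http.Client, error) {
        pool, err := x509.SystemCertPool()
        if err != nil {
            return nil, err
        }
        pool.AppendCertsFromPEM(caPEM) // returns false if no certs were parsed
        return &http.Client{
            Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
        }, nil
    }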
15 changes: 15 additions & 0 deletions pkg/operator/resources/crds_generated.go

Some generated files are not rendered by default.

@@ -169,6 +169,9 @@ type DataVolumeSourceS3 struct {
URL string `json:"url"`
//SecretRef provides the secret reference needed to access the S3 source
SecretRef string `json:"secretRef,omitempty"`
//ServiceAccountName provides the service account name needed when using chain credentials for S3 access (optional if SecretRef is supplied)
// +optional
ServiceAccountName string `json:"serviceAccountName,omitempty"`
// CertConfigMap is a configmap reference, containing a Certificate Authority(CA) public key, and a base64 encoded pem certificate
// +optional
CertConfigMap string `json:"certConfigMap,omitempty"`
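
(A short usage sketch of the new field; the URL and service account name are made up:)

    // Example S3 source relying on chain credentials via an IRSA-annotated
    // service account instead of a SecretRef.
    source := cdiv1.DataVolumeSourceS3{
        URL:                "https://s3.us-east-1.amazonaws.com/my-bucket/disk.img",
        ServiceAccountName: "cdi-s3-importer", // SA annotated with an IAM role
    }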

Some generated files are not rendered by default.