Support Bitnami embedded SBOMs #3065

Open
willmurphyscode opened this issue Jul 24, 2024 · 26 comments · May be fixed by #3341

@willmurphyscode
Contributor

What would you like to be added:

As part of anchore/grype#1609, Syft should pick up on SBOMs in containers located at /opt/bitnami, because this is how Bitnami records what's in an image.

The SBOM cataloger would probably do this already, but is off by default.

There are a few open questions here:

  1. How should packages discovered by other catalogers interact with these SBOMs? For example, the binary cataloger might find Redis or MariaDB executables.
  2. What if someone is building something FROM a Bitnami image? How do we know we can trust the SBOM?
  3. If we are special-casing Bitnami images, e.g. turning the SBOM cataloger on by default only for certain images or certain paths, how do we detect this situation and what configuration options are available?

Why is this needed:

This is primarily needed so that running grype on a Bitnami image (see anchore/grype#1609) is as accurate as possible.

Additional context:

There are a few open requests for more accurate Bitnami classification. Ideally this work might also fix those.

@kzantow
Contributor

kzantow commented Jul 31, 2024

Is there another way to scan these artifacts? Are these container images in some differing format from OCI? If the only way to identify what is installed is by scanning an SBOM, there could probably just be a Bitnami cataloger that looks for specific SBOMs in these known bitnami locations, instead of enabling the SBOM cataloger itself. It's pretty easy to just pass a reader to the SBOM decoder. And then we'd probably want to have a way to prevent SBOMs from getting scanned twice if a user does enable the SBOM cataloger.

@willmurphyscode
Contributor Author

willmurphyscode commented Aug 1, 2024

Two questions for investigation:

  1. If we add a bitnami cataloger, and turn both it and the SBOM cataloger on, do we get duplicates?
  2. Do we and should we surface all information from the bitnami SPDX in the Syft output SBOM? It might be that the interface for a cataloger is too specific; it only returns packages and relationships. SPDX can express more than this.

The easy path to implement this is essentially a copy of the SBOM cataloger with a much narrower file glob, assuming it doesn't cause duplicates or miss critical information.
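To make the "much narrower file glob" idea concrete, here is a minimal sketch of a path filter that approximates "/opt/bitnami/**/*.spdx" using only the standard library. The helper name isBitnamiSBOMPath is hypothetical and is not part of Syft; a real cataloger would more likely hand a glob to Syft's existing file resolver.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// bitnamiRoot is where Bitnami images keep their embedded SBOMs,
// according to the issue description.
const bitnamiRoot = "/opt/bitnami/"

// isBitnamiSBOMPath is a hypothetical helper that approximates the glob
// "/opt/bitnami/**/*.spdx": any file ending in .spdx at any depth
// below /opt/bitnami.
func isBitnamiSBOMPath(p string) bool {
	clean := path.Clean(p)
	return strings.HasPrefix(clean, bitnamiRoot) && path.Ext(clean) == ".spdx"
}

func main() {
	candidates := []string{
		"/opt/bitnami/common/.spdx-render-template.spdx",
		"/opt/bitnami/postgresql/bin/postgres",
		"/usr/share/doc/sbom.spdx",
	}
	for _, p := range candidates {
		fmt.Printf("%-50s %v\n", p, isBitnamiSBOMPath(p))
	}
}
```

Only the first candidate is accepted, which is what separates this filter from the general SBOM cataloger's broader globs.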

@spiffcs spiffcs moved this to Ready in OSS Aug 8, 2024
@willmurphyscode
Contributor Author

  1. If we add a bitnami cataloger, and turn both it and the SBOM cataloger on, do we get duplicates?

I did an experiment to answer this.

  1. Copy the SBOM cataloger to make a new bitnami cataloger, but change the glob list to be only "/opt/bitnami/**/*.spdx"
  2. Wire the new cataloger up here: https://github.com/anchore/syft/blob/main/internal/task/package_tasks.go#L151
  3. Run syft with sbom and bitnami on, with each on, and with neither on, and look at the packages returned:
❯ go run ./cmd/syft -q --select-catalogers "-sbom-cataloger,+bitnami-cataloger" bitnami/moodle:4.4 -o json |\
 jq -r '.artifacts[] | select(.foundBy == "bitnami-cataloger" or .foundBy == "sbom-cataloger") | .name' |\
 shasum
b07dd9b416f25edca5e143218ac6474360980fce  -

❯ go run ./cmd/syft -q --select-catalogers "+sbom-cataloger,+bitnami-cataloger" bitnami/moodle:4.4 -o json |\
 jq -r '.artifacts[] | select(.foundBy == "bitnami-cataloger" or .foundBy == "sbom-cataloger") | .name' |\
 shasum
b07dd9b416f25edca5e143218ac6474360980fce  -

❯ go run ./cmd/syft -q --select-catalogers "+sbom-cataloger,-bitnami-cataloger" bitnami/moodle:4.4 -o json |\
 jq -r '.artifacts[] | select(.foundBy == "bitnami-cataloger" or .foundBy == "sbom-cataloger") | .name' |\
 shasum
b07dd9b416f25edca5e143218ac6474360980fce  -

So I think the answer to question 1 is, "at least as it stands right now, Syft's existing deduplication logic works fine if both catalogers are on." Of course, in this experiment the catalogers are identical, but it's still a good sign on question 1 above.

@willmurphyscode
Contributor Author

I've attached the SBOM syft makes in my experiment:

go run ./cmd/syft -q --override-default-catalogers "bitnami-cataloger" bitnami/moodle:4.4 -o spdx >/tmp/from-syft-bitnami.spdx.txt

from-syft-bitnami.spdx.txt

@juan131

juan131 commented Oct 15, 2024

@willmurphyscode I started working on this and realized that packages detected by a new "Bitnami" cataloger are given the type UnknownPackage. After reading the developing guide, I wonder whether it makes sense to create a new "Bitnami" package type with the captured data from Bitnami. For instance, the revision information we include in Bitnami versions, see:

This could complicate how we manage duplicates reported by both the sbom and bitnami catalogers, but we could use the PURL for that.

@willmurphyscode
Contributor Author

Hi @juan131 (cc @wagoodman),

Some thoughts here:

  1. Are the packages represented in the bitnami SBOMs from different ecosystems? For example, is it a Go binary or a Python package or something? I think they are usually just native binaries, like Redis or MySQL executable files, but I'm not sure. If a package already belongs to an existing package type, e.g. Go or Binary, it might make sense to keep it there.
  2. Are they binary packages? For example, if you have a compiled MySQL server executable, that sounds like a binary package. (Also, if the binary classifier and bitnami cataloger both find it, we should de-dupe in favor of bitnami).
  3. We could put a query param in the PURL, like bitnami=true or source=bitnami or something, to inform grype matching. Also, repository_url=bitnami.com or something is within the PURL spec, and could tell Grype to search bitnami vulns for this package.
  4. We're reluctant to introduce a new package type, because bitnami is really a vendor of the package, not a kind of package.

In short:

  1. The PURL should say that the package is from bitnami somehow so that Grype and other tools can use your vulnerability feed.
  2. We should not make a new package type, because bitnami packages are from a specific vendor, not a specific kind of package, so this information should go somewhere in the PURL besides the type, e.g. a query parameter.
  3. We will require a Grype change here, and it probably makes sense to pull in https://github.com/bitnami/go-version for the Grype version comparison

@westonsteimel
Contributor

westonsteimel commented Oct 15, 2024

bitnami is a purl package type though: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#bitnami

@juan131

juan131 commented Oct 15, 2024

Are the packages represented in the bitnami SBOMs from different ecosystems?

Yes. Packages from different ecosystems can coexist in the same SPDX file. Take the examples below (taken from the bitnami/kubectl image) of packages listed in the kubectl SBOM:

  • We have "bitnami" packages such as this one:
        {
            "SPDXID": "SPDXRef-kubectl",
            "name": "kubectl",
            "versionInfo": "1.31.1-1",
            "downloadLocation": "git+https://github.com/kubernetes/kubernetes#refs/tags/v1.31.1",
            "licenseConcluded": "Apache-2.0",
            "licenseDeclared": "Apache-2.0",
            "filesAnalyzed": false,
            "externalRefs": [
                {
                    "referenceCategory": "SECURITY",
                    "referenceType": "cpe23Type",
                    "referenceLocator": "cpe:2.3:*:kubectl:kubectl:1.31.1:*:*:*:*:*:*:*"
                },
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:bitnami/kubectl@1.31.1-1?arch=arm64&distro=debian-12"
                }
            ],
            "copyrightText": "NOASSERTION"
        }
  • And also "golang" packages such as this one:
        {
            "name": "github.com/MakeNowJust/heredoc",
            "SPDXID": "SPDXRef-Package-808f8a3a08f58be6",
            "versionInfo": "v1.0.0",
            "supplier": "NOASSERTION",
            "downloadLocation": "NONE",
            "filesAnalyzed": false,
            "sourceInfo": "opt/bitnami/kubectl/bin/kubectl",
            "licenseConcluded": "NONE",
            "licenseDeclared": "NONE",
            "externalRefs": [
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:golang/github.com/makenowjust/heredoc@v1.0.0"
                }
            ],
            "primaryPackagePurpose": "LIBRARY",
            "copyrightText": "NOASSERTION"
        }

Then, relationships link them:

    "relationships": [
        {
            "spdxElementId": "SPDXRef-kubectl",
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": "SPDXRef-Application-b66f42f85c68bc03-kubectl"
        },
        {
            "spdxElementId": "SPDXRef-Application-b66f42f85c68bc03-kubectl",
            "relatedSpdxElement": "SPDXRef-Package-808f8a3a08f58be6",
            "relationshipType": "DEPENDS_ON"
        }
    ]

Are they binary packages?

Yes, Bitnami packages can simply be compiled binaries (based on Golang, C, C++, etc.), but they can also be apps written in interpreted languages (e.g. PHP or Node.js apps).

We could put a query param in the PURL

I don't think that's necessary. As @westonsteimel mentioned, they're recognized as a valid PURL package type.

@willmurphyscode
Contributor Author

Hi @juan131!

Thanks @westonsteimel - I did not realize bitnami was an official PURL package type - I thought we would be inventing the package type for the sake of this cataloger.

It looks like there are already PURLs with package types in the bitnami SPDX? I propose we do the following:

  1. Add bitnami as a Syft package type
  2. In the cataloger, emit a package type based on the PURL type we find (so in the example above, emit a bitnami package for kubectl and a golang package for heredoc)
  3. Add a bitnami repository URL to the PURLs, so that the ones that are from bitnami but not pkg:bitnami are still labeled as being from Bitnami.
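As a rough illustration of step 2, the sketch below pulls the type out of a PURL string and maps it to a package type label. Both purlType and packageTypeFor are hypothetical helpers, and the label strings ("bitnami", "go-module", "unknown") are illustrative only; an actual cataloger would presumably use a proper PURL parsing library and Syft's own type constants rather than string handling.

```go
package main

import (
	"fmt"
	"strings"
)

// purlType returns the type component of a package URL of the form
// "pkg:<type>/<name>...", or "" when the input does not look like one.
// Hypothetical helper: it does not validate the rest of the PURL.
func purlType(purl string) string {
	if !strings.HasPrefix(purl, "pkg:") {
		return ""
	}
	rest := strings.TrimPrefix(purl, "pkg:")
	i := strings.Index(rest, "/")
	if i <= 0 {
		return ""
	}
	return strings.ToLower(rest[:i])
}

// packageTypeFor maps a PURL type to an illustrative package type label.
// The labels here are assumptions for the sake of the example.
func packageTypeFor(purl string) string {
	switch purlType(purl) {
	case "bitnami":
		return "bitnami"
	case "golang":
		return "go-module"
	default:
		return "unknown"
	}
}

func main() {
	for _, p := range []string{
		"pkg:bitnami/postgresql",
		"pkg:golang/github.com/bitnami/render-template",
		"not-a-purl",
	} {
		fmt.Printf("%-50s %s\n", p, packageTypeFor(p))
	}
}
```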

@westonsteimel and @wagoodman do you all agree?

@juan131

juan131 commented Oct 16, 2024

I think that makes sense @willmurphyscode !! Regarding the 3rd point, when you talk about packages from Bitnami but not pkg:bitnami, what packages are you referring to?

@juan131

juan131 commented Oct 16, 2024

By the way, I added support for the Bitnami pURL type at anchore/packageurl-go#22

@willmurphyscode
Contributor Author

when you talk about packages from Bitnami but not pkg:bitnami, what packages are you referring to?

I thought you told us that there were packages in bitnami SPDX files that have a different purl type:

And also "golang" packages such as this one:
...
pkg:golang/github.com/makenowjust/heredoc@v1.0.0

from the second example in #3065 (comment).

So what I was trying to talk about was packages that are declared in a Bitnami SPDX manifest, and are therefore found by the new bitnami cataloger, but that do not have package type bitnami because the Bitnami SPDX declares them with a different PURL type. Heredoc in your post above is such a package.

@juan131 does that make sense?

@juan131

juan131 commented Oct 16, 2024

I see your point @willmurphyscode

Following the same example about the golang package included in the Bitnami SBOM. I guess the same package will be reported twice:

  • Once by Bitnami cataloger (analyzing the SPDX file).
  • Once by Golang cataloger (analyzing go.mod).

I guess the ideal scenario is to have a mechanism to detect that both packages are actually the same one (e.g. by comparing their pURL or similar). With this in mind, are we adding value by labeling these packages as "being from Bitnami"?
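A minimal sketch of the "compare their pURL" idea follows. The Package struct and dedupeByPURL are hypothetical and far simpler than Syft's real package model, and keeping the first occurrence is an arbitrary policy chosen only for illustration; which record should win is exactly the open question in this thread.

```go
package main

import "fmt"

// Package is a deliberately tiny stand-in for a cataloged package.
type Package struct {
	Name    string
	PURL    string
	FoundBy string
}

// dedupeByPURL keeps the first package seen for each non-empty PURL and
// drops later ones with the same PURL. Packages without a PURL are
// always kept, since there is nothing to compare them on.
func dedupeByPURL(pkgs []Package) []Package {
	seen := make(map[string]bool)
	var out []Package
	for _, p := range pkgs {
		if p.PURL != "" {
			if seen[p.PURL] {
				continue
			}
			seen[p.PURL] = true
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pkgs := []Package{
		{Name: "github.com/bitnami/render-template", PURL: "pkg:golang/github.com/bitnami/render-template", FoundBy: "go-cataloger"},
		{Name: "github.com/bitnami/render-template", PURL: "pkg:golang/github.com/bitnami/render-template", FoundBy: "bitnami-cataloger"},
		{Name: "render-template", PURL: "pkg:bitnami/render-template", FoundBy: "bitnami-cataloger"},
	}
	for _, p := range dedupeByPURL(pkgs) {
		fmt.Printf("%s (found by %s)\n", p.PURL, p.FoundBy)
	}
}
```

Ordering the input so that the preferred cataloger comes first is one simple way to express a priority without changing the function.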

@willmurphyscode
Contributor Author

With this in mind, are we adding value by labeling these packages as "being from Bitnami"?

Do we want Grype to be able to match these against the bitnami vulnerability data? In other words, does the bitnami vulnerability data cover these packages? If we just raise it up as a regular Go package, Grype will never know to compare it to the bitnami vulnerability data, but I don't know the scope of that data, so I don't know whether that's what we want.

In the example SPDX SBOM above, would you expect a vulnerability scanner to look in Bitnami's database for CVEs against the heredoc golang package?

@juan131 juan131 linked a pull request Oct 16, 2024 that will close this issue
9 tasks
@juan131

juan131 commented Oct 17, 2024

@willmurphyscode the Bitnami Vulnerability Database only has info about Bitnami packages

For instance, render-template is a component we include on several Bitnami images. If we inspect its SPDX file...

{
    "SPDXID": "SPDXRef-render-template",
    (...)
    "packages": [
        {
            "SPDXID": "SPDXRef-render-template",
            "name": "render-template",
            "versionInfo": "1.0.7-4",
            "downloadLocation": "https://github.com/bitnami/render-template/archive/refs/tags/v1.0.7.tar.gz",
            "licenseConcluded": "Apache-2.0",
            "licenseDeclared": "Apache-2.0",
            "filesAnalyzed": false,
            "externalRefs": [
                {
                    "referenceCategory": "SECURITY",
                    "referenceType": "cpe23Type",
                    "referenceLocator": "cpe:2.3:*:render-template:render-template:1.0.7:*:*:*:*:*:*:*"
                },
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:bitnami/render-template@1.0.7-4?arch=arm64&distro=debian-12"
                }
            ],
            "copyrightText": "NOASSERTION"
        },
        {
            "name": "opt/bitnami/common/bin/render-template",
            "SPDXID": "SPDXRef-Application-4b412cf3f25d2574-render-template",
            "downloadLocation": "NONE",
            "filesAnalyzed": false,
            "primaryPackagePurpose": "APPLICATION",
            "copyrightText": "NOASSERTION",
            "licenseConcluded": "NOASSERTION",
            "licenseDeclared": "NOASSERTION"
        },
        {
            "name": "github.com/aymerick/raymond",
            "SPDXID": "SPDXRef-Package-c77f44f540ae92a0",
            "versionInfo": "v2.0.2+incompatible",
            "supplier": "NOASSERTION",
            "downloadLocation": "NONE",
            "filesAnalyzed": false,
            "sourceInfo": "opt/bitnami/common/package found in: opt/bitnami/common/bin/render-template",
            "licenseConcluded": "NONE",
            "licenseDeclared": "NONE",
            "externalRefs": [
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:golang/github.com/aymerick/raymond@v2.0.2%2Bincompatible"
                }
            ],
            "primaryPackagePurpose": "LIBRARY",
            "copyrightText": "NOASSERTION"
        },
        (...)
        {
            "name": "github.com/bitnami/render-template",
            "SPDXID": "SPDXRef-Package-8213648cad51225d",
            "supplier": "NOASSERTION",
            "downloadLocation": "NONE",
            "filesAnalyzed": false,
            "sourceInfo": "opt/bitnami/common/package found in: opt/bitnami/common/bin/render-template",
            "licenseConcluded": "NONE",
            "licenseDeclared": "NONE",
            "externalRefs": [
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:golang/github.com/bitnami/render-template"
                }
            ],
            "primaryPackagePurpose": "LIBRARY",
            "copyrightText": "NOASSERTION"
        },
    ],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-render-template",
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": "SPDXRef-Application-4b412cf3f25d2574-render-template"
        },
        {
            "spdxElementId": "SPDXRef-Application-4b412cf3f25d2574-render-template",
            "relatedSpdxElement": "SPDXRef-Package-8213648cad51225d",
            "relationshipType": "CONTAINS"
        },
        (...)
        {
            "spdxElementId": "SPDXRef-Package-8213648cad51225d",
            "relatedSpdxElement": "SPDXRef-Package-c77f44f540ae92a0",
            "relationshipType": "DEPENDS_ON"
        }
    ]
}

... we can notice a few things:

  1. The "main" component (name render-template) is a Bitnami package (purl pkg:bitnami/render-template@1.0.7-4?arch=arm64&distro=debian-12)
  2. There's an application (name opt/bitnami/common/bin/render-template) which represents the compiled binary.
  3. There's a package (name github.com/bitnami/render-template) which is the "main" Golang package (purl pkg:golang/github.com/bitnami/render-template) used in the compiled binary.
  4. There are relationships that describe that render-template (Bitnami pkg) contains opt/bitnami/common/bin/render-template (compiled binary) which contains github.com/bitnami/render-template (golang package)
  5. Other Golang packages (e.g. github.com/aymerick/raymond) are added as dependencies of the "main" Golang package.

If we take a look at the Bitnami Vulnerability Database components (see the link below), we will NOT find any info about the compiled binary or the golang packages, only about the Bitnami package: render-template.
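The containment chain described in the list above can be followed mechanically. The sketch below decodes only the "relationships" array of an SPDX JSON document and walks CONTAINS and DEPENDS_ON edges breadth-first from the Bitnami package. The struct and function names are hypothetical, and the embedded sample is a trimmed copy of the three relationships shown in the render-template excerpt, not a complete SBOM.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// relationship mirrors the three fields used in the SPDX excerpt above.
type relationship struct {
	From string `json:"spdxElementId"`
	Type string `json:"relationshipType"`
	To   string `json:"relatedSpdxElement"`
}

type document struct {
	Relationships []relationship `json:"relationships"`
}

// sample holds only the relationships from the render-template excerpt.
const sample = `{
  "relationships": [
    {"spdxElementId": "SPDXRef-render-template", "relationshipType": "CONTAINS", "relatedSpdxElement": "SPDXRef-Application-4b412cf3f25d2574-render-template"},
    {"spdxElementId": "SPDXRef-Application-4b412cf3f25d2574-render-template", "relationshipType": "CONTAINS", "relatedSpdxElement": "SPDXRef-Package-8213648cad51225d"},
    {"spdxElementId": "SPDXRef-Package-8213648cad51225d", "relationshipType": "DEPENDS_ON", "relatedSpdxElement": "SPDXRef-Package-c77f44f540ae92a0"}
  ]
}`

// mustParse decodes the JSON document and panics on malformed input.
func mustParse(data string) document {
	var doc document
	if err := json.Unmarshal([]byte(data), &doc); err != nil {
		panic(err)
	}
	return doc
}

// reachable returns every element ID reachable from root by following
// CONTAINS and DEPENDS_ON edges, in breadth-first order.
func reachable(doc document, root string) []string {
	edges := make(map[string][]string)
	for _, r := range doc.Relationships {
		if r.Type == "CONTAINS" || r.Type == "DEPENDS_ON" {
			edges[r.From] = append(edges[r.From], r.To)
		}
	}
	visited := map[string]bool{root: true}
	queue := []string{root}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range edges[cur] {
			if !visited[next] {
				visited[next] = true
				out = append(out, next)
				queue = append(queue, next)
			}
		}
	}
	return out
}

func main() {
	for _, id := range reachable(mustParse(sample), "SPDXRef-render-template") {
		fmt.Println(id)
	}
}
```

A walk like this is one way a cataloger could tell which non-Bitnami packages sit underneath a given pkg:bitnami entry, should that association ever be needed.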

@juan131

juan131 commented Oct 17, 2024

I see two main alternatives here:

  1. Bitnami cataloger just reports Bitnami packages avoiding conflicts with results from other catalogers.
  2. We implement some mechanism that looks for duplicates among packages reported by the Bitnami cataloger.

Pros and cons of approach 1 compared with approach 2:

  • Pros:
    • Easier and simpler to implement.
    • Less error-prone.
  • Cons:
    • Bitnami SBOM might include non-bitnami packages that can't be detected by other catalogers (this is very unlikely).
    • Poorer result when running Syft only with Bitnami cataloger (--select-catalogers bitnami-cataloger).

@willmurphyscode
Contributor Author

Bitnami SBOM might include non-bitnami packages that can't be detected by other catalogers (this is very unlikely).

There might be a specific case where this is likely: native binaries (e.g. ELF files) that were not installed by any package manager. Those are currently challenging to identify, so having bitnami weigh in on them makes sense. Especially if we can get high quality CPEs for Grype's binary matcher to compare against NVD's database.

We implement some mechanism that looks for duplicates among packages reported by the Bitnami cataloger.

Syft already does some de-duplication of packages. If the Bitnami cataloger raises up all these extra packages, are you seeing duplicates? In other words if you scan an image with render-template in it, do you get 2 artifacts for pkg:golang/github.com/bitnami/render-template, one from the Go cataloger and one from the bitnami cataloger? I suspect Syft's existing deduplication may be working here already. Would you mind testing this based on your current PR and letting us know?

Thanks!

@juan131

juan131 commented Oct 18, 2024

Hi @willmurphyscode

With the changes I'm proposing at #3341, there are no duplicates. However, this is because I'm only reporting Bitnami packages in the current implementation.

If we report every package in the Bitnami SBOM applying this patch...

diff --git a/syft/pkg/cataloger/bitnami/cataloger.go b/syft/pkg/cataloger/bitnami/cataloger.go
index bfa4d3c2..0e8e0616 100644
--- a/syft/pkg/cataloger/bitnami/cataloger.go
+++ b/syft/pkg/cataloger/bitnami/cataloger.go
@@ -44,13 +44,8 @@ func parseSBOM(_ context.Context, _ file.Resolver, _ *generic.Environment, reade

        var pkgs []pkg.Package
        for _, p := range s.Artifacts.Packages.Sorted() {
-               // We only want to report Bitnami packages
-               if !strings.HasPrefix(p.PURL, "pkg:bitnami") {
-                       continue
-               }
-
                p.FoundBy = catalogerName
-               p.Type = pkg.BitnamiPkg
+
                // replace all locations on the package with the location of the SBOM file.
                // Why not keep the original list of locations? Since the "locations" field is meant to capture
                // where there is evidence of this file, and the catalogers have not run against any file other than,
@@ -59,13 +54,16 @@ func parseSBOM(_ context.Context, _ file.Resolver, _ *generic.Environment, reade
                        reader.Location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation),
                )

-               // Parse the Bitnami-specific metadata
-               metadata, err := parseBitnamiPURL(p.PURL)
-               if err != nil {
-                       return nil, nil, err
-               }
+               if strings.HasPrefix(p.PURL, "pkg:bitnami") {
+                       p.Type = pkg.BitnamiPkg
+                       // Parse the Bitnami-specific metadata
+                       metadata, err := parseBitnamiPURL(p.PURL)
+                       if err != nil {
+                               return nil, nil, err
+                       }

-               p.Metadata = metadata
+                       p.Metadata = metadata
+               }

                pkgs = append(pkgs, p)
        }

... Duplicates appear:

$ go run ./cmd/syft bitnami/apache -o json | jq '.artifacts[] | select(.purl | startswith("pkg:golang/github.com/jessevdk/go-flags"))'

{
  "id": "15bd1508bd27b64e",
  "name": "github.com/jessevdk/go-flags",
  "version": "v1.6.1",
  "type": "go-module",
  "foundBy": "bitnami-cataloger",
  "locations": [
    {
      "path": "/opt/bitnami/common/.spdx-render-template.spdx",
      "layerID": "sha256:6923ab12004885c8d94bdd17626e36e661ddc6f2b159cb48bbfe3681dda3dd0a",
      "accessPath": "/opt/bitnami/common/.spdx-render-template.spdx",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "go",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:jessevdk:go-flags:v1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    },
    {
      "cpe": "cpe:2.3:a:jessevdk:go_flags:v1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "pkg:golang/github.com/jessevdk/go-flags@v1.6.1",
  "metadataType": "go-module-buildinfo-entry",
  "metadata": {
    "goCompiledVersion": "",
    "architecture": ""
  }
}
{
  "id": "2e09194e80f282d7",
  "name": "github.com/jessevdk/go-flags",
  "version": "v1.6.1",
  "type": "go-module",
  "foundBy": "go-module-binary-cataloger",
  "locations": [
    {
      "path": "/opt/bitnami/common/bin/render-template",
      "layerID": "sha256:6923ab12004885c8d94bdd17626e36e661ddc6f2b159cb48bbfe3681dda3dd0a",
      "accessPath": "/opt/bitnami/common/bin/render-template",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "go",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:jessevdk:go-flags:v1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    },
    {
      "cpe": "cpe:2.3:a:jessevdk:go_flags:v1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "pkg:golang/github.com/jessevdk/go-flags@v1.6.1",
  "metadataType": "go-module-buildinfo-entry",
  "metadata": {
    "goCompiledVersion": "go1.22.7",
    "architecture": "arm64",
    "h1Digest": "h1:Cvu5U8UGrLay1rZfv/zP7iLpSHGUZ/Ou68T0iX1bBK4=",
    "mainModule": "github.com/bitnami/render-template"
  }
}

As you can see, there are two packages with different "id" and "foundBy" values that are almost identical in the remaining fields, except for metadata, which is richer on the package reported by "go-module-binary-cataloger".
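One way to reason about why these two records stay separate, while the identical catalogers in the earlier experiment collapsed, is to model a package ID as a hash of the package's content. The sketch below is an assumption made for illustration, not Syft's actual ID algorithm: it hashes name, version, type, PURL, locations and metadata, and deliberately leaves foundBy out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Package is a simplified stand-in for the JSON artifacts shown above.
type Package struct {
	Name      string
	Version   string
	Type      string
	PURL      string
	FoundBy   string
	Locations []string
	Metadata  map[string]string
}

// fingerprint hashes the fields assumed to define a package's identity.
// FoundBy is intentionally excluded (an assumption for this sketch).
// encoding/json sorts map keys, so the result is deterministic.
func fingerprint(p Package) string {
	identity := struct {
		Name, Version, Type, PURL string
		Locations                 []string
		Metadata                  map[string]string
	}{p.Name, p.Version, p.Type, p.PURL, p.Locations, p.Metadata}
	data, err := json.Marshal(identity)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:8])
}

func main() {
	fromSBOM := Package{
		Name: "github.com/jessevdk/go-flags", Version: "v1.6.1", Type: "go-module",
		PURL:      "pkg:golang/github.com/jessevdk/go-flags@v1.6.1",
		FoundBy:   "bitnami-cataloger",
		Locations: []string{"/opt/bitnami/common/.spdx-render-template.spdx"},
	}
	fromBinary := fromSBOM
	fromBinary.FoundBy = "go-module-binary-cataloger"
	fromBinary.Locations = []string{"/opt/bitnami/common/bin/render-template"}
	fromBinary.Metadata = map[string]string{"goCompiledVersion": "go1.22.7"}

	fmt.Println(fingerprint(fromSBOM) == fingerprint(fromBinary))
}
```

Under this model, two catalogers reading the same SPDX file produce equal fingerprints, whereas a record tied to the SBOM file and one tied to the binary differ in location and metadata and therefore do not merge.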

@juan131

juan131 commented Oct 24, 2024

Friendly reminder ⬆️ @willmurphyscode

@willmurphyscode willmurphyscode moved this from Ready to In Review in OSS Oct 29, 2024
@willmurphyscode
Contributor Author

Thanks for the ping @juan131!

I want to have a bit of a discussion about deduplication with the binary classifier:

❯ go run ./cmd/syft -q -o json bitnami/postgresql:17 | jq -c '.artifacts[] | select(.name | test("postgres")) | { name: .name, version: .version, location: .locations[0].path }'
{"name":"postgresql","version":"17.0","location":"/opt/bitnami/postgresql/bin/postgres"}
{"name":"postgresql","version":"17.0.0-8","location":"/opt/bitnami/postgresql/.spdx-postgresql.spdx"}

I wonder if it would be preferable or possible to collapse these. The bitnami entry has better version data, and the versions are "compatible" in the sense that bitnami just knows more about the fact that it's postgresql 17.0. I think ideally these would deduplicate, but today they don't because (I believe) the binary classifier sees a slightly less specific version number (it's essentially doing strings | grep looking for version numbers).
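The notion of "compatible" versions can be sketched as a component-wise prefix check. The code assumes that whatever follows the final hyphen in a Bitnami version is a revision (as in 17.0.0-8), which would misread upstream pre-release suffixes such as -rc1. All function names are hypothetical, and this does not use bitnami/go-version.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamVersion drops a trailing "-<revision>" suffix, assuming the
// text after the last hyphen is a Bitnami revision.
func upstreamVersion(v string) string {
	if i := strings.LastIndex(v, "-"); i >= 0 {
		return v[:i]
	}
	return v
}

// isComponentPrefix reports whether the dot-separated components of
// short are a leading run of the components of long.
func isComponentPrefix(short, long string) bool {
	s := strings.Split(short, ".")
	l := strings.Split(long, ".")
	if len(s) > len(l) {
		return false
	}
	for i := range s {
		if s[i] != l[i] {
			return false
		}
	}
	return true
}

// compatible reports whether a less specific binary version could be
// describing the same release as a Bitnami version.
func compatible(binaryVersion, bitnamiVersion string) bool {
	return isComponentPrefix(binaryVersion, upstreamVersion(bitnamiVersion))
}

func main() {
	fmt.Println(compatible("17.0", "17.0.0-8"))
	fmt.Println(compatible("16.4", "17.0.0-8"))
}
```

Comparing whole components rather than raw string prefixes keeps "17.1" from matching "17.10.0-0".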

CC @wagoodman

@kzantow
Contributor

kzantow commented Oct 29, 2024

I think Bitnami should be considered a distro provider like Redhat or Debian, here -- Grype will be using the distro's vulnerability feed and should be matching with the same versioning scheme because we think the information provided to the distro package is the most accurate.

We are deduplicating overlapping packages by owned files based on cataloger priority -- if the bitnami entries are cataloged with a specific type, I think the appropriate type just needs to get added here and Syft should deduplicate the packages as long as there is an accurate overlap in the owned files from the SPDX entry.

@wagoodman
Contributor

I think Bitnami should be considered a distro provider like Redhat or Debian, here -- Grype will be using the distro's vulnerability feed and should be matching with the same versioning scheme because we think the information provided to the distro package is the most accurate.

Pragmatically I think this is the right short term answer in terms of grype's needs.

In terms of a more accurate SBOM, we could add a post-cataloging task that looks for package ownership overlap with packages found with the SBOM cataloger and keep more authoritative packages (bitnami over anything else for instance).

@juan131

juan131 commented Nov 11, 2024

You're the experts here, I can adapt #3341 based on your feedback

@willmurphyscode
Contributor Author

Hi @juan131! Thanks for your patience here.

Running #3341 right now on bitnami/postgresql looks like this:

go run ./cmd/syft -q  bitnami/postgresql | grep -e NAME -e postgres
NAME                    VERSION                  TYPE
postgresql              17.1                     binary
postgresql              17.1.0-0                 bitnami

I have a couple questions about this:

  1. What is the version format for 17.1.0-0? (It looks like semver with -0 on the end)
  2. Is there location data in the Bitnami SPDX (I didn't see any) or something else that would help us dedupe the two postgres packages that appear here?

In order to merge #3341, we'd like to find a way of collapsing these packages into one. Right now, if Syft finds, for example, an RPM that owns the postgres executable and a binary package at the postgres executable, it will by default collapse them into one package, deleting the detected binary in favor of the RPM and its richer metadata.

We'd like to do something similar with the bitnami SBOMs, but I don't see any reference to the path in the SPDX bitnami puts in the image, so we're not automatically inferring that the postgres executable path is owned by the bitnami package.

I see that /opt/bitnami/postgresql/bin/postgres is the path to the postgres binary. Is this path format predictable and stable? Maybe we could use it to deduplicate with the binary cataloger?

@juan131

juan131 commented Nov 20, 2024

Hi @willmurphyscode

Regarding locations, it's true that the SPDX file doesn't include information about the exact location of Bitnami packages (those with pURLs prefixed with pkg:bitnami). I remember that, when we implemented this on Trivy, we assumed that if an SPDX file was found under the /opt/bitnami/COMPONENT_NAME folder, then the location was /opt/bitnami/COMPONENT_NAME for every Bitnami package included in the SBOM, see:

What is the version format for 17.1.0-0? (It looks like semver with -0 on the end)

That's the revision. Our versioning scheme is explained at bitnami/go-version, which is why we use this library for version comparison.

@willmurphyscode willmurphyscode self-assigned this Nov 21, 2024
@willmurphyscode
Contributor Author

Hi @juan131,

That solution sounds reasonable to me. We already have a mechanism for de-duplicating binary packages in favor of OS packages when there's file overlap, and we should use it here.

For example, if someone runs yum install curl, then a package like pkg:rpm/curl will own the binary file /usr/bin/curl. Syft uses this relationship to remove the binary package curl in favor of the RPM curl, because it will have richer metadata. (This is configurable.) We'd like for #3341 to be changed so that this same configuration would cause the binary package pkg:generic/postgres to be deduplicated in favor of pkg:bitnami/postgresql.

This has a couple of steps:

  1. The Bitnami package struct's Metadata field needs to implement the FileOwner interface, see https://github.com/anchore/syft/blob/main/syft/pkg/file_owner.go#L8 and https://github.com/anchore/syft/blob/main/syft/pkg/rpm.go#L57
  2. The actual cataloger needs to set files on the Metadata so that these can be accessed later when OwnedFiles is called, see https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/redhat/parse_rpm_db.go#L69-L82 as an example
  3. https://github.com/anchore/syft/blob/main/internal/relationship/exclude_binaries_by_file_ownership_overlap.go#L45-L69 needs to be updated to identify relationships where the parent package is of type bitnami and the child package is of type binary.
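The three steps above can be sketched in miniature as follows. This is a self-contained approximation, not Syft's code: the FileOwner interface is redeclared locally with the OwnedFiles method named in step 2, while bitnamiMetadata, Package and dropOwnedBinaries are hypothetical names, and the real implementation works through relationships rather than a flat slice.

```go
package main

import "fmt"

// FileOwner is redeclared here for illustration; the method name comes
// from the discussion above, the rest of the shape is assumed.
type FileOwner interface {
	OwnedFiles() []string
}

// bitnamiMetadata is a hypothetical metadata type for step 1.
type bitnamiMetadata struct {
	Files []string
}

// OwnedFiles returns the files recorded by the cataloger in step 2.
func (m bitnamiMetadata) OwnedFiles() []string { return m.Files }

// Package is a simplified stand-in for a cataloged package.
type Package struct {
	Name     string
	Type     string
	Path     string // primary evidence location
	Metadata interface{}
}

// dropOwnedBinaries approximates step 3: remove packages of type
// "binary" whose path is owned by a package of type "bitnami".
func dropOwnedBinaries(pkgs []Package) []Package {
	owned := make(map[string]bool)
	for _, p := range pkgs {
		if p.Type != "bitnami" {
			continue
		}
		if fo, ok := p.Metadata.(FileOwner); ok {
			for _, f := range fo.OwnedFiles() {
				owned[f] = true
			}
		}
	}
	var kept []Package
	for _, p := range pkgs {
		if p.Type == "binary" && owned[p.Path] {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	pkgs := []Package{
		{Name: "postgresql", Type: "binary", Path: "/opt/bitnami/postgresql/bin/postgres"},
		{
			Name: "postgresql", Type: "bitnami",
			Path:     "/opt/bitnami/postgresql/.spdx-postgresql.spdx",
			Metadata: bitnamiMetadata{Files: []string{"/opt/bitnami/postgresql/bin/postgres"}},
		},
	}
	for _, p := range dropOwnedBinaries(pkgs) {
		fmt.Println(p.Name, p.Type)
	}
}
```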

Step 2 is where the logic you're describing lives, which I understand to mean:

In the cataloger, if Syft finds an SPDX SBOM at /opt/bitnami/COMPONENT/.spdx-COMPONENT.spdx, it should find in that SBOM a package with a purl like pkg:bitnami/COMPONENT, and then emit a package named COMPONENT with a purl like pkg:bitnami/COMPONENT that owns all the files under /opt/bitnami/COMPONENT. If this is the logic you mean, it makes sense to us.
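A minimal sketch of that path logic, with hypothetical helper names: componentDir derives the owning directory from the SBOM's location, and ownsPath tests whether a file falls under it. Note that an earlier example in this thread shows an SBOM at /opt/bitnami/common/.spdx-render-template.spdx, where the directory name and the component name differ, so a real implementation would need to decide how to treat that case; the sketch simply returns the containing directory.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// componentDir returns the directory directly under /opt/bitnami that
// contains the given SPDX file. The second result is false when the
// path does not have the /opt/bitnami/<dir>/<file>.spdx shape.
func componentDir(sbomPath string) (string, bool) {
	clean := path.Clean(sbomPath)
	dir := path.Dir(clean)
	if path.Dir(dir) != "/opt/bitnami" || path.Ext(clean) != ".spdx" {
		return "", false
	}
	return dir, true
}

// ownsPath reports whether filePath lies underneath dir. The trailing
// slash keeps sibling directories with a shared name prefix from
// matching.
func ownsPath(dir, filePath string) bool {
	return strings.HasPrefix(path.Clean(filePath), dir+"/")
}

func main() {
	dir, ok := componentDir("/opt/bitnami/postgresql/.spdx-postgresql.spdx")
	if !ok {
		panic("unexpected SBOM path shape")
	}
	fmt.Println(dir)
	fmt.Println(ownsPath(dir, "/opt/bitnami/postgresql/bin/postgres"))
	fmt.Println(ownsPath(dir, "/usr/bin/curl"))
}
```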

As a test case, when running Syft with this change on bitnami/postgresql:latest, Syft should emit a package with a PURL like pkg:bitnami/postgresql but not a binary package called postgres.

@willmurphyscode willmurphyscode moved this from In Review to In Progress in OSS Nov 21, 2024