Log-Receiver: Implementation #3568

Closed
2 tasks done
Tracked by #3566
Rotfuks opened this issue Jul 10, 2024 · 13 comments
Assignees
Labels
team/atlas Team Atlas

Comments

@Rotfuks

Rotfuks commented Jul 10, 2024

Motivation

To enable customers to also receive logs from outside the installations, we have to get our hands dirty and implement the concept we created in the investigation story.

Todo

  • Implement the solution discussed in the investigation story
  • TODO // Add more details after investigation closed

Outcome

  • We have a working log receiver which can receive logs from different data sources outside of the installations.
@QuentinBisson

Coming from the investigation here #3567 the implementation now is to:

  • Deploy an Alloy instance to act as an OpenTelemetry collector (alloy-gateway), enabling only the OTLP HTTP endpoint for now (https://grafana.com/docs/alloy/latest/reference/components/otelcol/otelcol.receiver.otlp/). The ingress could be named oltp.observability.<installation base domain> for instance.
    The logs need to have a tenant defined; otherwise they must be rejected.
  • Once the logs are received, the gateway sends them to Loki directly.
  • We will probably have to create a datasource by hand for those logs though as they would not pass through the multi-tenant gateway.
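
To make the flow above a bit more concrete, here is a minimal client-side sketch of what an external source could send to the gateway over OTLP/HTTP. Several things are assumptions and not part of the plan above: the base URL is a placeholder, the tenant is carried in an X-Scope-OrgID header (the header discussed later in this thread), and the helper name build_otlp_logs_request is made up for illustration. The /v1/logs path and the JSON layout follow the standard OTLP/HTTP encoding. The request is only built, not sent.

```python
import json
import time
import urllib.request


def build_otlp_logs_request(base_url, tenant, service_name, message):
    """Build (but do not send) an OTLP/HTTP JSON request carrying one log record."""
    if not tenant:
        # Mirrors the rule above: logs without a tenant must be rejected.
        raise ValueError("a tenant is required")

    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    # OTLP JSON encodes 64-bit timestamps as strings.
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": "INFO",
                    "body": {"stringValue": message},
                }],
            }],
        }],
    }

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the tenant travels in the X-Scope-OrgID header.
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant},
        method="POST",
    )


# Hypothetical host, for illustration only.
request = build_otlp_logs_request(
    "https://gateway.example.invalid", "tenant-a", "demo-app", "hello from outside"
)
```

Passing the request to urllib.request.urlopen would actually send it; authentication is left out here on purpose.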

Regarding security, after some discussions with Zach at the onsite, and considering that we cannot have workload identity now, we could make sure the ingress uses OIDC with the customer's SSO, so they would have to make sure their app has the permission to write to our endpoint :)

image

@giantswarm/team-atlas this is a lot easier than the original implementation and also more secure than API keys. Are you fine with it so we can start the implementation?

@QuentinBisson

QuentinBisson commented Sep 17, 2024

Waiting for initial PRs to be approved:

@QuentinBisson

QuentinBisson commented Sep 19, 2024

Well, collections do not allow us to deploy an app with a different name https://gigantic.slack.com/archives/C02GDJJ68Q1/p1726752552645969

@QuentinBisson

Blocked by #3682

@QuentinBisson

Let's unblock us with this hack for now https://github.com/giantswarm/alloy-gateway-app

This application deploys an App CR named observability-gateway, which is actually an instance of Alloy, the same way alloy-rules is deployed in prometheus-rules.

@QuentinBisson

QuentinBisson commented Oct 2, 2024

Current configuration PR is here https://github.com/giantswarm/shared-configs/pull/158

This is being tested on grizzly, but the X-Scope-OrgID header is currently not being picked up. I think that's because the feature is not built into Alloy yet (grafana/alloy#1805).

This does not really prevent us from enabling the gateway if we set an arbitrary fixed tenant like external for all external logs using the following stage:

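loki.process "tenant" {
    // forward_to is required by loki.process; the target used here is the
    // loki.write "local" component from the full gateway configuration.
    forward_to = [loki.write.local.receiver]

    // Force every incoming log line into the "external" tenant.
    stage.tenant {
        value = "external"
    }
}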

@QuentinBisson

QuentinBisson commented Oct 3, 2024

Alright so I managed to make this work by using I don't know how many hacks :D

Gateway configuration

    alloy:
      enabled: true
      alloy:
        configMap:
          create: true
          content: |
            loki.write "local" {
                endpoint {
                    url = "http://loki-gateway.loki.svc/loki/api/v1/push"
                }
            }

            loki.echo "example" { }

            loki.process "tenant" {
                forward_to = [
                    loki.echo.example.receiver,
                    loki.write.local.receiver,
                ]

                stage.tenant {
                    value = "external"
                }
            }

            loki.source.api "loki_push_api" {
                http {
                    listen_address = "0.0.0.0"
                    listen_port = 3100
                }
                use_incoming_timestamp = true
                forward_to = [
                    loki.process.tenant.receiver,
                ]
                labels = {
                  forwarded = "true",
                }
            }
        extraPorts:
        - name: "loki-api"
          port: 3100
          targetPort: 3100
          protocol: "TCP"

      controller:
        type: 'deployment'
        autoscaling:
          enabled: true
      # The gateway does not need pod logs
      crds:
        create: false
      ingress:
        enabled: true
        ingressClassName: nginx
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-giantswarm
          nginx.ingress.kubernetes.io/auth-signin: https://$host/oauth2/start?rd=$escaped_request_uri
          nginx.ingress.kubernetes.io/auth-url: https://$host/oauth2/auth
          # Ensure requests have the X-Scope-OrgID header set
          nginx.ingress.kubernetes.io/configuration-snippet: |
            if ($http_x_scope_orgid = "") {
              return 401;
            }
            add_header X-Scope-OrgID $http_x_scope_orgid;
        hosts:
        - gateway.observability.grizzly.gaws.gigantic.io
        extraPaths:
        - path: /loki/api/v1/push
          pathType: Prefix
          backend:
            service:
              name: observability-gateway-alloy
              port:
                name: "loki-api"
        tls:
        - hosts:
          - gateway.observability.grizzly.gaws.gigantic.io
          secretName: tls-certificate-observability-gateway
    networkPolicy:
      cilium:
        egress:
        - toEntities:
          - kube-apiserver
          - cluster
        ingress:
        - fromEntities:
          - cluster
          - world
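
For anyone who wants to try the gateway, here is a rough sketch of what a push against the /loki/api/v1/push path could look like, together with a tiny model of the ingress check from the configuration snippet above (401 when X-Scope-OrgID is missing or empty). The function names ingress_precheck and build_push are made up for illustration, and the model ignores the OAuth step entirely. The JSON layout is the regular Loki push API format. Note that with the stage.tenant hack above the logs end up in the external tenant anyway.

```python
import json
import time

PUSH_PATH = "/loki/api/v1/push"  # path exposed by the ingress above


def ingress_precheck(headers):
    """Model the ingress snippet: return 401 when X-Scope-OrgID is missing or empty.

    Returns None when the request would continue to the auth check and the backend.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get("x-scope-orgid", "") == "":
        return 401
    return None


def build_push(tenant, labels, lines):
    """Return (path, headers, body) for a Loki push API request."""
    now_ns = time.time_ns()
    body = {
        "streams": [{
            "stream": labels,
            # Each entry is [timestamp in nanoseconds as a string, log line].
            "values": [[str(now_ns + i), line] for i, line in enumerate(lines)],
        }]
    }
    headers = {"Content-Type": "application/json", "X-Scope-OrgID": tenant}
    return PUSH_PATH, headers, json.dumps(body).encode("utf-8")


path, headers, body = build_push("tenant-a", {"app": "demo"}, ["first line", "second line"])
```

The timestamps matter here because the gateway sets use_incoming_timestamp = true, so whatever the client sends is what Loki stores.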

Oauth2-proxy

It needs to be redeployed ...

Config (I'm not sure what all those fields even do though)

  values: |
    oauth2Proxy:
      extraEnv:
      - name: 'OAUTH2_PROXY_EMAIL_DOMAINS'
        value: '*'
      - name: 'OAUTH2_PROXY_PROVIDER_DISPLAY_NAME'
        value: 'Dex'
      - name: 'OAUTH2_PROXY_SKIP_PROVIDER_BUTTON'
        value: 'true'
      - name: 'OAUTH2_PROXY_SKIP_JWT_BEARER_TOKENS'
        value: 'true'
      - name: 'OAUTH2_PROXY_SET_AUTHORIZATION_HEADER'
        value: 'true'
      - name: 'OAUTH2_PROXY_SET_XAUTHREQUEST'
        value: 'true'
      - name: 'OAUTH2_PROXY_PASS_ACCESS_TOKEN'
        value: 'true'
      - name: 'OAUTH2_PROXY_PASS_AUTHORIZATION_HEADER'
        value: 'true'
      - name: 'OAUTH2_PROXY_COOKIE_NAME'
        value: SESSION
    ingress:
      enabled: true
      hosts:
        - gateway.observability.grizzly.gaws.gigantic.io
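
As far as I understand the oauth2-proxy docs, OAUTH2_PROXY_SKIP_JWT_BEARER_TOKENS lets requests through when they carry an already verified JWT in the Authorization header, which is what a non-interactive log shipper would rely on. When such a request gets rejected, it helps to look at the claims of the token (audience, expiry). Below is a small debugging sketch using only the standard library; the function names are made up, and the signature is deliberately NOT verified, so this must never be used for access decisions.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the claims of a JWT WITHOUT verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def bearer_headers(token, tenant):
    """Headers a client would send: the bearer token plus the tenant header."""
    return {"Authorization": "Bearer " + token, "X-Scope-OrgID": tenant}
```

Comparing the aud claim with the client ID configured in oauth2-proxy is usually the first thing to check.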

Dex

The Dex configuration needs a new redirect URI configured for the gateway. This will require some changes in mc-bootstrap and in configs to change the secret value ...

A little hack to be able to generate a token with dex https://mac-blog.org.ua/dex-between-services/ (edit the dex secret in the giantswarm namespace)

Alloy

The upstream PR has been merged, but we need to wait for the Helm chart to be released upstream before we can really provide it.

Useful link: https://developer.okta.com/blog/2022/07/14/add-auth-to-any-app-with-oauth2-proxy

@Rotfuks

Rotfuks commented Oct 8, 2024

Great Job! That was a beast of a story, huh?
Are we happy with the result, even though it's a bit hacky, or should we discuss whether this actually meets our quality gate and whether it will lead to a lot of maintenance pain in the future?

@QuentinBisson

QuentinBisson commented Oct 8, 2024 via email

@QuentinBisson

QuentinBisson commented Oct 21, 2024

Current wip:

We are waiting on customer feedback to know whether they would be able to support OAuth. If they are, we will need to point the oauth2-proxy config at the customer's OIDC provider. @Rotfuks, any news on this?

@QuentinBisson

Alloy has been released upstream and in the gateway: giantswarm/alloy-gateway-app#12 (review)

@github-project-automation github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Nov 21, 2024
2 participants