Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COSI-65, COSI-46, COSI-21: Add GRPC Metrics Instrumentation and Documentation Updates #83

Open
wants to merge 9 commits into
base: improvement/COSI-53-misc-ci-improvements
Choose a base branch
from

Conversation

anurag4DSB
Copy link
Collaborator

@anurag4DSB anurag4DSB commented Jan 2, 2025

The old PR is #69

go.* dependency updates are responsible for 440 line changes.

This PR introduces metrics instrumentation for gRPC API calls and adds comprehensive testing and documentation updates. Key changes include:

  • Instrumentation: Added Prometheus metrics for gRPC API calls.
  • Testing:
    • Updated unit tests for the run method to include Prometheus registry.
    • Added unit tests for metrics and server packages.
    • Enhanced e2e tests and Helm deployments to validate metrics functionality.
  • Helm & Kustomize: Added services for exposing metrics in Helm chart and Kustomize configurations.
  • Documentation:
    • Updated local development documentation to include metrics configuration.
    • Added documentation for driver parameters and metrics endpoints.
    • Dependency Management: Ran go mod tidy to ensure dependencies are up to date.

These changes provide better observability and make it easier to monitor the driver’s performance and behavior in production environments.

driverPrefix = flag.String("driver-prefix", defaultDriverPrefix, "prefix for COSI driver, e.g. <prefix>.scality.com, default cosi.scality.com")
driverMetricsAddress = flag.String("driver-metrics-address", defaultMetricsAddress, "The address to expose Prometheus metrics, default: :8080")
driverMetricsPath = flag.String("driver-metrics-path", defaultMetricsPath, "path for the metrics endpoint, default: /metrics")
driverMetricsPrefix = flag.String("driver-custom-metrics-prefix", defaultMetricsPrefix, "prefix for the metrics, default: scality_cosi_driver_")
Copy link
Collaborator Author

@anurag4DSB anurag4DSB Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now a place holder, will be used for S3 and IAM metrics

Copy link

codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.67%. Comparing base (9c4e7f0) to head (e4b65e3).

Files with missing lines Patch % Lines
pkg/metrics/metrics.go 85.71% 2 Missing and 1 partial ⚠️
pkg/grpcfactory/server.go 96.15% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
pkg/grpcfactory/server.go 94.23% <96.15%> (+10.89%) ⬆️
pkg/metrics/metrics.go 85.71% <85.71%> (ø)
Components Coverage Δ
🏠 Main Package ∅ <ø> (∅)
🚗 Driver Package 92.22% <ø> (ø)
📡 gRPC Factory Package 86.25% <96.15%> (+4.60%) ⬆️
🔐 IAM Client Package 100.00% <ø> (ø)
🌐 S3 Client Package 100.00% <ø> (ø)
🔧 Util Package 100.00% <ø> (ø)
🔖 Constants Package ∅ <ø> (∅)
@@                             Coverage Diff                              @@
##           improvement/COSI-53-misc-ci-improvements      #83      +/-   ##
============================================================================
+ Coverage                                     93.40%   93.67%   +0.26%     
============================================================================
  Files                                             9       10       +1     
  Lines                                           637      680      +43     
============================================================================
+ Hits                                            595      637      +42     
- Misses                                           36       37       +1     
  Partials                                          6        6              

@anurag4DSB anurag4DSB force-pushed the feature/COSI-65-instrument-cosi-drover-with-gprc-metrics branch from 1f69f34 to 87f52de Compare January 2, 2025 11:23
@anurag4DSB anurag4DSB force-pushed the feature/COSI-65-instrument-cosi-drover-with-gprc-metrics branch from 87f52de to f5d58e9 Compare January 2, 2025 11:34
@anurag4DSB anurag4DSB marked this pull request as ready for review January 2, 2025 11:39
@anurag4DSB anurag4DSB force-pushed the feature/COSI-65-instrument-cosi-drover-with-gprc-metrics branch from e48db12 to e4b65e3 Compare January 2, 2025 12:35
@anurag4DSB anurag4DSB changed the title COSI-65: Metrics Intrumentation + e2e tests COSI-65: Add GRPC Metrics Instrumentation and Documentation Updates Jan 2, 2025
Comment on lines +12 to +17
var (
S3RequestsTotal *prometheus.CounterVec
S3RequestDuration *prometheus.HistogramVec
IAMRequestsTotal *prometheus.CounterVec
IAMRequestDuration *prometheus.HistogramVec
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

place holders for s3 and iam metrics

Comment on lines +82 to +106
var _ = Describe("InitializeMetrics", func() {
var (
registry *prometheus.Registry
driverMetricsPath string
)

BeforeEach(func() {
registry = prometheus.NewRegistry()
driverMetricsPath = "/metrics"
})

It("should serve metrics via an HTTP endpoint", func() {
addr := "127.0.0.1:0"
server, err := metrics.StartMetricsServerWithRegistry(addr, registry, driverMetricsPath)
Expect(err).NotTo(HaveOccurred())
Expect(server).NotTo(BeNil())

resp, err := http.Get("http://" + server.Addr + driverMetricsPath)
Expect(err).NotTo(HaveOccurred())
Expect(resp.StatusCode).To(Equal(http.StatusOK))

err = server.Close()
Expect(err).NotTo(HaveOccurred())
})
})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

place holder tests for initialize metrics for s3 and iam

@anurag4DSB anurag4DSB changed the title COSI-65: Add GRPC Metrics Instrumentation and Documentation Updates COSI-65, COSI-46, COSI-21: Add GRPC Metrics Instrumentation and Documentation Updates Jan 2, 2025
Comment on lines +31 to +32
log_and_run() {
echo "Running: $*" | tee -a "$LOG_FILE"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I commented this already, but wouldn't a set -x be simpler to print command as well ?

Then the whole script could be tee into a logfile

Comment on lines +43 to +44
# Wait a few seconds to ensure port-forward is established
log_and_run sleep 5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can test the port here.

# wait for $localport to become available
while ! nc -vz localhost $localport > /dev/null 2>&1 ; do
    # echo sleeping
    sleep 0.1
done

# This would show that the port is open
# nmap -sT -p $localport localhost

https://stackoverflow.com/a/68824178

@@ -116,6 +116,14 @@ jobs:
run: |
.github/scripts/e2e_tests_brownfield_use_case.sh

# the script accepts number of requests for APIs for CREATE_BUCKET, DELETE_BUCKET, GET_INFO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# the script accepts number of requests for APIs for CREATE_BUCKET, DELETE_BUCKET, GET_INFO
# the script accepts number of requests for APIs for CREATE_BUCKET, DELETE_BUCKET, GET_INFO

or is there a missing word ?

Comment on lines +121 to +122
# Example below we we are testing for 2 CREATE_BUCKET, 1 DELETE_BUCKET,
# 1 GET_INFO, 2 GRANT_ACCESS and 2 REVOKE_ACCESS API counts
Copy link

@BourgoisMickael BourgoisMickael Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we we + use list it's easier to read

Suggested change
# Example below we we are testing for 2 CREATE_BUCKET, 1 DELETE_BUCKET,
# 1 GET_INFO, 2 GRANT_ACCESS and 2 REVOKE_ACCESS API counts
# Example below we are testing for those API counts:
# - 2 CREATE_BUCKET
# - 1 DELETE_BUCKET
# - 1 GET_INFO
# - 2 GRANT_ACCESS
# - 2 REVOKE_ACCESS

@@ -72,6 +72,14 @@ jobs:
run: |
.github/scripts/verify_helm_install.sh

# the script accepts number of requests for APIs for CREATE_BUCKET, DELETE_BUCKET, GET_INFO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as previous file

defaultDriverAddress = "unix:///var/lib/cosi/cosi.sock"
defaultDriverPrefix = "cosi"
defaultMetricsPath = "/metrics"
defaultMetricsPrefix = "scality_cosi_driver"
Copy link

@BourgoisMickael BourgoisMickael Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaultMetricsPrefix = "scality_cosi_driver"
defaultMetricsPrefix = "scality_cosi_driver_"

And in the documentation md file as well
or the trailing underscore should be removed from the flag.String

driverAddress = flag.String("driver-address", "unix:///var/lib/cosi/cosi.sock", "driver address for the socket")
driverPrefix = flag.String("driver-prefix", "", "prefix for COSI driver, e.g. <prefix>.scality.com")
driverAddress = flag.String("driver-address", defaultDriverAddress, "driver address for the socket file, default: unix:///var/lib/cosi/cosi.sock")
driverPrefix = flag.String("driver-prefix", defaultDriverPrefix, "prefix for COSI driver, e.g. <prefix>.scality.com, default cosi.scality.com")
Copy link

@BourgoisMickael BourgoisMickael Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
driverPrefix = flag.String("driver-prefix", defaultDriverPrefix, "prefix for COSI driver, e.g. <prefix>.scality.com, default cosi.scality.com")
driverPrefix = flag.String("driver-prefix", defaultDriverPrefix, "prefix for COSI driver, e.g. <prefix>.scality.com, default: cosi")


## Additional Resource

- [gRPC-Go Prometheus Metrics](https://github.com/grpc-ecosystem/go-grpc-prometheus)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [gRPC-Go Prometheus Metrics](https://github.com/grpc-ecosystem/go-grpc-prometheus)
- [gRPC-Go Prometheus Metrics](https://github.com/grpc-ecosystem/go-grpc-middleware)

Change link to the deprecated lib


srvMetrics := grpcprom.NewServerMetrics(
grpcprom.WithServerHandlingTimeHistogram(
grpcprom.WithHistogramBuckets([]float64{0.001, 0.01, 0.1, 0.3, 0.6, 1, 3, 6, 9, 20, 30, 60, 90, 120}),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that histogram can greatly increase cardinality.

If this runs on client side outside our control, maybe it could be interesting to have those buckets configurable so they can change easily is they want more or less cardinality or if they want to target more specifically a certain range

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants