[Issue #2590] Replace `gh` in analytics ETL #3393
base: main
This test used to fail if the username contained a dot (e.g. `first.last`). This commit adjusts the regex to allow usernames with dots.
Adds a class to make calls to the GitHub GraphQL API to replace the `gh` CLI
To analytics.integrations.github.client
After the refactor, we no longer need them
@@ -18,7 +18,6 @@ RUN apt-get update \
libpq-dev \
postgresql \
wget \
jq \
Removing `jq` because we no longer need it for transformations
# Install gh CLI
# docs: https://github.com/cli/cli/blob/trunk/docs/install_linux.md
Removing this script because we no longer need the `gh` CLI
@@ -19,6 +19,7 @@ class DBSettings(PydanticBaseEnvConfig):
ssl_mode: str = Field("require", alias="DB_SSL_MODE")
db_schema: str = Field("app", alias="DB_SCHEMA")
slack_bot_token: str = Field(alias="ANALYTICS_SLACK_BOT_TOKEN")
github_token: str = Field(alias="GH_TOKEN")
Added this because we now need to reference it directly within the codebase, instead of indirectly like we did previously with the `gh` CLI
Since we are in this file, can we rename `DBSettings` to something more accurate?
###########################
# Do not add these values to this file
# to avoid mistakenly committing them.
# Set these in your shell
# by doing `export ANALYTICS_REPORTING_CHANNEL_ID=whatever`
ANALYTICS_REPORTING_CHANNEL_ID=DO_NOT_SET_HERE
ANALYTICS_SLACK_BOT_TOKEN=DO_NOT_SET_HERE
GH_TOKEN=DO_NOT_SET_HERE
Prevents tests from failing if someone hasn't set their GitHub token locally.
"ANN101", # missing type annotation for self
"ANN102", # missing type annotation for cls
Removed these because they've been removed in the latest version of ruff
@@ -78,7 +76,6 @@ ignore = [
"PTH123", # `open()` should be replaced by `Path.open()`
"RUF012", # Mutable class attributes should be annotated with `typing.ClassVar`
"TD003", # missing an issue link on TODO
"PT004", # pytest fixture leading underscore - is marked deprecated
Same with this one
This file is basically a complete refactor, but it preserves the existing helper functions for the export to prevent this PR from getting bigger than it already is.
Removes this because we no longer need it
@@ -40,7 +40,7 @@ def test_init(
records = caplog.records
assert len(records) == 2
assert re.match(
r"^start test_logging: \w+ [0-9.]+ \w+, hostname \S+, pid \d+, user \d+\(\w+\)$",
r"^start test_logging: \w+ [0-9.]+ \w+, hostname \S+, pid \d+, user \d+\([\w\.]+\)",
Changed this because the tests were failing locally if there was a period in the username, e.g. billy.daly
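The effect of the regex change can be checked in isolation. The sketch below copies the old and new patterns from the diff and runs them against a made-up log line; the interpreter name, version, hostname, pid, and uid produced by `make_log_line` are illustrative assumptions, not output captured from the real test.

```python
import re

# Patterns copied from the diff above (old line first, new line second).
OLD_PATTERN = r"^start test_logging: \w+ [0-9.]+ \w+, hostname \S+, pid \d+, user \d+\(\w+\)$"
NEW_PATTERN = r"^start test_logging: \w+ [0-9.]+ \w+, hostname \S+, pid \d+, user \d+\([\w\.]+\)"


def make_log_line(username: str) -> str:
    """Build a sample log line; every value except the username is invented."""
    return (
        "start test_logging: CPython 3.12.0 Darwin, "
        f"hostname my-host, pid 1234, user 501({username})"
    )


def matches(pattern: str, username: str) -> bool:
    """Return True when the pattern accepts a log line for this username."""
    return re.match(pattern, make_log_line(username)) is not None


# A username without a dot passes both patterns.
plain_old = matches(OLD_PATTERN, "billy")
plain_new = matches(NEW_PATTERN, "billy")

# A dotted username is rejected by the old pattern and accepted by the new one.
dotted_old = matches(OLD_PATTERN, "billy.daly")
dotted_new = matches(NEW_PATTERN, "billy.daly")
```

Note that the new pattern also drops the trailing `$`, so it no longer requires the line to end immediately after the closing parenthesis.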
I have one significant question about the data formatting; everything else looks fine.
@@ -31,14 +31,15 @@ MB_DB_PASS=secret123
MB_DB_HOST=grants-analytics-db

###########################
# Slack Configuration #
# Token Configuration #
# Token Configuration #
# Secret Configuration #
{
"project_owner": owner,
"project_number": project,
"issue_title": safe_pluck(item, "content.title"),
I don't understand why we need `safe_pluck`. If there's a bunch of fields missing, I would rather the code raise a `KeyError` instead of getting us bad (e.g. mostly null) data.
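The trade-off raised here can be sketched with two lookup helpers. The `safe_pluck` below is a hypothetical stand-in written for illustration (the PR's actual implementation is not shown in this thread), and `strict_pluck` is an assumed name for the fail-fast alternative the reviewer describes.

```python
from typing import Any, Dict


def safe_pluck(data: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Hypothetical lenient lookup: return `default` if any key on the dotted path is missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def strict_pluck(data: Dict[str, Any], path: str) -> Any:
    """Assumed fail-fast variant: a missing key raises KeyError.

    An intermediate value that is not a dict (e.g. None) raises TypeError instead.
    """
    current: Any = data
    for key in path.split("."):
        current = current[key]
    return current


complete_item = {"content": {"title": "Replace gh in analytics ETL"}}
broken_item: Dict[str, Any] = {}  # e.g. a response that is missing the expected fields

# Both helpers agree when the data is well formed.
title_lenient = safe_pluck(complete_item, "content.title")
title_strict = strict_pluck(complete_item, "content.title")

# They diverge on malformed data: the lenient helper silently yields None ...
missing_lenient = safe_pluck(broken_item, "content.title")

# ... while the strict helper surfaces the problem immediately.
try:
    strict_pluck(broken_item, "content.title")
    strict_raised = False
except KeyError:
    strict_raised = True
```

Under these assumptions, the lenient helper suits fields that are legitimately optional, while the strict one suits fields whose absence means the export itself is malformed.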
Summary
Replaces the sub-process call to the `gh` CLI with a `GitHubGraphqlClient` class that can make calls to the GitHub GraphQL API directly from Python.
Fixes #2590
Time to review: 10 mins
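For readers unfamiliar with the approach, here is a minimal sketch of cursor-based pagination against a GraphQL endpoint. It is not the PR's `GitHubGraphqlClient`: the class name `GraphqlClientSketch`, the `transport` hook, and the `path_to_connection` parameter are assumptions made for illustration. It also assumes the query declares a `$cursor` variable and selects `nodes` plus `pageInfo { hasNextPage endCursor }` on the paginated connection.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional, Sequence

# Public endpoint for GitHub's GraphQL API.
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# A transport takes (query, variables) and returns the decoded JSON response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class GraphqlClientSketch:
    """Hypothetical stand-in for the PR's client, written only to show the idea."""

    def __init__(self, token: str, transport: Optional[Transport] = None) -> None:
        self.token = token
        # Tests can inject a fake transport instead of hitting the network.
        self._transport = transport or self._post

    def _post(self, query: str, variables: Dict[str, Any]) -> Dict[str, Any]:
        """Send one GraphQL request using only the standard library."""
        body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
        request = urllib.request.Request(
            GITHUB_GRAPHQL_URL,
            data=body,
            headers={
                "Authorization": f"Bearer {self.token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    def paginate(
        self,
        query: str,
        variables: Dict[str, Any],
        path_to_connection: Sequence[str],
    ) -> Iterator[Dict[str, Any]]:
        """Yield every node of a connection, following `endCursor` page by page."""
        cursor: Optional[str] = None
        while True:
            payload = self._transport(query, {**variables, "cursor": cursor})
            # Walk from the response root down to the paginated connection.
            connection: Any = payload["data"]
            for key in path_to_connection:
                connection = connection[key]
            yield from connection["nodes"]
            page_info = connection["pageInfo"]
            if not page_info["hasNextPage"]:
                return
            cursor = page_info["endCursor"]
```

Because the transport is injectable, the pagination loop can be exercised with canned responses and no token, which is one way to keep tests independent of `GH_TOKEN`.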
Changes proposed
- `GitHubGraphqlClient` class that can make paginated calls to the GitHub GraphQL API
- `src/analytics/etl/github/main.py` with the `GitHubGraphqlClient`
- `make-graphql-call.sh` script that previously invoked the `gh` CLI
Context for reviewers
Instructions to test
make build
make sprint-reports-with-latest-data
Notes
We'll want to refactor the `src/analytics/integrations/github/` sub-package a little bit further, pulling most of the code in the `main.py` file in that sub-package into `src/analytics/etl/github.py` instead.
I didn't include that in this PR to try to minimize the amount of code I was changing, but we can/should tackle that refactor in #3203 because some of the functions in `main.py` still write to the local file system, but can easily be updated to pass the exported data as a Python dictionary.
Additional information
The local run of sprint reports with the new code matches the output of the last run triggered by AWS Step Functions (using code in `main`) posted to Slack:
Sprint report for HHS/13
- In Slack (based on `main`)
- Locally, based on this feature branch
Sprint burndown for HHS/17
- In Slack (based on `main`)
- Locally, based on this feature branch
Deliverable percent complete
- In Slack (based on `main`)
- Locally, based on this feature branch