Skip to content

Commit

Permalink
Merge pull request #1131 from wellcomecollection/migrate-ingest-inspe…
Browse files Browse the repository at this point in the history
…ctor

Migrate Ingest Inspector to AWS
  • Loading branch information
StepanBrychta authored Jul 3, 2024
2 parents 5546e39 + 70a92d4 commit 27e04f7
Show file tree
Hide file tree
Showing 38 changed files with 6,714 additions and 0 deletions.
169 changes: 169 additions & 0 deletions monitoring/ingest_inspector/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
# Wellcome Ingest Inspector

With the ingest inspector, users can track the progress of ingests in the [Wellcome storage service](https://github.com/wellcomecollection/storage-service).

Saving a new package to the storage service is a multi-step process called an **ingest**.
When you ask the service to store a new package, you get an **ingest ID**.
You can use the ingest ID to track the progress of your package -- see whether it's been successfully stored, or how far through the process it is.

![Screenshot of the ingest inspector](screenshots/screenshot.png)

- [Motivation](#motivation)
- [Features](#features)
- [Usage](#usage)
- [Development](#development)

## Motivation

The storage service has a JSON API which is protected by OAuth.
Providing read-only access to the ingests data in a web app is easier than doing OAuth negotiation yourself.

The ingests data isn't sensitive, so we don't need authentication in front of it.
It mostly contains user-facing messages from the storage service, or an identifier used by one of Wellcome's catalogue databases.
The user-facing messages can be seen in the publicly-available source code, and the identifiers don't encode any sensitive information.



## Features

The app has a number of features designed to make it as easy as possible for somebody to get the data they need.

* **It doesn't ask you to choose between prod and staging API.**

We run two separate instances of the storage service: production ("prod") and staging.
They both use UUIDs for ingest IDs, so a given ingest ID should refer to a unique ingest across the two services.

Rather than asking users to specify whether they want to query the prod or staging APIs, the app queries *both* APIs and returns the appropriate ingest.

The app has a different visual appearance, depending on whether an ingest is from the prod or staging API:

![Two ingests. One says "Found ingest [ID] in the production API", the other "Found ingest [ID] in the staging API". The prod API has a solid green background; the staging API has green text on a white background.](screenshots/prod_staging_api.png)

* **It uses colours to distinguish the state of an ingest.**

An ingest can be in one of four states: *accepted*, *processing*, *succeeded* or *failed*.

The app changes its tint colour to reflect the state of an ingest, so you get an instant visual indication before you start reading the text:

![Four ingests, highlighted yellow (accepted), blue (processing), red (failed), green (processing).](screenshots/ingest_state_colours.png)

* **It displays timestamps with limited precision.**

The storage service records timestamps with millisecond precision.
This is useful for debugging, but is usually too much detail for humans.
The app only ever displays HH:MM precision, for example:

> Tue, 21 July 2020 @ 13:27
The original timestamp is in the `title` attribute, so you can find it by hovering -- but I'm not sure anyone has ever used this.

* **It displays localised times, rather than UTC.**

The storage service records timestamps in UTC, the one true timezone.
This is useful for coordinating across different machines, but less readable for humans.

For example, the UK runs an hour ahead of UTC during the summer.
If you look at the UTC timestamps, it can seem like the last event was an hour ago.

The app detects your browser locale, and adjusts the timezone to match, for example:

> Wed, 30 September 2020 @ 16:18 BST
It includes the timezone to remind you that this is a localised time.
The original timestamp in the `title` attribute is the UTC timestamp returned by the storage service.

* **It includes hints to tell you if a timestamp is recent.**

A common question is: *has this ingest still being processed, or has it got stuck?*

The app highlights the most recent event, and includes some hints in the timestamp to tell you if it was recent:

> today @ 10:21 BST (just now)<br/>
> today @ 10:20 BST (30 seconds ago)<br/>
> today @ 10:16 BST (5 minutes ago)<br/>
> yesterday @ 9:21 BST
This means the user doesn't need to know the exact date and time to know if an ingest was updated recently.

(You can read about how this works in [Showing human-friendly dates in JavaScript](https://alexwlchan.net/2020/05/human-friendly-dates-in-javascript/).)

* **It links to relevant objects/prefixes in the S3 console.**

The storage service works with objects in S3: that's where you upload a package you want it to store, and that's where it stores two of the three replicas of every package.
The app includes links to the original object you uploaded, and where it's stored the S3 copies, so you can jump straight to them.

This is useful if, say, the storage service can't find the package you've uploaded.
You can use the link to jump straight to the object.

(The Azure Blob Storage portal doesn't allow linking directly to a prefix, or there'd be a link to the Azure replica as well.)

* **It highlights failed ingest events, and links to relevant logs.**

An ingest can have many events -- in some cases over 50 -- but they follow a predictable pattern.
This means the app can highlight failures, making them easier to pick out:

![Six lines. One line "register failed" is in bold text and has a link to "dev logs".](screenshots/ingest_errors.png)

It can also recognise which app sent a given failure event.
The "dev logs" link goes to a pre-filled query in our logging service, which filters for the right app and approximate time when this event was sent, making it easy to debug.

* **It remembers what ingests you looked at recently.**

Often you'll want to look up the same ingest multiple times in quick succession, to track its progress through the day.

To save you looking up the ingest ID each time, the app remembers the last ten ingests you looked that, and presents a list of them on the front page:

![A list of ingest IDs.](screenshots/recent_ingests.png)

(More specifically, it remembers the last ten ingests looked up by a given browser.)

* **It highlights pending callbacks.**

An ingest is marked as "succeeded" when there are three, verified copies of the original package in the long-term storage, and the package has been registered.

Once it succeeds (or fails), the storage service can optionally make a callback to tell another service that it's done.
For example, our digitised workflow system waits for a callback rather than polling the ingests API repeatedly.

If an ingest has succeeded, but the storage service has a pending callback, this gets a visual highlight:

![A line "callback status:" pending highlighted in blue.](screenshots/pending_callback.png)



## Usage

You can use the app by visiting <https://d1dti1kbjcy26.cloudfront.net>.

## Development
### Backend

The backend consists of a single Lambda function, which fetches ingests from the Wellcome Storage Service. To run the Lambda function code locally, navigate into the `ingest_inspector/backend_lambda` directory and run:

```shell
AWS_PROFILE=storage-developer python3 src/ingest_inspector_backend.py --ingest-id=<INGEST_ID>
```

### Frontend

To install dependencies and start the frontend development server, navigate into the `ingest_inspector/frontend` directory and run:

```shell
npm install
npm run dev
```

### Deployments

To upload a new version of the backend Lambda function into S3, run:

```shell
AWS_PROFILE=storage-developer python3 ./builds/publish_lambda_zip.py monitoring/ingest_inspector/backend_lambda --bucket=wellcomecollection-storage-infra --key=lambdas/monitoring/ingest_inspector_backend.zip
```

The frontend is hosted as a static website on S3 and served via CloudFront. To build and deploy the app, navigate into
the `ingest_inspector/frontend` directory and run:

```shell
sh deploy.sh
```

Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
import argparse
import json
import functools

import boto3
from wellcome_storage_service import (
RequestsOAuthStorageServiceClient,
IngestNotFound,
UserError,
)
from utils import (
tally_event_descriptions,
get_s3_url,
get_display_s3_url,
get_last_updated_date,
)

STAGING_URL = "https://api-stage.wellcomecollection.org/storage/v1"
PRODUCTION_URL = "https://api.wellcomecollection.org/storage/v1"


def _client_from_environment(api_url):
secretsmanager = boto3.Session().client("secretsmanager")

def _get_secretsmanager_value(secret_id: str):
return secretsmanager.get_secret_value(SecretId=secret_id)["SecretString"]

client_id = _get_secretsmanager_value("ingest-inspector/cognito-client-id")
client_secret = _get_secretsmanager_value("ingest-inspector/cognito-client-secret")

return RequestsOAuthStorageServiceClient(
api_url=api_url,
client_id=client_id,
client_secret=client_secret,
token_url="https://auth.wellcomecollection.org/oauth2/token",
)


@functools.lru_cache()
def get_prod_client():
return _client_from_environment(PRODUCTION_URL)


@functools.lru_cache()
def get_staging_client():
return _client_from_environment(STAGING_URL)


def get_ingest(ingest_id: str):
try:
ingest = get_prod_client().get_ingest(ingest_id=ingest_id)
environment = "production"
except IngestNotFound:
ingest = get_staging_client().get_ingest(ingest_id=ingest_id)
environment = "staging"

return ingest, environment


def lambda_handler(event, context):
print(f"Starting lambda_handler, got event: {event}")

ingest_id = event["pathParameters"]["ingest_id"]

try:
ingest, environment = get_ingest(ingest_id)
except IngestNotFound:
return {"statusCode": 404, "body": json.dumps({"message": "Ingest not found."})}
except UserError:
return {
"statusCode": 400,
"body": json.dumps({"message": "Invalid ingest ID."}),
}

ingest["events"] = tally_event_descriptions(ingest["events"], environment)
ingest["s3Url"] = get_s3_url(ingest["sourceLocation"])
ingest["displayS3Url"] = get_display_s3_url(ingest["sourceLocation"])
ingest["lastUpdatedDate"] = get_last_updated_date(ingest)
ingest["environment"] = environment

return {"statusCode": 200, "body": json.dumps(ingest)}


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Retrieve a single ingest from the Wellcome Storage Service by its ID."
)
parser.add_argument(
"--ingest-id",
type=str,
help="The ID of the ingest to retrieve.",
required=True,
)
args = parser.parse_args()

event = {"pathParameters": {"ingest_id": args.ingest_id}}
print(lambda_handler(event, None))
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
boto3
requests
wellcome_storage_service>=2.1.0,<3
42 changes: 42 additions & 0 deletions monitoring/ingest_inspector/backend_lambda/src/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile
#
boto3==1.23.10
# via -r requirements.in
botocore==1.26.10
# via
# boto3
# s3transfer
certifi==2024.2.2
# via requests
charset-normalizer==2.0.12
# via requests
idna==3.6
# via requests
jmespath==0.10.0
# via
# boto3
# botocore
oauthlib==3.2.2
# via requests-oauthlib
python-dateutil==2.9.0.post0
# via botocore
requests==2.27.1
# via
# -r requirements.in
# requests-oauthlib
requests-oauthlib==1.3.1
# via wellcome-storage-service
s3transfer==0.5.2
# via boto3
six==1.16.0
# via python-dateutil
urllib3==1.26.18
# via
# botocore
# requests
wellcome-storage-service==2.3.4
# via -r requirements.in
Loading

0 comments on commit 27e04f7

Please sign in to comment.