SCOTUS Notes - count reducer not calculating classification count correctly #1252

Open
nciemniak opened this issue Jan 25, 2021 · 0 comments

Project: SCOTUS Notes: Behind the Scenes at Supreme Court Conference
Project ID: 3462
Workflow: Transcribe Justice Notations
Workflow ID: 4083

This issue documents an incident in which the Caesar count reducer calculated classification counts incorrectly. The problem was discovered when we found that not all retired subjects had been imported into ALICE as expected. See the incident doc for details of the steps taken during the investigation.

The Caesar rules were set up so that when a subject's classification count reached 10 or more, the subject was imported into ALICE via an external effect. We found that 10K of the 30K total retired subjects for the workflow had not been imported into ALICE as expected. All of those subjects were retired before the initial backfill performed for the project (on 8-14-2020). We investigated a sample of those subjects and found that every subject in the sample had a count reduction with an incorrectly calculated classification count. For example, one subject had a count reduction of {"extracts": 7, "classifications": 7} when in reality it had 10 classifications and 10 corresponding extracts. Because the stored count never reached 10, the import for that subject was not triggered.
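The failure mode above can be illustrated with a minimal Python sketch. It is not Caesar code: the function name, the subject IDs, and the input dictionaries are hypothetical, and it assumes you have already exported both the stored count reductions and the true classification count per subject. It flags subjects whose stored count sits below the import threshold even though the real count has reached it.

```python
from typing import Dict, List

# Threshold taken from the rule described in this issue: import at 10 or more.
IMPORT_THRESHOLD = 10


def find_undercounted(
    stored_reductions: Dict[int, Dict[str, int]],
    actual_counts: Dict[int, int],
    threshold: int = IMPORT_THRESHOLD,
) -> List[int]:
    """Return IDs of subjects whose stored count is below the threshold
    although their real classification count has reached it."""
    missed = []
    for subject_id, reduction in stored_reductions.items():
        stored = reduction.get("classifications", 0)
        actual = actual_counts.get(subject_id, 0)
        # The rule would have fired on the real count but not on the stored one.
        if stored < threshold <= actual:
            missed.append(subject_id)
    return sorted(missed)


# Hypothetical subject IDs; subject 101 mirrors the example reduction above.
stored = {
    101: {"extracts": 7, "classifications": 7},
    102: {"extracts": 10, "classifications": 10},
}
actual = {101: 10, 102: 10}
print(find_undercounted(stored, actual))
```

Subjects returned by such a check are the ones whose import rule would have fired on the true count, so they are candidates for re-reduction.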

To fix the incorrect count reductions and kick off the imports for the subjects that had been missed, we reran the reducers for the workflow. This generated correct count reductions and triggered the import into ALICE. The problem appears to be related to re-extraction (run in order to execute new ALICE extractors and reducers): all expected extracts were created, but not all of them were accounted for in the reductions.

Further investigation is required to understand exactly why Caesar was unable to generate correct count reductions for this set of subjects. The cause is likely a race condition that occurs when extract records are created less than a second apart. Until this is resolved, we recommend checking the number of records imported into ALICE after performing a backfill or a re-reduction.
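The recommended check can be sketched as a simple set comparison in Python. This is only an illustration under stated assumptions: `missing_imports` and `check_backfill` are hypothetical helpers, and how the two ID lists are obtained (for example from exports of retired subjects and of records present in ALICE) is not specified in this issue.

```python
from typing import Iterable, Set


def missing_imports(retired_ids: Iterable[int], imported_ids: Iterable[int]) -> Set[int]:
    """Return the retired subject IDs that have no imported record."""
    return set(retired_ids) - set(imported_ids)


def check_backfill(retired_ids: Iterable[int], imported_ids: Iterable[int]) -> bool:
    """Print a short summary and return True only if nothing is missing."""
    retired = set(retired_ids)
    missing = missing_imports(retired, imported_ids)
    print(f"retired: {len(retired)}, missing from import: {len(missing)}")
    return not missing


# Hypothetical IDs: subject 3 was retired but never imported.
ok = check_backfill(retired_ids=[1, 2, 3], imported_ids=[1, 2])
```

Comparing IDs rather than only totals also shows which subjects need their reducers rerun.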

@nciemniak nciemniak added the bug label Jan 25, 2021