SCOTUS Notes - count reducer not calculating classification count correctly #1252

nciemniak · 2021-01-25T19:49:31Z

Project: SCOTUS Notes: Behind the Scenes at Supreme Court Conference
Project ID: 3462
Workflow: Transcribe Justice Notations
Workflow ID: 4083

This issue documents an incident in which the Caesar count reducer incorrectly calculated classification counts. We discovered the problem when we found that not all retired subjects had been imported into ALICE as expected. See the incident doc for details about the steps taken during the investigation.

The Caesar rules were set up so that when a subject's classification count reached 10 or more, the subject was imported into ALICE via an external effect. We found that 10K of the 30K total retired subjects for the workflow had not been imported into ALICE as expected. All of those subjects were retired prior to the initial backfill performed for the project (on 8-14-2020). A sample group of those subjects was investigated, and every subject in the sample had a count reduction that had incorrectly calculated the subject's classification count. For example, one of the subjects had a count reduction of {"extracts": 7, "classifications": 7}, when in reality it had 10 classifications and 10 corresponding extracts. Because the reduced count never reached 10, the import for that subject was not triggered.
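The mismatch described above can be checked for offline. The following is a minimal sketch, not Caesar's own logic: it assumes you have exported one subject ID per extract record and each subject's count reduction into plain Python structures, and that each classification produces exactly one extract (as in the example above). The function name, argument names, and sample data are hypothetical.

```python
from collections import Counter

# Threshold taken from the rule described above: import at 10+ classifications.
IMPORT_THRESHOLD = 10


def find_undercounted_subjects(extract_subject_ids, count_reductions,
                               threshold=IMPORT_THRESHOLD):
    """Return {subject_id: (reduced_count, actual_count)} for subjects whose
    count reduction is below the threshold even though enough extracts exist.

    extract_subject_ids: iterable with one subject ID per extract record
        (hypothetical export format).
    count_reductions: dict mapping subject ID to its count reduction data,
        e.g. {"extracts": 7, "classifications": 7}.
    """
    actual_counts = Counter(extract_subject_ids)
    undercounted = {}
    for subject_id, actual in actual_counts.items():
        # A subject with no reduction at all is treated as a count of 0.
        reduced = count_reductions.get(subject_id, {}).get("classifications", 0)
        if reduced < threshold <= actual:
            undercounted[subject_id] = (reduced, actual)
    return undercounted


# Made-up sample data mirroring the example in this issue.
extracts = ["subject-a"] * 10 + ["subject-b"] * 10 + ["subject-c"] * 4
reductions = {
    "subject-a": {"extracts": 7, "classifications": 7},    # undercounted
    "subject-b": {"extracts": 10, "classifications": 10},  # correct
    "subject-c": {"extracts": 4, "classifications": 4},    # below threshold
}
print(find_undercounted_subjects(extracts, reductions))
```

Only subjects that should have crossed the threshold but did not are reported, since those are the ones whose import would have been skipped.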

To fix these incorrect count reductions and kick off the imports for the subjects that had been missed, we reran the reducers for the workflow, which generated correct count reductions and triggered the imports into ALICE. The problem seems to be related to re-extraction (run in order to apply new ALICE extractors and reducers): all expected extracts were created, but not all of them were accounted for in the reductions.

Further investigation is required to understand exactly why Caesar was not able to correctly generate count reductions for this set of subjects; the issue likely involves race conditions that occur when extract records are created within less than a second of one another. Until the cause is understood, we recommend checking how many records were imported into ALICE after performing a backfill or a re-reduction.
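The recommended post-backfill check can be as simple as comparing two lists of subject IDs. This is a minimal sketch under the assumption that you can export the retired subject IDs for the workflow and the subject IDs present in ALICE; the function name and the sample IDs are hypothetical, and no Caesar or ALICE API is used.

```python
def check_import_coverage(retired_subject_ids, imported_subject_ids):
    """Compare retired subjects against imported subjects.

    Returns a dict with the expected and imported counts, plus a sorted list
    of retired subject IDs that are missing from the import.
    """
    expected = set(retired_subject_ids)
    imported = set(imported_subject_ids)
    missing = sorted(expected - imported)
    return {
        "expected": len(expected),
        "imported": len(expected & imported),
        "missing": missing,
    }


# Made-up IDs for illustration only.
retired = [101, 102, 103, 104]
in_alice = [101, 103]
report = check_import_coverage(retired, in_alice)
print(report)
```

A non-empty `missing` list after a backfill or re-reduction is the signal to rerun the reducers for those subjects.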

nciemniak added the bug label Jan 25, 2021
