-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3157 - Reparse Meta through model #3220
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3220 +/- ##
===========================================
- Coverage 92.66% 90.85% -1.82%
===========================================
Files 47 299 +252
Lines 1009 8451 +7442
Branches 169 788 +619
===========================================
+ Hits 935 7678 +6743
- Misses 42 660 +618
- Partials 32 113 +81
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 238 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
list_display = [ | ||
'id', | ||
'created_at', | ||
'timeout_at', | ||
'success', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do lose filtering on these fields - we can create a custom filter for properties, future work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtimpe couple questions --
- is this class really only for SSP files?
- is
True
the default value for is_finished and is_success? asking because i just ran the command on a large number of files and its still in the process of deleting records but these are the metrics im seeing ⬇️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, do we still need these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this class really only for SSP files?
it should work for all file types. are you seeing something that specifies SSP only?
is True the default value for is_finished and is_success? asking because i just ran the command on a large number of files and its still in the process of deleting records but these are the metrics im seeing
Yeah, this is interesting. So, under the hood running reparse first deletes all associated records, then creates the new associations. So while the new associations are not-yet-existing, the all
in the ReparseMeta
property might return true. I could update it to check how many file associations there are and respond accordingly.
also, do we still need these
These fields are only used when reparse is run via the management command. we have #3205 which would refactor the action. we can remove the management command and these fields at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this class really only for SSP files?
it should work for all file types. are you seeing something that specifies SSP only?
just this 😄
is True the default value for is_finished and is_success? asking because i just ran the command on a large number of files and its still in the process of deleting records but these are the metrics im seeing
Yeah, this is interesting. So, under the hood running reparse first deletes all associated records, then creates the new associations. So while the new associations are not-yet-existing, the
all
in theReparseMeta
property might return true. I could update it to check how many file associations there are and respond accordingly.
ok. worth noting, that once files were queued for parsing these booleans changed back to False. So a little misleading initially?
also, do we still need these
These fields are only used when reparse is run via the management command. we have #3205 which would refactor the action. we can remove the management command and these fields at that time.
ahh! got it. thank you @jtimpe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ADPennington I pushed a change so that is_finished
and is_success
will stay False
if there are no files queued. I also fixed the misleading docstring 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k i'll re-deploy sometime over the weekend. lots of files currently reparsing in qasp
@jtimpe I checked out develop, parsed 8 files with 8 STTs, reparsed all files twice, checked out this branch, ran migrations, and then see the screenshot below in the admin. I also ran another reparse after checking out the branch. Shouldnt the through model show up as a table now as opposed to the FK? |
@ADPennington Since reparse has not yet run in production, we should consider deploying this PR (as well as #3106) before running reparse in production. This will ensure no data loss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with 5 Celery workers and reparsed 8 files 10 times to see if I could break anything. Works like a charm, feels faster because we don't lock DB rows anymore, and we get more usable info. LGTM!
thanks for this update @jtimpe . pinging @vlasse86 @lfrohlich @ttran-hub for awareness. what im gathering here is that we need both of these tickets before we can safely reparse in prod. I'm not sure of the timeline just yet to get these through review, but id say both are top priority. |
@jtimpe i think there's a lint error here. no rush on this, ill re-test over the weekend after the latest reparse completes, if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtimpe this is in pretty good shape! my testing notes are below.
- linting error on branch
- I'll need to re-test this to check the status fields
is_finished
andis_success
during the record deletion portion of the reparsing process. - I tested this twice so far:
-
-
1 out of 519 files failed during the re-parse
-
the file that failed still resulted in records being added to the db
-
The results here suggest that
total_num_records_post
will not populate if at least one file failed. Is this true? This would be unexpected, since records from the file before and after this file (in the sequence) were added to the db.
-
-
7924d57
to
5b823bc
Compare
Resolved ✔️
Resolved ✔️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work @jtimpe 🥇
Summary of Changes
Pull request closes #3157
Adds additional information to
ReparseMeta
ReparseMeta
for reporting the per-file information replaces fields that originally stored that info, in favor of the more granular, less-concurrency-error-proneReparseFileMeta
How to Test
develop
. Upload some datafiles. Reparse those data files.Deliverables
More details on how deliverables herein are assessed included here.
Deliverable 1: Accepted Features
Checklist of ACs:
lfrohlich
and/oradpennington
confirmed that ACs are met.Deliverable 2: Tested Code
CodeCov Report
comment in PR)CodeCov Report
comment in PR)Deliverable 3: Properly Styled Code
Deliverable 4: Accessible
iamjolly
andttran-hub
using Accessibility Insights reveal any errors introduced in this PR?Deliverable 5: Deployed
Deliverable 6: Documented
Deliverable 7: Secure
Deliverable 8: User Research
Research product(s) clearly articulate(s):