Fix parser/preparser validation of empty strings #2748
Codecov Report
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2748      +/-   ##
===========================================
+ Coverage    92.79%   92.80%   +0.01%
===========================================
  Files          202      246      +44
  Lines         4591     5576     +985
  Branches       320      480     +160
===========================================
+ Hits          4260     5175     +915
- Misses         271      308      +37
- Partials        60       93      +33
... and 44 files with indirect coverage changes
@jtimpe getting the following migration-related traceback:
@jtimpe notes below from testing. can you let me know if this is expected behavior?
tested:
- record with correct T3 length (156) and 1 child parsed correctly
- record with T3 length (80) and 1 child did not parse and feedback report said:
Unknown Record_Type was found.
- record with T3 length (148) and 1 child parsed correctly
- record with T3 length (104) and 2 children parsed correctly
- record with M3 length (81) and 1 child parsed 2 records; the 2nd child record is blank. not noted in feedback report. I didn't check the remainder of scenarios for SSP, but it looks like satisfying the SSP Section 1 child records will require more work.
this will probably also be an issue with Tribal TANF section 1 child records (#2742)
@ADPennington thank you! those were some interesting edge cases, so i appreciate the thorough testing. i believe all of the odd behavior you noticed was due to getting partially out-of-bounds substrings. i pushed up a small change to check the length of the resulting substrings whenever we pull them; if it's less than the length we requested, it gets treated as blank. This may impact cases where, for instance, half the data is populated for the second record and the rest is cut off: no second record will be created.
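A minimal sketch of the length check described in the comment above. The helper name `get_field_value` and the offsets used here are hypothetical, chosen for illustration only; they are not taken from the parser's actual code.

```python
def get_field_value(line, start, end):
    """Return line[start:end], or None when the slice comes back short."""
    value = line[start:end]
    # Python clamps out-of-bounds slices instead of raising, so a short line
    # yields a shorter (possibly empty) string. Treat that case as blank.
    if len(value) < end - start:
        return None
    return value


# Hypothetical 60-character row that only has room for one child record.
line = "T3" + "A" * 58

print(get_field_value(line, 19, 60))   # slice fits inside the row
print(get_field_value(line, 60, 101))  # slice starts past the end of the row
print(get_field_value(line, 50, 70))   # slice runs off the end of the row
```

With a check like this, a second record that is only partially present is treated the same as one that is absent, which matches the caveat in the comment about half-populated second records.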
@jtimpe I tested the behavior of TANF and SSP section 1 file parsing/validation with respect to the length of the record and trailing space-filling and zero-filling. Results are below.
TANF
- if the T1 record is the expected length and 0-filled or space-filled, it will be parsed. otherwise, it won't, and will result in errors like the following ✔️:
  # for T1
  Value length 155 does not match 156.
  Value length 157 does not match 156.
- if the T2 record is the expected length, it will be parsed. otherwise, it won't, and will result in errors like the following ✔️:
  # for T2
  Value length 155 does not match 156.
  Value length 157 does not match 156.
- if there is only one child in the T3 record, and the record is the correct length (156) and space-filled, then the record is parsed correctly ✔️
- if there is only one child in the T3 record, and the record is the correct length (156) but 0-filled, then 2 child records get parsed, where the 2nd is zero-filled. the error report will flag all of the out-of-range values due to 0-filling. ✔️
  # example T3 record with 1 child and 0-filling
  T320231011111111112120180127WTTTT80W0 222 98 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
- if there is only one child in the T3 record, and it is not 0-filled or space-filled, the record will parse correctly as long as the 1st child record length == 60 ✔️
  # example T3 record with 1 child with length == 60
  T320231011111111112120180127WTTTT80W0 222 98 00000000
- if there is only one child in the T3 record and record length < 60, it is not parsed and is included in the error report ✔️
  # example of T3 < 60 with error message
  T320231011111111112120180127WTTTT80W0 222 98 000000 contains blanks between positions 19 and 60.
- if there is only one child in the T3 record and 60 < length < 156 with space-fill, the record is parsed correctly. ✔️
- if there is only one child in the T3 record and record length < 60 or 60 < length < 156 with zero-fill, the 2nd child record is parsed with zeros. the error report captures out-of-range values as expected. ✔️
- if there are 2 children in the T3 record and record length == 156 with space-fill or zero-fill, the records are parsed correctly. ✔️
- if there are 2 children in the T3 record, record length < 156, and the 2nd child record is truncated, only the first record is parsed, and there is no error message for the 2nd child in the record. ⚠️ (Generate preparser errors when multi-record rows are the wrong length or are missing space-filled second records #2757)
  # example of T3 record with 2 children, and 2nd child record truncated
  T320231011111111151220170525WTTTT@B#Y12222112204399400000000220151113WTTTT9TT#12222122204399
SSP
- results consistent with TANF.
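The T1/T2 length errors listed above can be reproduced with a check along these lines. This is only a sketch: the function name `validate_record_length` is hypothetical, and the real validator may be structured differently. The message format is copied from the test results.

```python
def validate_record_length(line, expected_length=156):
    """Return an error message when the row length is wrong, else None."""
    actual_length = len(line)
    if actual_length != expected_length:
        return f"Value length {actual_length} does not match {expected_length}."
    return None


# Rows that are one character short or one character long both fail.
for row in ("0" * 155, "0" * 156, "0" * 157):
    print(len(row), validate_record_length(row))
```

A strict equality check like this explains why T1 and T2 rows must be space-filled or 0-filled to the full length, while multi-record T3 rows need the per-substring handling discussed in this PR.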
Summary of Changes
Due to the way Python slices strings given an out-of-bounds range, the parser was creating blank-string-filled records when a multi-record row was not space-filled. Once section 1 validation was introduced in #2518, this caused uncaught exceptions to be thrown during parsing.
E.g.
T320210400028221R0112014122888175617622222112204398100000000
creates two records: a properly filled T3 record, and a T3 record consisting of '' values for all char fields (which subsequently fails validation and ends the parser task for that submission), whereas
T320210400028221R0112014122888175617622222112204398100000000
creates a single, properly filled T3 record with no errors.
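The slicing behavior behind this can be seen directly in Python. The sketch below uses the example row from this description; the offsets for the second child record (`SECOND_START`, `SECOND_END`) are assumptions made for illustration, not values taken from the T3 schema.

```python
line = "T320210400028221R0112014122888175617622222112204398100000000"

# Assumed offsets for where a second child record would sit in the row.
SECOND_START, SECOND_END = 60, 101
width = SECOND_END - SECOND_START

# Slicing past the end of a string never raises IndexError. Python clamps
# the range and returns whatever is available, possibly an empty string.
second = line[SECOND_START:SECOND_END]

# The same slice taken from a row that has been space-filled to 156 chars.
padded_second = line.ljust(156)[SECOND_START:SECOND_END]

print(repr(second), len(second))
print(repr(padded_second), len(padded_second))
```

Because the short slice is a valid string rather than an error, a parser that builds a record from every slice ends up with a second record whose fields are all short or empty unless it compares each slice's length to the requested width.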
I also moved around some functions and classes to resolve circular imports.
How to Test
List the steps to test the PR
These steps are generic, please adjust as necessary.
Deliverables
More details on how deliverables herein are assessed included here.
Deliverable 1: Accepted Features
Checklist of ACs:
lfrohlich and/or adpennington confirmed that ACs are met.
Deliverable 2: Tested Code
CodeCov Report comment in PR)
CodeCov Report comment in PR)
Deliverable 3: Properly Styled Code
Deliverable 4: Accessible
iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?
Deliverable 5: Deployed
Deliverable 6: Documented
Deliverable 7: Secure
Deliverable 8: User Research
Research product(s) clearly articulate(s):