Fix parser/preparser validation of empty strings #2748

jtimpe · 2023-11-14T16:25:02Z

Summary of Changes

Due to the way python slices strings given an out-of-bounds range, the parser was creating blank-string-filled records when a multi-record was not space-filled. Once section 1 validation was introduced in #2518, uncaught exceptions are thrown during parsing

E.g.
T320210400028221R0112014122888175617622222112204398100000000
Creates two records, a properly filled t3 record, and a t3 consisting of '' values for all char fields (which subsequently fails validation and ends the parser task for that submission)

whereas
T320210400028221R0112014122888175617622222112204398100000000
Creates a single, properly filled t3 record with no errors.

I also moved around some functions and classes to resolve circular imports.

How to Test

List the steps to test the PR
These steps are generic, please adjust as necessary.

cd tdrs-backend && docker-compose up
cd tdrs-frontend && docker-compose up --build

Open http://localhost:3000/ and sign in.
Upload a file with the row included above, with and without the trailing space-fill
Ensure the resulting record, status, and error report are the same for both submissions.

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

[insert ACs here]
lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

Are all areas of code introduced in this PR meaningfully tested?
- If this PR introduces backend code changes, are they meaningfully tested?
- If this PR introduces frontend code changes, are they meaningfully tested?
Are code coverage minimums met?
- Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
- Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

Are backend code style checks passing on CircleCI?
Are frontend code style checks passing on CircleCI?
Are code maintainability principles being followed?

Deliverable 4: Accessible

Does this PR complete the epic?
Are links included to any other gov-approved PRs associated with epic?
Does PR include documentation for Raft's a11y review?
Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

Does this PR provide background for why coding decisions were made?
If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces dependencies, are their licenses documented?
Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

Does the OWASP Scan pass on CircleCI?
Do manual code review and manual testing detect any new security issues?
If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

the purpose of the research
methods used to conduct the research
who participated in the research
what was tested and how
impact of research on TDP
(if applicable) final design mockups produced for TDP development

codecov · 2023-11-14T16:41:52Z

Codecov Report

Merging #2748 (9c98abd) into develop (a832605) will increase coverage by 0.01%.
Report is 3 commits behind head on develop.
The diff coverage is 100.00%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2748      +/-   ##
===========================================
+ Coverage    92.79%   92.80%   +0.01%     
===========================================
  Files          202      246      +44     
  Lines         4591     5576     +985     
  Branches       320      480     +160     
===========================================
+ Hits          4260     5175     +915     
- Misses         271      308      +37     
- Partials        60       93      +33

Flag	Coverage Δ
dev-backend	`92.80% <100.00%> (+0.01%)`	⬆️
dev-frontend	`92.83% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
tdrs-backend/tdpservice/parsers/fields.py	`86.04% <100.00%> (-0.32%)`	⬇️
tdrs-backend/tdpservice/parsers/row_schema.py	`92.94% <100.00%> (+0.08%)`	⬆️
tdrs-backend/tdpservice/parsers/validators.py	`96.64% <100.00%> (+0.19%)`	⬆️

... and 44 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb16d8b...9c98abd. Read the comment docs.

ADPennington · 2023-11-24T15:01:07Z

@jtimpe getting the following migration-related traceback:

09:54:52.867: [APP/PROC/WEB.0] Traceback (most recent call last):
09:54:52.867: [APP/PROC/WEB.0] File "/home/vcap/app/manage.py", line 31, in <module>
09:54:52.867: [APP/PROC/WEB.0] main()
09:54:52.867: [APP/PROC/WEB.0] File "/home/vcap/app/manage.py", line 27, in main
09:54:52.868: [APP/PROC/WEB.0] execute_from_command_line(sys.argv)
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
09:54:52.868: [APP/PROC/WEB.0] utility.execute()
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
09:54:52.868: [APP/PROC/WEB.0] self.fetch_command(subcommand).run_from_argv(self.argv)
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
09:54:52.869: [APP/PROC/WEB.0] self.execute(*args, **cmd_options)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
09:54:52.869: [APP/PROC/WEB.0] output = self.handle(*args, **options)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 89, in wrapped
09:54:52.869: [APP/PROC/WEB.0] res = handle_func(*args, **kwargs)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 95, in handle
09:54:52.869: [APP/PROC/WEB.0] executor.loader.check_consistent_history(connection)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/db/migrations/loader.py", line 306, in check_consistent_history
09:54:52.869: [APP/PROC/WEB.0] raise InconsistentMigrationHistory(
09:54:52.869: [APP/PROC/WEB.0] django.db.migrations.exceptions.InconsistentMigrationHistory: Migration search_indexes.0021_ssp_m7 is applied before its dependency search_indexes.0020_ssp_m4_ssp_m5 on database 'default'.

ADPennington

@jtimpe notes below from testing. can you let me know if this is expected behavior?

tested:

record with correct T3 length (156) and 1 child parsed correctly
record with T3 length (80) and 1 child did not parse and feedback report said: Unknown Record_Type was found.
record with T3 length (148) and 1 child parsed correctly
record with T3 length (104) and 2 children parsed correctly
record with M3 length (81) and 1 child parsed 2 records, the 2nd child record is blank. not noted in feedback report. I didnt check the remainder of scenarios for SSP, but it looks like satisfying the SSP Section 1 child records will require more work.

this will probably also be an issue with Tribal TANF section 1 child records (#2742)

jtimpe · 2023-11-24T22:05:51Z

@ADPennington thank you! those were some interesting edge cases, so i appreciate the thorough testing. i believe all of the odd behavior you noticed was due to getting partially out-of-bounds substrings from the notEmpty validator - in this pr i introduced checking fully out-of-bounds substrings (e.g., looking for substring at position 5-10 on a string of length 5) but had missed the partial case (e.g., looking for substring at position 5-10 on string of length 8). The "unknown record type" was a result of allowing blanks in the fields triggering an exception for tanf t3, the ssp case you noted must have avoided the exception and created the blank record instead.

i pushed up a small change to check the length of the resulting substrings whenever we pull them, if it's less than the length we requested, it gets treated as blank. This may impact cases where, for instance, half the data is populated for the second record and the rest is cut off - no second record will be created.

ADPennington

@jtimpe I tested the behavior of TANF and SSP section 1 file parsing/validation with respect to the length of the record and trailing space-filling and zero-filling. Results are below.

TANF

if the T1 record is the expected length and 0-filled or space-filled, it will be parsed. otherwise, it wont and will result in errors like the following ✔️ :
```
 # for T1
 Value length 155 does not match 156.
 Value length 157 does not match 156.
```
if the T2 record is the expected length, it will be parsed. otherwise, it wont and will result in errors like the following ✔️ :
```
 # for T2
 Value length 155 does not match 156.
 Value length 157 does not match 156.
```
if there is only one child in the T3 record, and the record is the correct length (156) and space-filled, then the record is parsed correctly ✔️
if there is only one child in the T3 record, and the record is the correct length (156) but 0-filled, then 2 child records get parsed, where the 2nd is zero-filled. the error report will flag all of the out of range values due to 0-filling. ✔️
```
# example T3 record with 1 child and 0-filling
T320231011111111112120180127WTTTT80W0      222   98 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
```
if there is only one child in the T3 record, and is not 0-filled or space-filled, the record will parse correctly as long as the 1st child record length == 60 ✔️
```
# example T3 record with 1 child with length == 60
T320231011111111112120180127WTTTT80W0      222   98 00000000
```
if there is only one child in the T3 record and record length < 60, not parsed, included in error report ✔️

# example of T3 < 60 with error message
T320231011111111112120180127WTTTT80W0      222   98 000000 contains blanks between positions 19 and 60.

if there is only one child in the T3 record and 60 < length < 156 with space-fill, record parsed correctly. ✔️
if there is only one child in the T3 record and record length < 60 or 60 < length < 156 with zero-fill, the 2nd child record is parsed with zeros. error report captures out of range values as expected. ✔️
if there are 2 children in the T3 record and record length == 156 with space-fill or zero-fill, records parsed correctly. ✔️
if there are 2 children in the T3 record and record length < 156 and 2nd child record truncated, only first record parsed, and no error message for 2nd child in record. ⚠️ Generate preparser errors when multi-record rows are the wrong length or are missing space-filled second records #2757

# example of T3 record with 2 children, and 2nd child record truncated
T320231011111111151220170525WTTTT@B#Y12222112204399400000000220151113WTTTT9TT#12222122204399

SSP

results consistent with TANF.

jtimpe added 3 commits November 14, 2023 10:55

add validator tests

809d1c2

add notEmpty test

a9e44e9

add blank string to value_is_empty values

017b5e3

jtimpe self-assigned this Nov 14, 2023

replace removed text

cc00bce

jtimpe mentioned this pull request Nov 14, 2023

Fix parser/preparser validation of empty strings #2747

Closed

28 tasks

jtimpe added the raft review This issue is ready for raft review label Nov 14, 2023

jtimpe requested review from George-Hudson, andrew-jameson and raftmsohani November 14, 2023 16:43

andrew-jameson added the Deploy with CircleCI-sandbox label Nov 14, 2023

andrew-jameson approved these changes Nov 15, 2023

View reviewed changes

raftmsohani approved these changes Nov 15, 2023

View reviewed changes

andrew-jameson removed raft review This issue is ready for raft review Deploy with CircleCI-sandbox labels Nov 17, 2023

andrew-jameson requested review from ADPennington and removed request for George-Hudson November 17, 2023 17:17

jtimpe added the QASP Review label Nov 17, 2023

Merge branch 'develop' into fix/empty-string-validation

f6c2b07

ADPennington added the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Nov 24, 2023

ADPennington reviewed Nov 24, 2023

View reviewed changes

fix partial oob string indexing

21e2025

elipe17 mentioned this pull request Nov 27, 2023

Tribal Section 1 Parsing & Validation #2742

Merged

32 tasks

jtimpe mentioned this pull request Nov 28, 2023

Generate preparser errors when multi-record rows are the wrong length or are missing space-filled second records #2757

Closed

8 tasks

ADPennington added 2 commits December 1, 2023 10:48

Merge branch 'develop' into fix/empty-string-validation

41ae3f0

Merge branch 'develop' into fix/empty-string-validation

9c98abd

ADPennington removed the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Dec 4, 2023

ADPennington added the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Dec 4, 2023

ADPennington approved these changes Dec 5, 2023

View reviewed changes

ADPennington added Ready to Merge and removed QASP Review Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Dec 5, 2023

andrew-jameson merged commit 11aef55 into develop Dec 6, 2023
28 checks passed

andrew-jameson deleted the fix/empty-string-validation branch December 6, 2023 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix parser/preparser validation of empty strings #2748

Fix parser/preparser validation of empty strings #2748

jtimpe commented Nov 14, 2023 •

edited by ADPennington

Loading

codecov bot commented Nov 14, 2023 •

edited

Loading

ADPennington commented Nov 24, 2023

ADPennington left a comment

jtimpe commented Nov 24, 2023

ADPennington left a comment •

edited

Loading

Fix parser/preparser validation of empty strings #2748

Fix parser/preparser validation of empty strings #2748

Conversation

jtimpe commented Nov 14, 2023 • edited by ADPennington Loading

Summary of Changes

How to Test

Deliverables

codecov bot commented Nov 14, 2023 • edited Loading

Codecov Report

ADPennington commented Nov 24, 2023

ADPennington left a comment

Choose a reason for hiding this comment

jtimpe commented Nov 24, 2023

ADPennington left a comment • edited Loading

Choose a reason for hiding this comment

TANF

SSP

jtimpe commented Nov 14, 2023 •

edited by ADPennington

Loading

codecov bot commented Nov 14, 2023 •

edited

Loading

ADPennington left a comment •

edited

Loading