Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parser/preparser validation of empty strings #2748

Merged
merged 8 commits into from
Dec 6, 2023

Conversation

jtimpe
Copy link

@jtimpe jtimpe commented Nov 14, 2023

Summary of Changes

Due to the way python slices strings given an out-of-bounds range, the parser was creating blank-string-filled records when a multi-record was not space-filled. Once section 1 validation was introduced in #2518, uncaught exceptions are thrown during parsing

E.g.
T320210400028221R0112014122888175617622222112204398100000000
Creates two records, a properly filled t3 record, and a t3 consisting of '' values for all char fields (which subsequently fails validation and ends the parser task for that submission)

whereas
T320210400028221R0112014122888175617622222112204398100000000
Creates a single, properly filled t3 record with no errors.

I also moved around some functions and classes to resolve circular imports.

How to Test

List the steps to test the PR
These steps are generic, please adjust as necessary.

cd tdrs-backend && docker-compose up
cd tdrs-frontend && docker-compose up --build
  1. Open http://localhost:3000/ and sign in.
  2. Upload a file with the row included above, with and without the trailing space-fill
  3. Ensure the resulting record, status, and error report are the same for both submissions.

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • [insert ACs here]
  • lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@jtimpe jtimpe self-assigned this Nov 14, 2023
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Merging #2748 (9c98abd) into develop (a832605) will increase coverage by 0.01%.
Report is 3 commits behind head on develop.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2748      +/-   ##
===========================================
+ Coverage    92.79%   92.80%   +0.01%     
===========================================
  Files          202      246      +44     
  Lines         4591     5576     +985     
  Branches       320      480     +160     
===========================================
+ Hits          4260     5175     +915     
- Misses         271      308      +37     
- Partials        60       93      +33     
Flag Coverage Δ
dev-backend 92.80% <100.00%> (+0.01%) ⬆️
dev-frontend 92.83% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
tdrs-backend/tdpservice/parsers/fields.py 86.04% <100.00%> (-0.32%) ⬇️
tdrs-backend/tdpservice/parsers/row_schema.py 92.94% <100.00%> (+0.08%) ⬆️
tdrs-backend/tdpservice/parsers/validators.py 96.64% <100.00%> (+0.19%) ⬆️

... and 44 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb16d8b...9c98abd. Read the comment docs.

@andrew-jameson andrew-jameson removed raft review This issue is ready for raft review Deploy with CircleCI-sandbox labels Nov 17, 2023
@andrew-jameson andrew-jameson requested review from ADPennington and removed request for George-Hudson November 17, 2023 17:17
@ADPennington ADPennington added the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Nov 24, 2023
@ADPennington
Copy link
Collaborator

@jtimpe getting the following migration-related traceback:

09:54:52.867: [APP/PROC/WEB.0] Traceback (most recent call last):
09:54:52.867: [APP/PROC/WEB.0] File "/home/vcap/app/manage.py", line 31, in <module>
09:54:52.867: [APP/PROC/WEB.0] main()
09:54:52.867: [APP/PROC/WEB.0] File "/home/vcap/app/manage.py", line 27, in main
09:54:52.868: [APP/PROC/WEB.0] execute_from_command_line(sys.argv)
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
09:54:52.868: [APP/PROC/WEB.0] utility.execute()
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
09:54:52.868: [APP/PROC/WEB.0] self.fetch_command(subcommand).run_from_argv(self.argv)
09:54:52.868: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
09:54:52.869: [APP/PROC/WEB.0] self.execute(*args, **cmd_options)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
09:54:52.869: [APP/PROC/WEB.0] output = self.handle(*args, **options)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/base.py", line 89, in wrapped
09:54:52.869: [APP/PROC/WEB.0] res = handle_func(*args, **kwargs)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 95, in handle
09:54:52.869: [APP/PROC/WEB.0] executor.loader.check_consistent_history(connection)
09:54:52.869: [APP/PROC/WEB.0] File "/home/vcap/deps/1/python/lib/python3.10/site-packages/django/db/migrations/loader.py", line 306, in check_consistent_history
09:54:52.869: [APP/PROC/WEB.0] raise InconsistentMigrationHistory(
09:54:52.869: [APP/PROC/WEB.0] django.db.migrations.exceptions.InconsistentMigrationHistory: Migration search_indexes.0021_ssp_m7 is applied before its dependency search_indexes.0020_ssp_m4_ssp_m5 on database 'default'.

Copy link
Collaborator

@ADPennington ADPennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtimpe notes below from testing. can you let me know if this is expected behavior?

tested:

  • record with correct T3 length (156) and 1 child parsed correctly
  • record with T3 length (80) and 1 child did not parse and feedback report said: Unknown Record_Type was found.
  • record with T3 length (148) and 1 child parsed correctly
  • record with T3 length (104) and 2 children parsed correctly
  • record with M3 length (81) and 1 child parsed 2 records, the 2nd child record is blank. not noted in feedback report. I didnt check the remainder of scenarios for SSP, but it looks like satisfying the SSP Section 1 child records will require more work.

this will probably also be an issue with Tribal TANF section 1 child records (#2742)

@jtimpe
Copy link
Author

jtimpe commented Nov 24, 2023

@ADPennington thank you! those were some interesting edge cases, so i appreciate the thorough testing. i believe all of the odd behavior you noticed was due to getting partially out-of-bounds substrings from the notEmpty validator - in this pr i introduced checking fully out-of-bounds substrings (e.g., looking for substring at position 5-10 on a string of length 5) but had missed the partial case (e.g., looking for substring at position 5-10 on string of length 8). The "unknown record type" was a result of allowing blanks in the fields triggering an exception for tanf t3, the ssp case you noted must have avoided the exception and created the blank record instead.

i pushed up a small change to check the length of the resulting substrings whenever we pull them, if it's less than the length we requested, it gets treated as blank. This may impact cases where, for instance, half the data is populated for the second record and the rest is cut off - no second record will be created.

@ADPennington ADPennington removed the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Dec 4, 2023
@ADPennington ADPennington added the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Dec 4, 2023
Copy link
Collaborator

@ADPennington ADPennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtimpe I tested the behavior of TANF and SSP section 1 file parsing/validation with respect to the length of the record and trailing space-filling and zero-filling. Results are below.



TANF

  • if the T1 record is the expected length and 0-filled or space-filled, it will be parsed. otherwise, it wont and will result in errors like the following ✔️ :

     # for T1
     Value length 155 does not match 156.
     Value length 157 does not match 156.
  • if the T2 record is the expected length, it will be parsed. otherwise, it wont and will result in errors like the following ✔️ :

     # for T2
     Value length 155 does not match 156.
     Value length 157 does not match 156.
  • if there is only one child in the T3 record, and the record is the correct length (156) and space-filled, then the record is parsed correctly ✔️

  • if there is only one child in the T3 record, and the record is the correct length (156) but 0-filled, then 2 child records get parsed, where the 2nd is zero-filled. the error report will flag all of the out of range values due to 0-filling. ✔️

    # example T3 record with 1 child and 0-filling
    T320231011111111112120180127WTTTT80W0      222   98 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
  • if there is only one child in the T3 record, and is not 0-filled or space-filled, the record will parse correctly as long as the 1st child record length == 60 ✔️

    # example T3 record with 1 child with length == 60
    T320231011111111112120180127WTTTT80W0      222   98 00000000
  • if there is only one child in the T3 record and record length < 60, not parsed, included in error report ✔️

# example of T3 < 60 with error message
T320231011111111112120180127WTTTT80W0      222   98 000000 contains blanks between positions 19 and 60. 
  • if there is only one child in the T3 record and 60 < length < 156 with space-fill, record parsed correctly. ✔️

  • if there is only one child in the T3 record and record length < 60 or 60 < length < 156 with zero-fill, the 2nd child record is parsed with zeros. error report captures out of range values as expected. ✔️

  • if there are 2 children in the T3 record and record length == 156 with space-fill or zero-fill, records parsed correctly. ✔️

  • if there are 2 children in the T3 record and record length < 156 and 2nd child record truncated, only first record parsed, and no error message for 2nd child in record. ⚠️ Generate preparser errors when multi-record rows are the wrong length or are missing space-filled second records #2757

# example of T3 record with 2 children, and 2nd child record truncated
T320231011111111151220170525WTTTT@B#Y12222112204399400000000220151113WTTTT9TT#12222122204399

SSP

  • results consistent with TANF.

@ADPennington ADPennington added Ready to Merge and removed QASP Review Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Dec 5, 2023
@andrew-jameson andrew-jameson merged commit 11aef55 into develop Dec 6, 2023
28 checks passed
@andrew-jameson andrew-jameson deleted the fix/empty-string-validation branch December 6, 2023 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants