Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2795 s1 duplicates #2956

Merged
merged 143 commits into from
Jun 27, 2024
Merged

2795 s1 duplicates #2956

merged 143 commits into from
Jun 27, 2024

Conversation

elipe17
Copy link

@elipe17 elipe17 commented Apr 22, 2024

Summary of Changes

CPU and Memory Analysis

  • Used Docker to track the web container stats during parsing of the largest file we have in the repo (~50MB) on the develop branch and this branch.
  • The overall results can be found in super_big_file_results.txt. The data listed in that text file can be found in the accompanying files: develop_super_big_file.txt and dup_super_big_file.txt.
  • The data was generated with: track_docker_stats.sh.txt (GitHub won't let you upload a pure shell file).
    super_big_file_results.txt
    dup_super_big_file.txt
    develop_super_big_file.txt
    track_docker_stats.sh.txt

How to Test

List the steps to test the PR
These steps are generic, please adjust as necessary.

cd tdrs-frontend && docker-compose up
cd tdrs-backend && docker-compose up
  1. Open http://localhost:3000/ and sign in.
  2. Create a file for TANF section 1 - 4 that has one or more duplicate or partial duplicate records.
  3. Submit file
  4. Verify no records tied to that case are created and the error report reflects cat4 duplicate errors

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • In section 1 files, exact and partial duplicate records are detected, not stored in db, and error message included in the feedback report.
  • In section 2 files, exact and partial duplicate records are detected, not stored in db, and error message included in the feedback report.
  • section 3 files, partial duplicate T6 records are detected, not stored in db, and error message included in the feedback report.
  • section 4 files, partial duplicate T7 records are detected, not stored in db, and error message included in the feedback report.
  • duplicate detection logic is applied consistently to TANF, SSP, and Tribal TANF files.
  • cat 4 records not stored in db
  • Testing Checklist has been run and all tests pass
  • README is updated, if necessary

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

@elipe17 elipe17 self-assigned this Apr 22, 2024
elipe17 added 11 commits April 22, 2024 09:12
- Starting to remove SortedRecordSchemaPairs from case consistency validator
- updating to support in memory record removal if they havent been serialized
- Update case consistency validator to not use OG SortedRecordSchemaPairs
- Update dup logic to not consider records on the same line
- Add method to generate bulk create dictionary
- Stub function for removing cases
- basing off of doc instead of schema
@elipe17 elipe17 added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Jun 21, 2024
@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Jun 25, 2024
# loop through all t2s and t3s
# to find record where FAMILY_AFFILIATION == 1
num_errors += self.__validate_family_affiliation(num_errors, t1s, t2s, t3s, (
f'Every {t1_model_name} record should have at least one corresponding '
Copy link
Collaborator

@ADPennington ADPennington Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicative error messages produced by the following file

SSP_Sec1.txt

Every M1 record should have at least one corresponding M2 or M3 record with the same RPT_MONTH_YEAR and CASE_NUMBER.
Every M1 record should have at least one corresponding M2 or M3 record with the same RPT_MONTH_YEAR and CASE_NUMBER.

Copy link
Collaborator

@ADPennington ADPennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elipe17 this is in great shape. couple of suggestions on cat 4 error messages that either need clarity or are redundant and could be dropped.

Testing Summary:

  • expected logic is in-place for Section 1 🚀
  • expected logic is in-place for Section 2 🚀
  • expected logic is in-place for Section 3 🚀
  • expected logic is in-place for Section 4 🚀
  • cat 4 ⚠️
  • consistency across tribal, tanf, and ssp 🚀

@ADPennington ADPennington removed the Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI label Jun 26, 2024
@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Jun 26, 2024
Copy link
Collaborator

@ADPennington ADPennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per our async discussion, i think its worth revisiting the discussion about how many times the cat 4-related error messages appear in the error report, especially in scenarios where these are related to duplicates. this looks great! thanks @elipe17

@elipe17 elipe17 merged commit 35adb87 into develop Jun 27, 2024
14 checks passed
@elipe17 elipe17 deleted the 2795-s1-duplicates branch June 27, 2024 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend dev Parity Work associated with TDP Parity Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As tech lead, I need TDP to detect duplicate records within a file and not store them in the db.
6 participants