As sys admin, I want to be able to reparse datafile sets #2978

Closed
8 tasks
andrew-jameson opened this issue May 7, 2024 · 4 comments
Labels: backend dev Priority Refined

Comments

@andrew-jameson
Collaborator

andrew-jameson commented May 7, 2024

Description:
Implement the ability to re-parse/refresh already-submitted data files, and define a lifecycle for the data this generates. This is a solution for #2870 and the re-indexing issues.

Acceptance Criteria:

  • Manual step documented for triggering task to re-parse all old data files
  • System Administrator can delete/re-parse a set of data files
  • Admin should be able to update by quarter for all data types.
  • Testing Checklist has been run and all tests pass
  • README is updated, if necessary

Tasks:

  • Build a filter for data files submitted before a certain date (i.e., in a prior submission period)
  • Build out manual task trigger for orchestrating lifecycle, deployments, and re-parsing
  • Run Testing Checklist and confirm all tests pass
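
The first task, filtering data files submitted before a given date, might look roughly like the sketch below. It uses plain Python rather than the project's ORM so it runs on its own; `DataFileRecord`, its fields, and `submitted_before` are hypothetical names for illustration, not the project's actual models.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFileRecord:
    """Hypothetical stand-in for a submitted data file row."""
    id: int
    fiscal_year: int
    quarter: str        # e.g. "Q1"
    submitted_on: date


def submitted_before(files, cutoff):
    """Return the files submitted strictly before `cutoff`, oldest first."""
    older = [f for f in files if f.submitted_on < cutoff]
    return sorted(older, key=lambda f: f.submitted_on)


# Illustrative data only; these are not real submissions.
files = [
    DataFileRecord(1, 2022, "Q1", date(2022, 2, 10)),
    DataFileRecord(2, 2024, "Q1", date(2024, 2, 12)),
    DataFileRecord(3, 2022, "Q3", date(2022, 8, 1)),
]
candidates = submitted_before(files, cutoff=date(2023, 10, 1))
print([f.id for f in candidates])
```

In a Django queryset the same condition would typically be expressed with an `__lt` lookup on whichever field stores the submission date; that field name is an assumption here.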

Notes:

  • @ADPennington has mentioned that prod data files from FY22 would be ideal candidates: they are not only un-parsed (submitted before parsing was in prod), but also may not adhere to current validation logic standards/guidance.

Supporting Documentation:

Open Questions:

  • Do we want filtering/granularity by quarter, program type (TANF vs. SSP), full year, since the last reporting period, etc.? Alex gave guidance to focus on the latest quarter in FY24.
  • Do we need to implement drops/archives for all postgres data as well?
@andrew-jameson added the backend dev Priority labels May 7, 2024
@robgendron added the Refined label May 14, 2024
@andrew-jameson
Collaborator Author

andrew-jameson commented May 15, 2024

  • Build DAC filter on datafiles by quarter
  • Add DAC action for 're-parse'
  • Implement base unit tests
  • Based on a list of datafile IDs, iterate .delay() calls per ID, generating a new Celery task for each. Or do we want this by quarter? How does the admin input that: shell_plus with a Django command?
  • Implement an orchestration function that:
    • a) cleans up postgres
    • b) cleans up ES
    • c) resets DFS
    • d) re-calls parse_datafile() w/ param(s)
  • Have some check that all datafile IDs were handled. How might we handle conflicts if data already exists in the DB?
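
A minimal sketch of how steps (a) through (d) and the final "all IDs handled" check could fit together. The four steps are passed in as plain callables so the sketch runs without Postgres, Elasticsearch, or Celery; `reparse_datafiles`, `unhandled_ids`, and the parameter names are hypothetical, and in a real task the last step would presumably be the per-ID `.delay()` call mentioned above.

```python
def reparse_datafiles(file_ids, clean_postgres, clean_es, reset_dfs, parse):
    """Run the four re-parse steps per data file id and report the outcome.

    Each step is a callable taking one id. Returns (handled, failed), where
    `failed` maps an id to the exception raised while processing it.
    """
    handled, failed = [], {}
    for file_id in file_ids:
        try:
            clean_postgres(file_id)   # a) clean up postgres
            clean_es(file_id)         # b) clean up ES
            reset_dfs(file_id)        # c) reset DFS
            parse(file_id)            # d) re-run (or re-queue) the parse
        except Exception as exc:
            # Keep going so one bad file does not block the rest.
            failed[file_id] = exc
        else:
            handled.append(file_id)
    return handled, failed


def unhandled_ids(requested, handled, failed):
    """Ids that were requested but neither handled nor recorded as failed."""
    return set(requested) - set(handled) - set(failed)


# Demo with recording stubs; id 2 simulates a parse failure.
calls = []


def record(step):
    def _step(file_id):
        if step == "parse" and file_id == 2:
            raise ValueError("simulated parse failure")
        calls.append((step, file_id))
    return _step


handled, failed = reparse_datafiles(
    [1, 2, 3], record("postgres"), record("es"), record("dfs"), record("parse")
)
print(handled, sorted(failed), unhandled_ids([1, 2, 3], handled, failed))
```

Cleaning up before re-parsing is one way to sidestep the conflict question, since no old rows should remain when the parse runs again; whether that is the right policy is still an open question in the notes above.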

@robgendron

11 points remaining.

@robgendron

robgendron commented Jun 25, 2024

This will be completed in #3004. Andrew's May 15 notes have become the basis for #3004. Deemed closed.

@robgendron robgendron reopened this Jul 3, 2024
@ADPennington
Collaborator

@lfrohlich we will close this ticket and capture more details in #3004 cc: @robgendron @andrew-jameson

4 participants