Fix Certifying Auditee field names in historical data #3402
Comments
My initial reaction is that option 1 seems preferable. Option 2 would be overwritten if we ever had to re-run dissemination for some other reason. Another possible approach would be to handle both cases during intake-to-dissemination. We could check for both cases and pass the appropriate one on to the disseminated record.
Option 1 is very expensive in time. Re-running a year of data on a single-core cloud.gov instance takes approximately two weeks. A third option:
We could, in this way, begin doing data curation via API, eliminating some of the challenges of trying to do all of this as GH Actions. I've just suggested an entirely new idea that needs discussion, I think. I'm wrestling with/thinking this way because:
Probably a few other things. I have no idea how this would play with the existing migration tooling... it might be a non-starter. But, for ongoing curation work, this might be worth discussing?
We'll tackle this as part of the next batch of
As I began reviewing the code and scaffolding the necessary logic to fix the auditee name and title (see ticket #3402), I realized that data curation might be needed to address issues with historical records migrated from Census data. This could be due to various reasons, including bugs in the migration algorithm that were not identified at the time and are now surfacing (or may surface in the future). Additionally, there might be a need to update records in the FAC databases, regardless of their origin. This typically occurs when the FAC team modifies intake validation rules, resulting in existing records that no longer validate against the new rules without updates.

When data curation involves historical records, fixing these issues will often require accessing raw data from the historical Census records table and reusing logic from the census_historical_migration app. The reason for reusing this logic is to maintain consistency, such as the way we handled missing values during migration by replacing them with the GSA_MIGRATION placeholder.

This situation raises questions about how and where to organize the data curation code. Should we create a new app (data-curation) within the Django project and consolidate all data curation work there? This approach has the advantage of centralizing all data curation efforts in one place but may lead to the new app becoming too dependent on others, such as the census_historical_migration or audit apps.

Alternatively, should we include a curation section within each app (one for the census_historical_migration app and one for the audit app)? This approach would make the apps more self-contained and loosely coupled, reducing dependencies between them. However, it also means the curation logic would be spread across multiple apps.
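The consistency concern can be made concrete with a minimal sketch. It assumes a single shared helper that both migration and curation code would call. The function name `value_or_placeholder` and the rule that blank strings count as missing are illustrative assumptions, not the actual census_historical_migration API. Only the placeholder string GSA_MIGRATION comes from the discussion.

```python
# Hypothetical shared helper; the name and the blank-handling rule are assumptions.
GSA_MIGRATION = "GSA_MIGRATION"


def value_or_placeholder(raw):
    """Return the stripped value, or the placeholder when it is missing or blank."""
    if raw is None:
        return GSA_MIGRATION
    text = str(raw).strip()
    return text if text else GSA_MIGRATION
```

If curation code imported one helper like this rather than re-implementing the rule, a curated record and a freshly migrated record would treat missing values the same way, wherever the curation code ends up living.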
Thinking out loud... The historical_migration code assumes:
It would be heavyweight, but could all curation be implemented as
Such that the migration code is, for all intents and purposes, the only place we do this work? (This is a third option. I haven't thought about how odd or heavyweight it might turn out to be.) My intuition/assumptions so far have been that having a For this particular issue, I've been assuming a management command would have access to both the
That is, I've been assuming we have 1) the current record and 2) the historical record in hand for all curation work, and therefore each action looks more like a management command that is probably only run once?
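A minimal sketch of that shape, using plain dictionaries in place of Django models so it runs on its own. The Census column names AUDITEECERTIFYNAME and AUDITEECERTIFYTITLE come from the issue description. The target keys `auditee_certify_name` and `auditee_certify_title`, the function name, and the blank-to-placeholder rule are assumptions for illustration only.

```python
GSA_MIGRATION = "GSA_MIGRATION"

# Census source column -> field on the current record (target names are assumed).
FIELD_MAP = {
    "AUDITEECERTIFYNAME": "auditee_certify_name",
    "AUDITEECERTIFYTITLE": "auditee_certify_title",
}


def curate_certifying_fields(current, historical):
    """Return (updated_record, changes) without mutating either input.

    Both arguments are dicts with string (or None) values: `current` stands in
    for the existing record, `historical` for the raw Census row.
    """
    updated = dict(current)
    changes = {}
    for source_col, target_field in FIELD_MAP.items():
        raw = (historical.get(source_col) or "").strip()
        corrected = raw or GSA_MIGRATION
        if updated.get(target_field) != corrected:
            # Record old and new values so a one-off run can be audited.
            changes[target_field] = (updated.get(target_field), corrected)
            updated[target_field] = corrected
    return updated, changes
```

Returning the set of changes means a second run over an already corrected record reports nothing to do, which suits a command that is probably only run once but might be re-run by accident.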
While preparing the data migration documentation, it was noted that AUDITEENAME was incorrectly used instead of AUDITEECERTIFYNAME, and AUDITEETITLE was used instead of AUDITEECERTIFYTITLE. It was determined that this will have a low impact on the disseminated reports, as it does not affect the financial aspect of the audit reports and only affects the auditee certifying information. However, because this still introduces some data inaccuracies in the reports in production, it must be addressed.

Possible solutions: