Commit

- initial commit with templates
- halfway completed tm for new multi select filter updates
elipe17 committed Aug 26, 2024
1 parent fd8500d commit e414a60
Showing 4 changed files with 162 additions and 0 deletions.
@@ -0,0 +1,26 @@
# Multi-Select Filters

**Audience**: TDP Software Engineers <br>
**Subject**: Multi-Select Filter Integration <br>
**Date**: August 8, 2024 <br>

## Summary
This is a template to use to create new technical memorandums.

## Background
TDP has been expanding its Django Admin Console (DAC) filtering capabilities by introducing custom filters, specifically multi-select filters. This has introduced numerous issues because TDP does not use the default DAC. Instead, to assist with accessibility compliance, TDP wraps the default DAC with [Django 508](https://github.com/raft-tech/django-admin-508) (henceforth referred to as 508), which makes various updates to the styling and functionality of the default DAC. A key change 508 introduces to the DAC is an `Apply Filters` button that intercepts the query string parameters from the default DAC filters and only applies them after the button is clicked. The default DAC, by contrast, applies each filter as it is selected rather than all at once.

The issue with 508's approach is that it assumes all filters are built-in Django filters (i.e., single-select filters). This presents a discrepancy because Django allows developers to write custom templates and filters that add further filtering functionality (e.g., multi-select filters).
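To make the discrepancy concrete, the sketch below uses only Python's standard library. A multi-select filter may submit several values for the same query string parameter; code that assumes one value per parameter (modeled here by building a plain `dict`) silently keeps only the last value. This is a conceptual model, not 508's actual implementation, and the parameter names `status` and `year` are hypothetical.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode


def single_select_view(query: str) -> dict[str, str]:
    # Hypothetical "one value per parameter" assumption: building a plain
    # dict from the pairs keeps only the last value seen for each key.
    return dict(parse_qsl(query))


def multi_select_view(query: str) -> dict[str, list[str]]:
    # parse_qs keeps every value of a repeated key, in order, as a list.
    return parse_qs(query)


# A multi-select "status" filter with two selected options plus a single-select "year".
query = urlencode([("status", "Pending"), ("status", "Rejected"), ("year", "2024")])
print(query)                      # status=Pending&status=Rejected&year=2024
print(single_select_view(query))  # {'status': 'Rejected', 'year': '2024'}
print(multi_select_view(query))   # {'status': ['Pending', 'Rejected'], 'year': ['2024']}
```

Any component that rebuilds or re-submits filter parameters therefore needs to treat each parameter as a list of values for multi-select filters to survive the round trip.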

## Out of Scope
Call out what is out of scope for this technical memorandum and should be considered in a different technical memorandum.

## Method/Design
This section should contain sub sections that provide general implementation details surrounding key components required to implement the feature.

### Sub header (piece of the design, can be many of these)
Sub-header content describing the component.

## Affected Systems
Provide a list of systems this feature will depend on or change.

## Use and Test cases to consider
Provide a list of use cases and test cases to consider when implementing the feature.
48 changes: 48 additions & 0 deletions docs/Technical-Documentation/tech-memos/reparse.md
@@ -0,0 +1,48 @@
# Reparsing

**Audience**: TDP Software Engineers <br>
**Subject**: Reparsing <br>
**Date**: August 9, 2024 <br>

## Summary
Reparsing improves the flexibility of TDP's workflow for ingesting data files. These enhancement requests came from the pragmatic needs of the tool's administrator users from 3041 and from the development team's theoretical concerns about the current system limitations this feature addresses.

## Background
- https://github.com/raft-tech/TANF-app/issues/2870
- https://github.com/raft-tech/TANF-app/issues/2820
- https://github.com/raft-tech/TANF-app/releases/tag/v3.2.0-Sprint-90
- https://github.com/raft-tech/TANF-app/pull/2772
- https://github.com/raft-tech/TANF-app/issues/1858
- https://github.com/raft-tech/TANF-app/issues/1350

[Driving force of reparsing](https://github.com/raft-tech/TANF-app/issues/2870)
- Reparse files that are stuck in the pending state so they can move to another state, because the validators have changed or the parser has better exception handling

## Out of Scope
- Parsing and/or validator logic changes
- Data Model or search_indices changes
- Systemic/Infrastructure changes to accommodate large data sets
- End-user facing changes to our frontend
- Pipeline or Orchestration changes

## Method/Design
The reparsing enhancements focus on maturing the `clean_and_reparse.py` Django command, which currently must be invoked from the CLI by system administrators. To mature and polish this feature to meet our new deliverables, we plan to shift major functionality and visibility into the Django Admin Console (DAC) to leverage the tools that already exist there.

#3004 introduced an initial pass at reparsing. From it, we identified key components that would improve both reparsing and its usability for system administrators. The following enhancements were identified:
- Introduce a Django model that tracks metadata about each reparsing event.
- Manage data synchronization and parallel execution of reparsing events.
- Move away from the current CLI interface in favor of a DAC-specific way to execute reparsing.

### Meta Model
This enhancement will improve our visibility into what happens during the execution of a reparse command. We believe creating a database model to store relevant fields about each run will improve usability. Fields will include the start time, end time, number of files processed, which files were targeted, number of records repopulated, etc.
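To make the proposed fields concrete, here is a plain-Python sketch of the data such a model might hold and one value it could derive (the run's duration). It deliberately avoids Django so it runs on its own; the class name `ReparseMetaSketch` and every field name are hypothetical, chosen to mirror the list above. A real implementation would be a Django model with equivalent fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReparseMetaSketch:
    """Hypothetical stand-in for the proposed meta model; all names are assumptions."""

    started_at: datetime
    target_file_ids: list[int] = field(default_factory=list)  # which files were targeted
    finished_at: Optional[datetime] = None
    files_processed: int = 0
    num_records_created: int = 0  # records repopulated by the reparse

    @property
    def num_files_to_reparse(self) -> int:
        return len(self.target_file_ids)

    @property
    def duration(self) -> Optional[timedelta]:
        # None while the reparse is still running.
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at


# Illustrative values only.
meta = ReparseMetaSketch(started_at=datetime(2024, 8, 9, 12, 0), target_file_ids=[101, 102, 103])
meta.files_processed = 3
meta.finished_at = datetime(2024, 8, 9, 12, 45)
print(meta.num_files_to_reparse, meta.duration)  # 3 0:45:00
```

Deriving values such as the duration from stored timestamps, rather than storing them separately, keeps the record consistent if a timestamp is later corrected.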

### Data Synchronization
...

### DAC Reparse Action
To mature and polish this feature, it should no longer be executed from the CLI. The DAC provides most, if not all, of the filtering needed to specify which datafiles to reparse. Adding a new `reparse` action to the `DataFiles` page in the DAC gives admins a seamless experience while also providing the reparse event with the appropriate datafiles.
#### Confirmation dialog asking "are you sure you want to reparse?"
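The confirmation step could follow the same two-step pattern as Django's built-in "Delete selected" action, which renders an intermediate confirmation page and only acts when the user resubmits it. The sketch below models that flow in plain Python so it runs without Django. The function `reparse_action`, the `confirm` form field, and the `dispatch` callback are hypothetical; in a real admin action the selected IDs would come from the action's `queryset` and the form data from `request.POST`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass
class ActionResult:
    needs_confirmation: bool
    message: str


def reparse_action(
    selected_ids: Sequence[int],
    post_data: Mapping[str, str],
    dispatch: Callable[[Sequence[int]], None],
) -> ActionResult:
    """Hypothetical two-step action: ask for confirmation first, then reparse."""
    if not selected_ids:
        return ActionResult(False, "No datafiles selected; nothing to reparse.")
    if post_data.get("confirm") != "yes":
        # First submission: show the confirmation dialog instead of acting.
        return ActionResult(True, f"Are you sure you want to reparse {len(selected_ids)} datafile(s)?")
    # Second submission (the user confirmed): hand the files to the reparse event.
    dispatch(selected_ids)
    return ActionResult(False, f"Reparse started for {len(selected_ids)} datafile(s).")


queued: list[int] = []
first = reparse_action([1, 2, 3], {}, queued.extend)
print(first.message)  # Are you sure you want to reparse 3 datafile(s)?
second = reparse_action([1, 2, 3], {"confirm": "yes"}, queued.extend)
print(second.message, queued)  # Reparse started for 3 datafile(s). [1, 2, 3]
```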

## Affected Systems
- Elastic
- Postgres (records, dfs, datafiles, parser errors)

## Use and Test cases to consider
Provide a list of use cases and test cases to consider when implementing the feature.
62 changes: 62 additions & 0 deletions docs/Technical-Documentation/tech-memos/sequential-reparse.md
@@ -0,0 +1,62 @@
# Guarantee Sequential Reparse Events

**Audience**: TDP Software Engineers <br>
**Subject**: Sequential Reparsing <br>
**Date**: August 8, 2024 <br>


## Summary
This technical memorandum aims to provide a software engineer with the initial research, design patterns, and ideas necessary
to implement sequential reparsing in the TDP application. This document covers distributed/parallel data safety, how
data synchronization enables sequential execution guarantees, and a last-ditch timeout calculation necessary to
guarantee sequential reparse events at the application level. This memorandum does not take into account network partition tolerance or parsing idempotence.

## Background
When an admin user executes a reparse event, they can select a set of N files, where N is in the range
[0, number of datafiles in the DB]. For each reparse event, a `ReparseMeta` Django model is created to track metadata about
the event, such as the number of files to be reparsed, the number of records deleted before reparsing, the number of records
created during reparsing, and a backup location. The meta model also contains the fields `files_completed` and
`files_failed`. These two fields were added so the model can track when every file in its set has finished the
parsing process, regardless of whether the file passed or failed parsing.
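As a sketch of how those two counters could decide when an event is finished, the check below assumes `files_completed` and `files_failed` are disjoint counts (each file increments exactly one of them), so the event is done once together they cover all N selected files. The class and function names are hypothetical; the example code later in this memo calls a check named `ReparseMeta.assert_all_files_done`, whose implementation is not shown, so this is only one plausible reading of it. Note the edge case N = 0, where the event is trivially finished.

```python
from dataclasses import dataclass


@dataclass
class ReparseCounts:
    """Hypothetical subset of the ReparseMeta fields needed for the completion check."""

    num_files_to_reparse: int  # N, the number of files the admin selected
    files_completed: int = 0
    files_failed: int = 0


def all_files_done(meta: ReparseCounts) -> bool:
    # Assumes each file increments exactly one counter, whether it passed or failed.
    return meta.files_completed + meta.files_failed == meta.num_files_to_reparse


print(all_files_done(ReparseCounts(num_files_to_reparse=3, files_completed=2, files_failed=1)))  # True
print(all_files_done(ReparseCounts(num_files_to_reparse=3, files_completed=1, files_failed=1)))  # False
print(all_files_done(ReparseCounts(num_files_to_reparse=0)))  # True
```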

## Distributed/Parallel Data Safety
In the [Background](#background) section, the meta model and some of its fields were introduced, along with the idea that
a reparse event generates N parsing tasks. Because (theoretically) all the tasks can execute in parallel and there is
only one meta model per event, the meta model inherently becomes a shared object and must therefore be synchronized
across the set of N parsing tasks. There are many ways to synchronize data in a distributed system, both custom and
off-the-shelf. However, because the meta model is a database object, this technical memorandum suggests using the
already tested and vetted concurrency control and synchronization mechanisms inherent to TDP's Postgres database. That
is, for the fields in the meta model that need to be updated in parallel (`files_completed`, `files_failed`,
`num_records_created`), the implementing engineer should use Django queries that translate to minimally scoped, locking
database transactions. This memorandum suggests leveraging the [select_for_update()](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-for-update)
query, which provides row-level locking for transactions in a Postgres environment. Using this query ensures that
whichever task executes it first is the only task that can update the fields until its transaction ends. All other
tasks that try to lock the same row with `select_for_update()` are blocked until the original task's transaction commits
or rolls back and releases the lock. Thus, each parser task can query the appropriate meta model, update the appropriate
fields, and continue on as normal. The one caveat to this approach is that whenever an update needs to be made, the task
must explicitly re-query the meta model inside the transaction to avoid race conditions and stale data. A piece of
example code is given below to demonstrate how the implementer might update the `files_completed` field. Note that the
function was implemented as a static member of the `ReparseMeta` class.

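```python
import logging

from django.db import DatabaseError, models, transaction

logger = logging.getLogger(__name__)


class ReparseMeta(models.Model):
    # ... fields and other helpers (e.g. assert_all_files_done, set_reparse_finished) omitted ...

    @staticmethod
    def increment_files_completed(reparse_meta_models):
        """
        Increment the count of files that have completed parsing for the datafile's current/latest reparse model.
        Because this function can be called in parallel, we use `select_for_update`: multiple parse tasks can
        reference the same ReparseMeta object that is queried below. `select_for_update` provides a DB lock on
        the object and forces other transactions on the object to wait until this one completes.
        """
        if not reparse_meta_models.exists():
            return
        meta_model = None
        # Keep the try/except outside the atomic block (as the Django docs advise) so a
        # DatabaseError rolls back the whole transaction before it is logged.
        try:
            with transaction.atomic():
                # Re-query under the row lock so we never increment a stale value.
                meta_model = reparse_meta_models.select_for_update().latest("pk")
                meta_model.files_completed += 1
                if ReparseMeta.assert_all_files_done(meta_model):
                    ReparseMeta.set_reparse_finished(meta_model)
                meta_model.save()
        except DatabaseError:
            # meta_model is still None if the locking query itself failed.
            meta_id = meta_model.pk if meta_model is not None else "unknown"
            logger.exception("Encountered exception while trying to update the `files_completed` field on the "
                             f"ReparseMeta object with ID: {meta_id}.")
```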
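The locking pattern can also be illustrated without a database. The sketch below is an in-process analogy only: a `threading.Lock` stands in for the Postgres row lock that `select_for_update()` takes, and `SharedMeta` and `finish_one_file` are hypothetical names. Because each task reads, increments, and writes the counter while holding the lock, no increment is lost, and exactly one task sees the final count and marks the event finished.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SharedMeta:
    """Hypothetical in-memory stand-in for a single ReparseMeta row."""

    num_files_to_reparse: int
    files_completed: int = 0
    finished: bool = False
    # Plays the role of the row lock taken by select_for_update().
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def finish_one_file(meta: SharedMeta) -> None:
    # Read-modify-write only while holding the lock, mirroring the
    # `transaction.atomic()` + `select_for_update()` pattern in the Django version.
    with meta.lock:
        latest = meta.files_completed  # "re-query" the current value under the lock
        meta.files_completed = latest + 1
        if meta.files_completed == meta.num_files_to_reparse:
            meta.finished = True


meta = SharedMeta(num_files_to_reparse=50)
tasks = [threading.Thread(target=finish_one_file, args=(meta,)) for _ in range(50)]
for task in tasks:
    task.start()
for task in tasks:
    task.join()
print(meta.files_completed, meta.finished)  # 50 True
```

Unlike this in-process lock, the database lock also works across separate worker processes and machines, which is why the memo relies on Postgres rather than an application-level mutex.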

## Sequential Execution
...

## Last Ditch Timeout
...
26 changes: 26 additions & 0 deletions docs/Technical-Documentation/tech-memos/tm-template.md
@@ -0,0 +1,26 @@
# TITLE

**Audience**: TDP Software Engineers <br>
**Subject**: SUBJECT/TITLE <br>
**Date**: August 8, 2024 <br>

## Summary
This is a template to use to create new technical memorandums.

## Background (Optional)
Background for the feature if necessary.

## Out of Scope
Call out what is out of scope for this technical memorandum and should be considered in a different technical memorandum.

## Method/Design
This section should contain sub sections that provide general implementation details surrounding key components required to implement the feature.

### Sub header (piece of the design, can be many of these)
Sub-header content describing the component.

## Affected Systems
Provide a list of systems this feature will depend on or change.

## Use and Test cases to consider
Provide a list of use cases and test cases to consider when implementing the feature.
