Merge branch '2693-cat2-messaging-cleanup' into fix/cat3-error-cleanup
jtimpe committed Jun 27, 2024
2 parents 3b0a1b8 + b95fb9f commit 94e93f5
Showing 55 changed files with 2,877 additions and 37,914 deletions.
9 changes: 4 additions & 5 deletions docs/How-We-Work/Team-Composition.md
@@ -8,14 +8,13 @@ Please refer to the [Team Members doc](https://hhsgov.sharepoint.com/:w:/r/sites
* Alexandra Pennington, OFA, tech lead

**Raft**
* Connor Smith, Raft, facilitator/product manager
* Miles Reiter, Raft, design lead + senior ux/ui researcher and designer
* Diana Liang, Raft, ux/ui researcher and designer
* Rob Gendron, Raft, facilitator/product manager
* Victoria Amoroso, Raft, design lead + senior ux/ui researcher and designer
* Miles Reiter, Raft, senior ux/ui researcher and designer
* Andrew Jameson, Raft, tech lead
* Cameron Smart, Raft, full stack engineer
* Jan Timpe, Raft, full stack engineer
* Mo Sohani, Raft, full stack engineer
* George Hudson, Raft, devops engineer
* Eric Lipe, Raft, full stack engineer

## Subject Matter Experts
**OFA Data Team**
26 changes: 0 additions & 26 deletions docs/Security-Compliance/File-Transfer-TDRS/README.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/Security-Compliance/File-Transfer-TDRS/diagram.drawio

This file was deleted.

378 changes: 378 additions & 0 deletions docs/Technical-Documentation/diagrams/parsing.drawio

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions docs/Technical-Documentation/parsing-flow.md
@@ -0,0 +1,11 @@
# High Level Parsing Flow

Parsing begins after a user submits one or more datafiles via the frontend. Each submission generates a new Celery task
or tasks which are enqueued to Redis. As work becomes available, the Celery workers dequeue a task from Redis and begin
working on it. The parsing task fetches the `DataFile` Django model and begins iterating over each line in the file. For
each line in the file this task: parses the line into a new record, performs category 1 - 3 validation on the record,
performs exact and partial duplicate detection, performs category 4 validation, and stores the record in a cache to be
bulk created/serialized to the database and Elasticsearch. The image below provides a high-level flow of the
aforementioned steps.

![Parsing Flow](./diagrams/parsing.png)
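
For readers who want the flow in code, below is a minimal sketch of the parsing task described above. It is illustrative only: `DataFile` is the real Django model (see `tdpservice.data_files.models`), but the helper functions (`parse_line`, `run_category_1_to_3_validators`, `detect_duplicates`, `run_category_4_validators`, `bulk_persist`) are hypothetical stand-ins for the actual parser internals.

```python
# Illustrative sketch only: the helper functions are hypothetical stand-ins,
# not the real tdpservice parser internals.
from celery import shared_task
from tdpservice.data_files.models import DataFile


@shared_task
def parse(data_file_id):
    """Walk every line of a submitted datafile, validating as we go."""
    data_file = DataFile.objects.get(id=data_file_id)
    record_cache = []
    for line in data_file.file:                 # iterate the raw lines of the upload
        record = parse_line(line)               # parse the line into a new record
        run_category_1_to_3_validators(record)  # category 1 - 3 validation
        detect_duplicates(record)               # exact + partial duplicate detection
        run_category_4_validators(record)       # category 4 (case consistency) validation
        record_cache.append(record)
    bulk_persist(record_cache)                  # bulk create in the database and index in Elasticsearch
```

Enqueuing mirrors the call made in `data_files/views.py` after a successful virus scan: `parser_task.parse.delay(data_file_id)`.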
56 changes: 0 additions & 56 deletions docs/Technical-Documentation/secret-key-rotation-steps.md
@@ -6,7 +6,6 @@ To maintain good security, we will periodically rotate the following secret keys
- CF deployer keys (_for continuous delivery_)
- JWT keys (_external user auth_)
- ACF AMS keys (_internal user auth_)
- ACF Titan server keys (_for file transfers between TDP and TDRS_)
- Django secret keys ([_cryptographic signing_](https://docs.djangoproject.com/en/4.0/topics/signing/#module-django.core.signing))

This document outlines the process for doing this for each set of keys.
@@ -154,61 +153,6 @@ Service request tickets must be submitted by Government-authorized personnel wi
2. Update environment variables in CircleCI and relevant cloud.gov backend applications after the ticket is completed by OCIO. [Restage applications](https://cloud.gov/docs/deployment/app-maintenance/#restaging-your-app).
</details>

**<details><summary>ACF Titan Server Keys</summary>**
The ACF OCIO Ops team manages these credentials for all environments (dev, staging, and prod), so we will need to submit a service request ticket whenever we need keys rotated.

Service request tickets must be submitted by Government-authorized personnel with Government computers and PIV access (e.g. Raft tech lead for lower environments and TDP sys admins for the production environment). Please follow the procedures below:

1. Generate new public/private key pair

Below is an example of how to generate a new titan public/private key pair from _Git BASH for Windows_. Two files called `filename_where_newtitan_keypair_saved` are created: one is the _private_ key and the other is the _public_ key (the latter is saved with a _.pub_ extension).
(Note: the info below is not associated with any real keys.)

```
$ ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/c/Users/username/.ssh/id_rsa): filename_where_newtitan_keypair_saved
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in filename_where_newtitan_keypair_saved
Your public key has been saved in filename_where_newtitan_keypair_saved.pub
The key fingerprint is:
SHA256:BY6Nl0hCjIrI9yZMBGH2vbDFLCTq2DsFQXQTmLydwjI
The key's randomart image is:
+---[RSA 4096]----+
| X*B*.. . |
|+ O+=+ * o |
|=oo* *+ = . |
|Eo++B .. . |
|.+=oo. S |
| = o |
| o o |
| . |
| |
+----[SHA256]-----+
```

2. Submit request tickets from a government-issued email address and use the email template located on **page 2** of [this document](https://hhsgov.sharepoint.com/:w:/r/sites/TANFDataPortalOFA/Shared%20Documents/compliance/Authentication%20%26%20Authorization/ACF%20AMS%20docs/OCIO%20OPERATIONS%20REQUEST%20TEMPLATES.docx?d=w5332585c1ecf49a4aeda17674f687154&csf=1&web=1&e=aQyIPz). Cc the OFA tech lead on lower environment requests.

The request should include:
- the titan service account name (i.e. `tanfdp` for prod; `tanfdpdev` for dev/staging)
- the newly generated public key from `filename_where_newtitan_keypair_saved.pub`

3. When OCIO confirms that the change has been made, add the private key from `filename_where_newtitan_keypair_saved` to CircleCI as an environment variable. The variable name is `ACFTITAN_KEY`. **Please note**: the value must be edited before adding it to CircleCI. It should be a one-line string with underscores ("_") replacing the newline at the end of every line. See the example below:

```
-----BEGIN OPENSSH PRIVATE KEY-----_somehashvalue_-----END OPENSSH PRIVATE KEY-----
```
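
If helpful, here is a small, hypothetical Python one-off for producing that one-line form from the key file generated in step 1 (any `tr`/`sed` equivalent works just as well):

```python
# Hypothetical convenience snippet: flatten the multi-line private key into
# the one-line, underscore-delimited form shown above before pasting it into CircleCI.
with open("filename_where_newtitan_keypair_saved") as key_file:
    print(key_file.read().strip().replace("\n", "_"))
```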

4. Re-run the deployment workflow from CircleCI and confirm that the updated key value pair has been added to the relevant cloud.gov backend application.
</details>

**<details><summary>Django secret keys</summary>**

3 changes: 0 additions & 3 deletions scripts/deploy-backend.sh
@@ -42,9 +42,6 @@ echo backend_app_name: "$backend_app_name"
set_cf_envs()
{
var_list=(
"ACFTITAN_HOST"
"ACFTITAN_KEY"
"ACFTITAN_USERNAME"
"AMS_CLIENT_ID"
"AMS_CLIENT_SECRET"
"AMS_CONFIGURATION_ENDPOINT"
3 changes: 0 additions & 3 deletions tdrs-backend/.env.example
@@ -86,6 +86,3 @@ ELASTIC_HOST=elastic:9200

# testing
CYPRESS_TOKEN=local-cypress-token

# sftp
ACFTITAN_SFTP_PYTEST=local-acftitan-key
2 changes: 0 additions & 2 deletions tdrs-backend/Pipfile
@@ -51,8 +51,6 @@ celery = "==5.2.6"
redis = "==4.1.2"
flower = "==1.1.0"
django-celery-beat = "==2.2.1"
paramiko = "==2.11.0"
pytest_sftpserver = "==1.3.0"
elasticsearch = "==7.13.4" # REQUIRED - v7.14.0 introduces breaking changes
django-elasticsearch-dsl = "==7.3"
django-elasticsearch-dsl-drf = "==0.22.5"
84 changes: 9 additions & 75 deletions tdrs-backend/Pipfile.lock

Some generated files are not rendered by default.

4 changes: 0 additions & 4 deletions tdrs-backend/docker-compose.local.yml
@@ -68,12 +68,8 @@ services:
- AMS_CLIENT_ID
- AMS_CLIENT_SECRET
- AMS_CONFIGURATION_ENDPOINT
- ACFTITAN_HOST
- ACFTITAN_KEY
- ACFTITAN_USERNAME
- REDIS_URI=redis://redis-server:6379
- REDIS_SERVER_LOCAL=TRUE
- ACFTITAN_SFTP_PYTEST
- SENDGRID_API_KEY
volumes:
- .:/tdpapp
4 changes: 0 additions & 4 deletions tdrs-backend/docker-compose.yml
@@ -91,12 +91,8 @@ services:
- AMS_CLIENT_ID
- AMS_CLIENT_SECRET
- AMS_CONFIGURATION_ENDPOINT
- ACFTITAN_HOST
- ACFTITAN_KEY
- ACFTITAN_USERNAME
- REDIS_URI=redis://redis-server:6379
- REDIS_SERVER_LOCAL=TRUE
- ACFTITAN_SFTP_PYTEST
- CYPRESS_TOKEN
- DJANGO_DEBUG
- SENDGRID_API_KEY
12 changes: 1 addition & 11 deletions tdrs-backend/tdpservice/data_files/views.py
@@ -18,7 +18,7 @@
from tdpservice.data_files.util import get_xls_serialized_file
from tdpservice.data_files.models import DataFile, get_s3_upload_path
from tdpservice.users.permissions import DataFilePermissions, IsApprovedPermission
from tdpservice.scheduling import sftp_task, parser_task
from tdpservice.scheduling import parser_task
from tdpservice.data_files.s3_client import S3Client
from tdpservice.parsers.models import ParserError
from tdpservice.parsers.serializers import ParsingErrorSerializer
@@ -59,7 +59,6 @@ def create(self, request, *args, **kwargs):

# only if the file passed the virus scan and was created successfully will we perform side-effects:
# * Send to parsing
# * Upload to ACF-TITAN
# * Send email to user

logger.debug(f"{self.__class__.__name__}: status: {response.status_code}")
@@ -74,15 +73,6 @@
parser_task.parse.delay(data_file_id)
logger.info("Submitted parse task to queue for datafile %s.", data_file_id)

sftp_task.upload.delay(
data_file_pk=data_file_id,
server_address=settings.ACFTITAN_SERVER_ADDRESS,
local_key=settings.ACFTITAN_LOCAL_KEY,
username=settings.ACFTITAN_USERNAME,
port=22
)
logger.info("Submitted upload task to redis for datafile %s.", data_file_id)

app_name = settings.APP_NAME + '/'
key = app_name + get_s3_upload_path(data_file, '')
version_id = self.get_s3_versioning_id(response.data.get('original_filename'), key)
18 changes: 11 additions & 7 deletions tdrs-backend/tdpservice/parsers/aggregates.py
@@ -1,9 +1,10 @@
"""Aggregate methods for the parsers."""
from .row_schema import SchemaManager
from .models import ParserError
from .models import ParserError, ParserErrorCategoryChoices
from .util import month_to_int, \
transform_to_months, fiscal_to_calendar, get_prog_from_section
from .schema_defs.utils import get_program_models, get_text_from_df
from django.db.models import Q as Query


def case_aggregates_by_month(df, dfs_status):
@@ -39,22 +40,25 @@ def case_aggregates_by_month(df, dfs_status):
if isinstance(schema_model, SchemaManager):
schema_model = schema_model.schemas[0]

curr_case_numbers = set(schema_model.document.Django.model.objects.filter(datafile=df)
.filter(RPT_MONTH_YEAR=rpt_month_year)
curr_case_numbers = set(schema_model.document.Django.model.objects.filter(datafile=df,
RPT_MONTH_YEAR=rpt_month_year)
.distinct("CASE_NUMBER").values_list("CASE_NUMBER", flat=True))
case_numbers = case_numbers.union(curr_case_numbers)

total += len(case_numbers)
cases_with_errors += ParserError.objects.filter(file=df).filter(
case_number__in=case_numbers).distinct('case_number').count()
cases_with_errors += ParserError.objects.filter(file=df, case_number__in=case_numbers)\
.distinct('case_number').count()
accepted = total - cases_with_errors

aggregate_data['months'].append({"month": month,
"accepted_without_errors": accepted,
"accepted_with_errors": cases_with_errors})

aggregate_data['rejected'] = ParserError.objects.filter(file=df).filter(case_number=None).distinct("row_number")\
.exclude(row_number=0).count()
error_type_query = Query(error_type=ParserErrorCategoryChoices.PRE_CHECK) | \
Query(error_type=ParserErrorCategoryChoices.CASE_CONSISTENCY)

aggregate_data['rejected'] = ParserError.objects.filter(error_type_query, file=df)\
.distinct("row_number").exclude(row_number=0).count()

return aggregate_data
