
Speed up NOFO importing #134

Merged
merged 9 commits into from
Dec 24, 2024


pcraig3 (Collaborator) commented Dec 24, 2024

As we have continued to add steps to the NOFO import process, the total time needed to import a NOFO has increased.

After profiling the entire import step, we found that the largest contributor to the slowdown is the add_headings_to_nofo function, which was accounting for over 60% of the import time for a typical NOFO.

Using a CMS NOFO as an example, we were seeing these times online:

  • Total import time: 21.829363 seconds
  • add_headings_to_nofo runtime: 14.137533 seconds

So if we can make this function more efficient, we can dramatically improve our NOFO import times.
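The PR does not say which profiler was used. As a hedged sketch, a step's share of the total import time can be measured with a small `time.perf_counter` timer. `import_nofo_stub`, `add_headings_stub` and `other_steps_stub` below are hypothetical stand-ins that only sleep; they are not the project's real functions. For a per-function breakdown without manual instrumentation, Python's built-in `cProfile` module is the usual alternative.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled step.
timings = {}


@contextmanager
def timed(label):
    """Add the wrapped block's elapsed time to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def add_headings_stub():
    # Hypothetical stand-in for the slow step (add_headings_to_nofo).
    time.sleep(0.05)


def other_steps_stub():
    # Hypothetical stand-in for every other import step.
    time.sleep(0.02)


def import_nofo_stub():
    # Time the whole import, and the heading step nested inside it.
    with timed("total"):
        with timed("add_headings_to_nofo"):
            add_headings_stub()
        other_steps_stub()


import_nofo_stub()
share = timings["add_headings_to_nofo"] / timings["total"]
print(f"add_headings_to_nofo share of import time: {share:.0%}")
```

Plugging in the real numbers above, 14.137533 / 21.829363 ≈ 0.65, which is consistent with the "over 60%" figure.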

This change precompiles regex patterns for each new_id before processing sections and subsections, eliminating the overhead of compiling regex repeatedly.

The precompiled patterns are reused for all replacements in subsection bodies, resulting in a significant speedup for all NOFOs.

This improvement reduces redundant computation and scales efficiently with the number of sections and subsections.

Explanation of Changes

- Precompile Regex Patterns:
    * Loop through new_ids once to compile all necessary regex patterns.
    * Each compiled regex is stored in a dictionary along with the new_id for reuse during replacements.
- Use Precompiled Patterns in the Loop:
    * Instead of recompiling the regex for each subsection, we use the precompiled patterns, which are much faster to apply.
- Benefits:
    * Reduces redundant regex compilations for every new_id and subsection.body.
    * Improves performance for NOFOs with many sections and subsections.
    * Eliminates the overhead of dynamically generating and compiling the same pattern multiple times.
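The PR describes the compile-once approach but does not include the code. A minimal sketch, assuming `new_ids` is a list of `{"old_id": ..., "new_id": ...}` dicts and that the substitution rewrites `href="#old_id"` links inside subsection bodies (both are assumptions; `Subsection`, `precompile_id_patterns` and `update_heading_links` are hypothetical names):

```python
import re
from dataclasses import dataclass


@dataclass
class Subsection:
    """Hypothetical stand-in for the Subsection model; only `body` is used."""
    body: str


def precompile_id_patterns(new_ids):
    """Compile one link pattern per heading ID, once, before any looping.

    `new_ids` is assumed to be a list of {"old_id": ..., "new_id": ...} dicts.
    Each compiled regex is kept in a dict alongside its replacement text.
    """
    return [
        {
            # re.escape stops characters like "." in an ID acting as wildcards.
            "regex": re.compile(rf'href="#{re.escape(item["old_id"])}"'),
            "replacement": f'href="#{item["new_id"]}"',
        }
        for item in new_ids
    ]


def update_heading_links(subsections, new_ids):
    """Rewrite in-page links in every subsection body to use the new IDs."""
    patterns = precompile_id_patterns(new_ids)  # compiled once, not per subsection
    for subsection in subsections:
        for pattern in patterns:
            # A function replacement stops re treating "\" in an ID as an escape.
            subsection.body = pattern["regex"].sub(
                lambda _match, repl=pattern["replacement"]: repl, subsection.body
            )
    return subsections
```

The key design choice is that `precompile_id_patterns` runs once per NOFO, so the inner loop only applies already-compiled patterns to each body.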
…ance

* Summary of Changes:
    * Replaced individual save() calls for sections and subsections with bulk_update operations.
    * Introduced two separate bulk_update steps: one for updating html_id values and another for updating body content in subsections.

* Why This Change?
    * Previously, the function performed individual save() calls within loops. For NOFOs with many sections and subsections, this resulted in a significant number of database write operations.
    * By using bulk_update, multiple updates are grouped into a single query, reducing the number of database round trips and significantly improving performance.

* Impact:
    * Faster processing for NOFOs with large numbers of sections and subsections.
    * Reduces load on the database by minimizing the number of write operations.
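In Django this is `Model.objects.bulk_update(objs, fields, batch_size=None)`; in this project it might look something like `Subsection.objects.bulk_update(subsections, ["html_id"])`, though those model and field names are assumptions. To show why it helps without a Django setup, here is a hedged sketch using the standard-library `sqlite3` module: one function issues an UPDATE per row (similar to calling `save()` in a loop), the other folds every change into a single UPDATE with a `CASE` expression, roughly the shape of SQL Django generates. The `subsection` table and its columns are hypothetical.

```python
import sqlite3


def update_one_by_one(conn, updates):
    """One UPDATE per row, similar to calling save() inside a loop."""
    for row_id, html_id in updates.items():
        conn.execute("UPDATE subsection SET html_id = ? WHERE id = ?", (html_id, row_id))


def bulk_update_html_ids(conn, updates):
    """Apply all {row_id: html_id} changes in a single UPDATE statement."""
    if not updates:
        return
    ids = list(updates)
    when_clauses = " ".join("WHEN ? THEN ?" for _ in ids)
    in_placeholders = ", ".join("?" for _ in ids)
    # Parameters for the CASE: id1, value1, id2, value2, ...
    params = [value for row_id in ids for value in (row_id, updates[row_id])]
    conn.execute(
        f"UPDATE subsection SET html_id = CASE id {when_clauses} END "
        f"WHERE id IN ({in_placeholders})",
        params + ids,
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subsection (id INTEGER PRIMARY KEY, html_id TEXT)")
    conn.executemany(
        "INSERT INTO subsection (id, html_id) VALUES (?, ?)",
        [(1, "old-1"), (2, "old-2"), (3, "old-3")],
    )
    bulk_update_html_ids(conn, {1: "1--overview", 3: "3--eligibility"})
    print(conn.execute("SELECT id, html_id FROM subsection ORDER BY id").fetchall())
    # prints [(1, '1--overview'), (2, 'old-2'), (3, '3--eligibility')]


demo()
```

Both functions leave the table in the same state; the difference is one database round trip versus one per row. Django may still split very large updates into several batches (via `batch_size`, or automatically on SQLite because of its limit on query parameters).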
pcraig3 changed the title from "Fix Benchmark" to "Speed up NOFO importing" on Dec 24, 2024
pcraig3 force-pushed the benchmark branch 4 times, most recently from 7258ef7 to 8482e79, on December 24, 2024 at 18:39
pcraig3 mentioned this pull request on Dec 24, 2024
pcraig3 (Collaborator, Author) commented Dec 24, 2024

Updated times are here:

  • Total import time: 8.694077 seconds (~60% reduction)
  • add_headings_to_nofo runtime: 0.953009 seconds (~93% reduction)
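As a quick check of the percentages, the reductions can be computed directly from the reported timings. This is only arithmetic on the numbers above, not a new measurement.

```python
# Timings reported in this PR, in seconds.
before = {"total import": 21.829363, "add_headings_to_nofo": 14.137533}
after = {"total import": 8.694077, "add_headings_to_nofo": 0.953009}


def reduction(old, new):
    """Fraction of the original runtime that was eliminated."""
    return 1 - new / old


for step in before:
    print(f"{step}: {reduction(before[step], after[step]):.1%} reduction")
# total import: 60.2% reduction
# add_headings_to_nofo: 93.3% reduction
```

The seconds saved in add_headings_to_nofo (14.137533 − 0.953009 ≈ 13.18) account for essentially all of the drop in total import time (21.829363 − 8.694077 ≈ 13.14).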

Now that we have solved our biggest performance bottleneck, we don't need this incredibly long timeout. It is really handy for debugging, but the rest of the time it is polluting our otherwise pristine codebase.
pcraig3 force-pushed the benchmark branch 2 times, most recently from a4d80ac to 427c481, on December 24, 2024 at 20:31
Since we removed those import and export JSON routes, a bunch of module imports are no longer used, so this just cleans them up.
From the changelog:

- Add cover image for CDC-RFA-IP-25-0007
- Add cover image for CMS-2V2-25-001
  - Also add inline images
- Add utility classes for different list counters

- Updates to CMS theme:
  - Fix background and text colour for cover page
  - "Standard" icons are blue
  - Tighter line-height for h5s
  - Smaller, bolder h7s
- Speed up "add_headings_to_nofo" function
  - Precompile regex patterns for heading ID substitution
  - Batch update sections and subsections

- Don't demote h7s, since there is no lower heading level
pcraig3 (Collaborator, Author) commented Dec 24, 2024

'Twas the night before Christmas, and all through the app,
Our imports were speeding, no time for a nap.
With batch updates flowing, and regex refined,
We're saving the hours, and leaving lag behind!

pcraig3 merged commit 42e550a into main on Dec 24, 2024
4 checks passed
pcraig3 deleted the benchmark branch on December 26, 2024 at 15:09