Speed up NOFO importing #134
Merged
This change precompiles regex patterns for each new_id before processing sections and subsections, eliminating the overhead of compiling regex repeatedly. The precompiled patterns are reused for all replacements in subsection bodies, resulting in a significant speedup for all NOFOs. This improvement reduces redundant computation and scales efficiently with the number of sections and subsections.

Explanation of Changes

- Precompile Regex Patterns:
  * Loop through new_ids once to compile all necessary regex patterns.
  * Each compiled regex is stored in a dictionary along with the new_id for reuse during replacements.
- Use Precompiled Patterns in the Loop:
  * Instead of recompiling the regex for each subsection, we use the precompiled patterns, which are much faster to apply.
- Benefits:
  * Reduces redundant regex compilations for every new_id and subsection.body.
  * Improves performance for NOFOs with many sections and subsections.
  * By precompiling regex patterns, you eliminate the overhead of dynamically generating and compiling the same pattern multiple times.
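A minimal sketch of the precompile-then-reuse pattern described in this commit message, not the code from this PR. The function names, the `href="#..."` pattern shape, and the old-id/new-id pairing are assumptions made for illustration only.

```python
import re


def build_patterns(id_pairs):
    """Compile one regex per (old_id, new_id) pair, exactly once.

    Returns a dict mapping old_id -> (compiled_pattern, new_id).
    The href="#..." shape is a hypothetical stand-in for whatever the
    real function matches.
    """
    return {
        old_id: (re.compile(r'href="#{}"'.format(re.escape(old_id))), new_id)
        for old_id, new_id in id_pairs
    }


def replace_ids(bodies, patterns):
    """Apply every precompiled pattern to every body without recompiling."""
    updated = []
    for body in bodies:
        for pattern, new_id in patterns.values():
            replacement = 'href="#{}"'.format(new_id)
            # A callable replacement keeps new_id literal (no backslash escapes).
            body = pattern.sub(lambda _match: replacement, body)
        updated.append(body)
    return updated
```

Python's `re` module keeps its own internal cache of recently compiled patterns, so the measured gain depends on how many distinct patterns are in play. Holding the compiled objects explicitly avoids relying on that cache and avoids rebuilding the pattern string on every iteration.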
…ance

* Summary of Changes:
  * Replaced individual save() calls for sections and subsections with bulk_update operations.
  * Introduced two separate bulk_update steps: one for updating html_id values and another for updating body content in subsections.
* Why This Change?
  * Previously, the function performed individual save() calls within loops. For NOFOs with many sections and subsections, this resulted in a significant number of database write operations.
  * By using bulk_update, multiple updates are grouped into a single query, reducing the number of database round trips and significantly improving performance.
* Impact:
  * Faster processing for NOFOs with large numbers of sections and subsections.
  * Reduces load on the database by minimizing the number of write operations.
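To illustrate why grouping writes helps, here is a stand-in sketch using the standard-library `sqlite3` module rather than Django's `bulk_update`, so that it runs on its own. The `subsection` table, its columns, and the function name are hypothetical. The sketch issues a single `UPDATE ... CASE` statement for many rows, which is roughly the kind of statement a batched update produces, instead of one statement per row.

```python
import sqlite3


def bulk_update_bodies(conn, updates):
    """Write all (row_id, body) pairs with one UPDATE statement.

    `updates` is a list of (row_id, new_body) tuples. The table and
    column names are assumptions for this example only.
    """
    if not updates:
        return 0
    cases = " ".join("WHEN ? THEN ?" for _ in updates)
    id_slots = ", ".join("?" for _ in updates)
    sql = (
        f"UPDATE subsection SET body = CASE id {cases} END "
        f"WHERE id IN ({id_slots})"
    )
    # Parameters: (id, body) pairs for the CASE arms, then ids for the IN list.
    params = [value for pair in updates for value in pair]
    params += [row_id for row_id, _ in updates]
    cursor = conn.execute(sql, params)
    conn.commit()
    return cursor.rowcount
```

Databases cap the number of bound parameters per statement, so a very large update list would need to be split into batches. Django's `bulk_update` accepts a `batch_size` argument for the same reason.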
pcraig3 force-pushed the benchmark branch 4 times, most recently from 7258ef7 to 8482e79 on December 24, 2024 18:39.
Closed
Updated times for are here:
Now that we have solved our biggest performance bottleneck, we don't need this incredibly long timeout.
It is really handy for debugging, but it is polluting our otherwise pristine codebase.
pcraig3 force-pushed the benchmark branch 2 times, most recently from a4d80ac to 427c481 on December 24, 2024 20:31.
Since removing those import and export JSON routes, there are a bunch of imports that we don't use anymore, so this just cleans them up.
From the changelog:

- Add cover image for CDC-RFA-IP-25-0007
- Add cover image for CMS-2V2-25-001
- Also add inline images
- Add utility classes for different list counters
- Updates to CMS theme:
  - Fix background and text colour for cover page
  - "Standard" icons are blue
  - Tighter line-height for h5s
  - Smaller, bolder h7s
- Speed up "add_headings_to_nofo" function
  - Precompile regex patterns for heading ID substitution
  - Batch update sections and subsections
- Don't demote h7s, since there is no lower heading level
'Twas the night before Christmas, and all through the app,
As we have continued to add steps to the NOFO import process, we've ended up increasing the total time needed to import a NOFO.
After profiling the entire import step, we discovered that the largest contributor to the import slowdown is the add_headings_to_nofo function, which was accounting for over 60% of the import time for a typical NOFO.
Using a CMS NOFO as an example, we were seeing this online:
add_headings_to_nofo runtime: 14.137533 seconds
So if we can make this function more efficient, we can dramatically improve our NOFO import times.
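For reference, a runtime line like the one quoted above can be produced by a small timing wrapper. This is only a sketch of one way to do it, assuming nothing about how the measurement was actually taken in this project. The decorator name `timed` is hypothetical.

```python
import time
from functools import wraps


def timed(func):
    """Print how long each call to `func` takes, in seconds."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} runtime: {elapsed:.6f} seconds")

    return wrapper
```

Wrapping a function with `@timed` leaves its return value and name unchanged. For a per-function breakdown of a whole import run, the standard-library `cProfile` module is the more thorough tool.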