Speed up NOFO importing #129

Closed
pcraig3 opened this issue Dec 23, 2024 · 5 comments

pcraig3 commented Dec 23, 2024

Hello!

We got hit with a humdinger of a NOFO document which winds up being 200+ pages once printed out.

It's so big, actually, that we can't even import it in production without hitting our 90-second timeout limit, which means the import errors out.

Importing a NOFO takes about 10x longer in production than it does when I run the NOFO Builder locally. Some of that is latency, and some of it is that production runs on a less powerful machine.

Next step for this is to run a benchmarking assessment on the import function and see if we can identify a bottleneck.
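
For reference, a benchmarking pass like this could be done with Python's built-in cProfile. This is only a minimal sketch: `import_nofo`, `parse_document`, and `save_sections` are hypothetical stand-ins that simulate work with `time.sleep`, not the NOFO Builder's actual functions.

```python
import cProfile
import io
import pstats
import time


def parse_document():
    # Hypothetical stand-in for document parsing work.
    time.sleep(0.05)


def save_sections():
    # Hypothetical stand-in for many small database writes.
    for _ in range(20):
        time.sleep(0.005)


def import_nofo():
    # Hypothetical import entry point that we want to profile.
    parse_document()
    save_sections()


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


print(profile_report(import_nofo))
```

Sorting by cumulative time puts the functions that account for most of the wall-clock time at the top of the report, which is usually where a bottleneck shows up first.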

pcraig3 changed the title from "[2025] NOFO import Benchmarking" to "[WIP] NOFO import Benchmarking" on Dec 23, 2024
pcraig3 self-assigned this on Dec 23, 2024

pcraig3 commented Dec 24, 2024

Okay, I ran the import function before and after #134, and here is what I found.

Tested on two different NOFOs on my desktop and then online.

testing locally

Tested using a regular NOFO: CMS-2V2-25-001

| (before) Regular NOFO | (after) Regular NOFO |
| --- | --- |
| 2.063558 seconds | 1.337940 seconds |

35% reduction

Tested using the longest NOFO we have ever seen: CDC-RFA-CD-25-0019

| (before) Long NOFO | (after) Long NOFO |
| --- | --- |
| 20.909973 seconds | 4.296070 seconds |

79% reduction

testing on nofo.rodeo

Tested using a regular NOFO: CMS-2V2-25-001

| (before) Regular NOFO | (after) Regular NOFO |
| --- | --- |
| 21.829363 seconds | 8.694077 seconds |

60% reduction

Tested using the longest NOFO we have ever seen: CDC-RFA-CD-25-0019

| (before) Long NOFO | (after) Long NOFO |
| --- | --- |
| 129.484468 seconds | 26.625626 seconds |

79% reduction

Verdict

Huge speedup. No question we should ship this.
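
The reduction figures can be reproduced from the raw timings. Here is a small sketch using the numbers reported in this comment. The function name `percent_reduction` is just illustrative, and a result may differ from a quoted figure by a point depending on how it is rounded.

```python
def percent_reduction(before, after):
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100


# (environment, NOFO) -> (seconds before #134, seconds after #134)
timings = {
    ("local", "CMS-2V2-25-001"): (2.063558, 1.337940),
    ("local", "CDC-RFA-CD-25-0019"): (20.909973, 4.296070),
    ("nofo.rodeo", "CMS-2V2-25-001"): (21.829363, 8.694077),
    ("nofo.rodeo", "CDC-RFA-CD-25-0019"): (129.484468, 26.625626),
}

for (env, nofo), (before, after) in timings.items():
    saved = percent_reduction(before, after)
    speedup = before / after
    print(f"{env:<11} {nofo:<19} {saved:5.1f}% reduction ({speedup:.1f}x faster)")
```

Printing the speedup factor next to the percentage makes it easier to see that the long NOFO got roughly 5x faster in both environments.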

pcraig3 changed the title from "[WIP] NOFO import Benchmarking" to "NOFO import Benchmarking" on Dec 24, 2024

pcraig3 commented Dec 24, 2024

Once we merge #134, the next-slowest method we have is the _build_nofo function, so I think we can add batch processing to that one as well and save some more time on import.
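
To illustrate the general idea of batching writes, here is a generic sketch using Python's built-in sqlite3 module. It is not the code from #134 or #135: the `subsection` table and every function name below are assumptions made up for the example, and the project's real persistence layer is not shown in this thread.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical "subsection" table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subsection (id INTEGER PRIMARY KEY, name TEXT, body TEXT)"
    )
    return conn


def save_one_by_one(conn, rows):
    # One INSERT statement issued per row, like calling save() in a loop.
    for row in rows:
        conn.execute("INSERT INTO subsection (name, body) VALUES (?, ?)", row)
    conn.commit()


def save_in_batch(conn, rows):
    # All rows handed to the driver in a single call.
    conn.executemany("INSERT INTO subsection (name, body) VALUES (?, ?)", rows)
    conn.commit()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM subsection").fetchone()[0]


rows = [(f"Subsection {i}", f"Body text {i}") for i in range(1000)]

slow_db, fast_db = make_db(), make_db()
save_one_by_one(slow_db, rows)
save_in_batch(fast_db, rows)
print(count_rows(slow_db), count_rows(fast_db))
```

Both approaches store the same rows. With an in-memory SQLite database the timing difference is small; the batched form pays off mostly when every statement has to cross a network to reach the database.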


pcraig3 commented Dec 26, 2024

Okay, I tested the import function again now that I am using batch creates for _build_nofo (#135), and here is what I found.

Tested on two different NOFOs on my desktop and then online.

testing locally

Tested using a regular NOFO: CMS-2V2-25-001

| (before) Regular NOFO | (after) Regular NOFO |
| --- | --- |
| 1.337940 seconds | 1.014263 seconds |

24% reduction

Tested using the longest NOFO we have ever seen: CDC-RFA-CD-25-0019

| (before) Long NOFO | (after) Long NOFO |
| --- | --- |
| 4.296070 seconds | 3.691065 seconds |

14% reduction

testing on nofo.rodeo

Tested using a regular NOFO: CMS-2V2-25-001

| (before) Regular NOFO | (after) Regular NOFO |
| --- | --- |
| 8.694077 seconds | 3.691065 seconds |

58% reduction

Tested using the longest NOFO we have ever seen: CDC-RFA-CD-25-0019

| (before) Long NOFO | (after) Long NOFO |
| --- | --- |
| 26.625626 seconds | 8.037420 seconds |

70% reduction

Verdict

There is a modest speedup locally and a much larger one online. This makes sense: the deployed app has more latency when accessing the database, so the many individual save calls really add up.
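
A rough back-of-the-envelope model shows why the same change helps more when database latency is higher. Every number here (row count, batch size, compute time, per-call latencies) is a hypothetical value picked for illustration, not a measurement from the NOFO Builder, and the model ignores the difference in machine power between environments.

```python
import math


def import_time(compute_seconds, round_trips, latency_seconds):
    # Simplified model: fixed compute time plus one latency cost per round trip.
    return compute_seconds + round_trips * latency_seconds


def reduction(rows, batch_size, compute_seconds, latency_seconds):
    """Fraction of time saved by batching rows instead of saving one at a time."""
    before = import_time(compute_seconds, rows, latency_seconds)
    batches = math.ceil(rows / batch_size)
    after = import_time(compute_seconds, batches, latency_seconds)
    return (before - after) / before


# Hypothetical inputs: 1000 rows, batches of 500, 1 second of pure compute.
ROWS, BATCH_SIZE, COMPUTE = 1000, 500, 1.0

# Hypothetical per-call latencies: fast local database vs. networked database.
for name, latency in {"local": 0.0005, "deployed": 0.010}.items():
    saved = reduction(ROWS, BATCH_SIZE, COMPUTE, latency)
    print(f"{name}: {saved:.0%} reduction")
```

Under this model the compute time is untouched by batching, so the higher the per-call latency, the larger the share of total time that batching removes.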

pcraig3 changed the title from "NOFO import Benchmarking" to "Speed up NOFO importing" on Dec 26, 2024

pcraig3 commented Dec 26, 2024

After #135 goes in, I think we are good to close this issue.

The next longest-running function is mammoth.convert_to_html, which is a vendor function, so we have no real control over how long it runs. We could probably squeeze some efficiency out of changing some of our Soup selectors, but that isn't going to give us the 50-60% reductions in import times that we have been seeing so far.


pcraig3 commented Jan 2, 2025

Closing since #135 went in!

pcraig3 closed this as completed on Jan 2, 2025