Error download large LAR files #801

Open
meissadia opened this issue Dec 21, 2020 · 4 comments · May be fixed by #1739
Labels: Backlog; blocked (Waiting on other work to be completed.); bug (Something isn't working)

Comments

@meissadia
Contributor

Similar issue:
cfpb/hmda-data-browser#62
cfpb/hmda-data-browser#83

@meissadia meissadia added Backlog bug Something isn't working and removed Backlog labels Dec 21, 2020
@meissadia meissadia self-assigned this Dec 21, 2020
@meissadia
Contributor Author

meissadia commented Dec 22, 2020

Downloading

  • Chrome fails after 20 minutes, 150MB (unclear error)
  • Safari fails after 40 minutes, 1.43GB (memory error)

Alternate Fetch methods

  • Anchor method used for Data Browser fails due to our need for authentication
  • Streams method fails (premature done signal)

Need to save the stream?

@meissadia
Contributor Author

Tested again on 12/29

  • Anchor method failed with 'File not found'.
  • Basic fetch terminated prematurely without error.
  • Stream method still received a premature done signal.

@meissadia
Contributor Author

meissadia commented Dec 30, 2020

12/30

  • Doubled nginx config limits
    client_body_buffer_size  32k;
    client_header_buffer_size 2k;
    client_max_body_size 10m;
    large_client_header_buffers 4 16k;
    client_body_timeout 120s;
    client_header_timeout 120s;
    send_timeout 120s;
    
  • Used blob format for API response

I'm still seeing the same early termination for large files.

I also see the same behavior via Postman. When doing a "Send and Download", I have not been able to download more than 230MB.

@meissadia
Contributor Author

1/22

Testing via curl (URL quoted so the shell does not treat the ? as a glob character)

curl -H "Authorization: Bearer <token>" \
"https://<dev>/v2/filing/institutions/B90YWS6AFX2LGWOXJ1LD/filings/2020/submissions/723/edits/csv?format=csv" \
--output ~/Downloads/edits_via_curl.test

Result:

curl: (18) transfer closed with outstanding read data remaining

Some suggestions that did not resolve the error:

  • Add option --keepalive-time 2
  • Add header Accept-Encoding: gzip, deflate

curl -H "Authorization: Bearer <token>" \
-H 'Accept-Encoding: gzip, deflate' \
--keepalive-time 2 \
--output ~/Downloads/edits_via_curl.test \
"https://<dev>/v2/filing/institutions/B90YWS6AFX2LGWOXJ1LD/filings/2020/submissions/723/edits/csv?format=csv"

Other avenues to explore:

  • Server sending wrong Content-Length header?
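
One way to test the Content-Length theory is to save the response headers alongside the body and compare the advertised length with the number of bytes that actually arrived. The sketch below assumes the headers were saved with curl's --dump-header option. The function name check_length and the file names are hypothetical, chosen only for illustration.

```shell
# check_length HEADER_FILE BODY_FILE
# Hypothetical helper: compares the Content-Length found in a saved header
# dump with the size of the downloaded body file.
check_length() {
  header_file=$1
  body_file=$2
  # Use the last Content-Length line (a redirect can produce several header
  # blocks) and strip the trailing carriage return from the HTTP header.
  expected=$(grep -i '^content-length:' "$header_file" | tail -n 1 | tr -d '\r' | awk '{print $2}')
  # wc pads its output with spaces on some systems, so strip them.
  actual=$(wc -c < "$body_file" | tr -d ' ')
  if [ -z "$expected" ]; then
    echo "no Content-Length header (chunked response?); received $actual bytes"
  elif [ "$expected" -eq "$actual" ]; then
    echo "complete: $actual bytes"
  else
    echo "truncated: received $actual of $expected bytes"
  fi
}

# Possible usage (placeholders as in the commands above):
#   curl -H "Authorization: Bearer <token>" \
#     --dump-header headers.txt --output edits.csv \
#     "https://<dev>/v2/filing/institutions/B90YWS6AFX2LGWOXJ1LD/filings/2020/submissions/723/edits/csv?format=csv"
#   check_length headers.txt edits.csv
```

If the header is missing, the response is probably chunked, and a mismatch would then point at the connection being closed early rather than at a wrong length being advertised.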

@meissadia meissadia added the blocked Waiting on other work to be completed. label Jan 28, 2021
@meissadia meissadia removed their assignment Apr 26, 2022
@meissadia meissadia self-assigned this Apr 26, 2022
@meissadia meissadia linked a pull request Mar 10, 2023 that will close this issue