API not working for files over 100KB #36

Open
jmbanda opened this issue Dec 20, 2023 · 4 comments

Comments

@jmbanda

jmbanda commented Dec 20, 2023

Greetings,

I am submitting a large set of files, and only files under 100KB are getting processed. The larger files are skipped: they do not error out or produce any error message. I have adjusted the timeout parameter, but this does not fix the issue.

Thanks!

@StevenMaude
Contributor

@jmbanda: sorry for the delayed reply, you caught us over the holiday period.

Is this still an issue? If so, is it possible to provide us with the example code and PDFs to try and reproduce the error?

@jmbanda
Author

jmbanda commented Jan 8, 2024

Greetings, yes, this continues to be an issue. I can't provide the PDF as it is private, but any PDF above 100KB was failing with the following code:

import pdftables_api

c = pdftables_api.Client('my-api-key', timeout=(60, 3600))
c.xlsx('input.pdf', 'output.xlsx')

The same happens with or without the timeout parameter. We still have plenty of pages left in our paid bundle, so that is not the issue. No error is thrown; the documents are simply skipped. If we upload the same document manually through the web UI, it converts correctly.
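Since the failure described above is silent, one way to narrow it down is to record an explicit outcome for every file in the batch. The following is only a minimal sketch, not part of the library: `convert_batch` is a hypothetical helper, and it takes the conversion function as an argument so it makes no assumptions about how the client behaves internally.

```python
import os
from typing import Callable, Dict, Iterable


def convert_batch(paths: Iterable[str],
                  convert: Callable[[str, str], object]) -> Dict[str, str]:
    """Convert each PDF and record an explicit outcome per input file.

    `convert` is any callable taking (input_path, output_path).
    This helper is hypothetical and only illustrates the idea.
    """
    results = {}
    for pdf_path in paths:
        out_path = os.path.splitext(pdf_path)[0] + ".xlsx"
        try:
            size_kb = os.path.getsize(pdf_path) // 1024
            convert(pdf_path, out_path)
        except Exception as exc:
            # Keep the exception text so a failure is never silent.
            results[pdf_path] = f"error: {exc!r}"
            continue
        # Note: an output file left over from an earlier run also counts
        # as "ok" here, so start from a clean output location.
        if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
            results[pdf_path] = f"ok ({size_kb} KB)"
        else:
            results[pdf_path] = f"no output ({size_kb} KB)"
    return results
```

With the client from the snippet above, this could be called as `convert_batch(['input.pdf'], c.xlsx)`; any file reported as "no output" or "error" is one the conversion did not actually produce, along with its size.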

@StevenMaude
Contributor

Thanks; we'll add it to our issue queue and take a look, then report back (it may be a few days).

@StevenMaude
Contributor

Just to follow up, I've tested the code here on a fresh Ubuntu 22.04 virtual machine and can't reproduce the issue. This was using Python 3.10 that came bundled with the operating system.

I did the following:

  1. Created a virtualenv with python3 -m venv api

  2. Activated the virtualenv with source api/bin/activate

  3. Ran pip install git+https://github.com/pdftables/python-pdftables-api.git to install the API code.

  4. Converted a test PDF named input.pdf of size 360 KB with the following code (edited to include my actual API key):

    import pdftables_api
    
    c = pdftables_api.Client('my-api-key', timeout=(60, 3600))
    c.xlsx('input.pdf', 'output.xlsx')

This produced an output Excel file named output.xlsx.

If you can give any more details about the environment in which the code was failing, we can try and reproduce further. It's tricky to fix without encountering the problem, unfortunately.
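For anyone hitting the same problem, a short script like the one below may help collect the environment details requested above. It is a sketch under assumptions: `environment_report` is a hypothetical helper, the distribution name `pdftables-api` is a guess at how the package is registered with pip, and `requests` is included on the assumption that the client uses it for HTTP. Packages that are not found are reported as such rather than raising.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("pdftables-api", "requests")):
    """Return Python, OS, and package versions as a dict.

    The default package names are assumptions; pass your own if the
    installed distribution names differ.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a comment gives the maintainers a starting point for reproducing the failure in a comparable environment.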
