COSStream has been closed and cannot be read. #57

Open
mladvladimir opened this issue Jun 7, 2020 · 2 comments

Comments

@mladvladimir

Hi,

This code snippet:

(-> (split/split-pdf :input "test/pdfs/multi-page.pdf")
    (nth <some-idx>)
    text/extract)

often throws:
Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:83). COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?

Thanks for the lib :)

@dotemacs
Owner

Hi @mladvladimir

Sorry about the delay with this.

The issue is a bit annoying because of the following problem:

As originally implemented, split-pdf opened a document but never closed it when it was done with it. This commit changed that, so the document is now closed:

2ed5824#diff-4c66c23479d6f9f33283c8cf88c003d0

Which works as expected if you're only using split-pdf.

But if you pass the opened and split PDF documents on to another function, as in your example with the threading macro, the source document has already been closed by then and the error comes up.
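
The same failure mode can be reproduced without PDFBox: any value that still depends on a resource after `with-open` has closed it will fail when it is finally read. A minimal sketch, using a `BufferedReader` as a stand-in for the PDF document (the names `read-lines-lazily` and `realize-safely` are made up for illustration and are not part of this library):

```clojure
(import '[java.io BufferedReader StringReader IOException])

(defn read-lines-lazily
  "Returns a lazy seq of lines. The reader is closed when with-open
  exits, which is before the caller has consumed the rest of the seq."
  [s]
  (with-open [rdr (BufferedReader. (StringReader. s))]
    (line-seq rdr)))

(defn realize-safely
  "Forces the whole seq and reports whether reading from the
  already-closed reader failed."
  [s]
  (try
    (doall (read-lines-lazily s))
    (catch IOException _
      :stream-closed)))

(realize-safely "a\nb") ; the second line is read after the reader was closed
```

Here the split documents play the role of the lazy seq: they are handed back to the caller, but reading them still needs the enclosing document that has just been closed.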

I knew about this, but I'm not sure what the right approach to solving it would be.

This is my current thinking: when I originally implemented those functions, I wrote each of them to work in isolation. Then somebody created a PR which allowed you to pass the opened PDF stream from one function to the next.

So if I implement a solution that allows an open stream to be passed on to the next function, I'd probably have to introduce a breaking change: something that tells a function not to close the open PDF stream. Like this:

(-> (split/split-pdf :input "test/pdfs/multi-page.pdf" :stream true)
    (nth <some-idx>)
    text/extract)

The new part here is the :stream true option.

Since you're using this, what are your thoughts on this?

Do you have some other suggestion or approach that I should consider?

Thanks

@chenj7

chenj7 commented Aug 6, 2022

I'm still not very good at Clojure despite loving it (and your wrapper library), but I really needed to just save out the split PDFs, and this issue was blocking me.
I'm not sure if this is in line with Clojure idioms, but I wound up exposing a :callback option. You pass it a function that receives the split results, so you can keep working with them, e.g. saving the docs, while the source document is still open.

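(defn split-pdf
  "Split pdf into pages. If :callback is given, it is called with the
  vector of split documents while the source document is still open,
  and its return value is returned instead of the vector."
  [& {:keys [input start end split callback]}]
  ;; with-open closes doc as soon as this body returns
  (with-open [doc (common/obtain-document input)]
    (let [splitter (Splitter.)]
      (when start (.setStartPage splitter start))
      (when end (.setEndPage splitter end))
      (when split (.setSplitAtPage splitter split))
      (let [result (into [] (.split splitter doc))]
        (if callback
          ;; runs before doc is closed, so the results can still be read
          (callback result)
          result)))))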

Then, when I went to use it:

(split-pdf :input "big_doc.pdf"
           :start 1
           :end 1
           :callback (fn [[result]]
                        (.save result (path-for folder "pg1.pdf"))))
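
The callback works because it runs inside `with-open`, before the document is closed. The same shape can be sketched with a plain `BufferedReader` standing in for the PDF document (`with-lines` is a made-up name for illustration, not part of this library):

```clojure
(import '[java.io BufferedReader StringReader])

(defn with-lines
  "Opens a reader over s, passes the lazy seq of lines to callback while
  the reader is still open, and returns whatever the callback returns."
  [s callback]
  (with-open [rdr (BufferedReader. (StringReader. s))]
    (callback (line-seq rdr))))

;; vec consumes every line eagerly, so nothing is read after the close
(with-lines "a\nb" vec)
```

The one thing to watch with this approach is that the callback has to finish its work, or copy out what it needs, before it returns. Handing a lazy or still-unread value back out of the callback would run into the original error again.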
