I made two scripts: one splits PDFs into single pages, the other joins PDFs. The join script fails to join the pages that the split script produces.
Tested on the attached PDF, 2019_ReferenceWorkEntry_.pdf.
What am I doing wrong?
  Cell In[23], line 4
    pdf = PDF.loads(pdf_file_handle)
  File ~/miniforge3/envs/bjorn311/lib/python3.11/site-packages/borb/pdf/pdf.py:83 in loads
    document: Document = ReadAnyObjectTransformer().transform(
  ...
  File ~/miniforge3/envs/bjorn311/lib/python3.11/site-packages/borb/io/read/tokenize/high_level_tokenizer.py:144 in read_indirect_object
    value = self.read_object()
  File ~/miniforge3/envs/bjorn311/lib/python3.11/site-packages/borb/io/read/tokenize/high_level_tokenizer.py:206 in read_object
    return self.read_dictionary()
  File ~/miniforge3/envs/bjorn311/lib/python3.11/site-packages/borb/io/read/tokenize/high_level_tokenizer.py:94 in read_dictionary
    assert token.get_token_type() == TokenType.NAME
AssertionError
Attachment: [2019_ReferenceWorkEntry_.pdf](https://github.com/jorisschellekens/borb/files/14094251/2019_ReferenceWorkEntry_.pdf)
split:
#!/home/bjorn/miniforge3/envs/bjorn311/bin/python3
# -*- coding: utf-8 -*-
import sys
from pathlib import Path
from borb.pdf.document.document import Document
from borb.pdf.pdf import PDF
from tqdm import tqdm

script, *cliarg = sys.argv
# Use the paths given on the command line, else every PDF in the current directory.
pdfpaths = [Path(p) for p in cliarg] or sorted(Path(".").glob("*.pdf"))
for pdfpath in tqdm(pdfpaths):
    fn = pdfpath.stem
    with open(pdfpath, "rb") as pdf_file_handle:
        pdf = PDF.loads(pdf_file_handle)
    number_of_pages = int(pdf.get_document_info().get_number_of_pages())
    # Write each page to its own single-page document.
    for i in range(number_of_pages):
        print(i)
        outpdf = Document()
        outpdf.add_page(pdf.get_page(i))
        with open(f"{fn}_{i:03d}.pdf", "wb") as pdf_out_handle:
            PDF.dumps(pdf_out_handle, outpdf)
join:
#!/home/bjorn/miniforge3/envs/bjorn311/bin/python3
# -*- coding: utf-8 -*-
# https://pdfstandalone.com/en/merge-pdf
import sys
from pathlib import Path
from borb.pdf.document.document import Document
from borb.pdf.pdf import PDF
from tqdm import tqdm

script, *cliarg = sys.argv
# Use the paths given on the command line, else every PDF in the current directory.
pdfpaths = [Path(p) for p in cliarg] or sorted(Path(".").glob("*.pdf"))
output_document = Document()
outpath = Path("output.pdf")
# Do not feed a previous output file back in as an input.
try:
    pdfpaths.remove(outpath)
except ValueError:
    pass
for pdfpath in tqdm(pdfpaths):
    with open(pdfpath, "rb") as pdf_file_handle:
        pdf = PDF.loads(pdf_file_handle)  # the AssertionError is raised here
    output_document.add_document(pdf)
with open(outpath, "wb") as pdf_out_handle:
    PDF.dumps(pdf_out_handle, output_document)
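To narrow down which of the split files triggers the error, something like the following might help. It is only a sketch: `find_unreadable` is a hypothetical helper name (not part of borb), and the loader is passed in as a parameter so that the same loop can be tried with `PDF.loads` or with any other reader.

```python
from pathlib import Path
from typing import BinaryIO, Callable, Iterable, List, Tuple


def find_unreadable(
    paths: Iterable[Path],
    loader: Callable[[BinaryIO], object],
) -> List[Tuple[Path, str]]:
    """Try to load every file and collect the ones that fail.

    `loader` is any callable that takes a binary file handle.
    Returns (path, exception class name) pairs for the files that raised.
    """
    failures: List[Tuple[Path, str]] = []
    for path in paths:
        try:
            with open(path, "rb") as file_handle:
                loader(file_handle)
        except Exception as exc:  # AssertionError is a subclass of Exception
            failures.append((Path(path), type(exc).__name__))
    return failures
```

With borb imported as in the scripts above, calling `find_unreadable(sorted(Path(".").glob("*.pdf")), PDF.loads)` should list the files that fail to load. That would show whether every split page fails or only some of them.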