3.18.4
What's Changed
Major PDF library change
We are hereby deprecating pdflib and replacing it with a well-maintained, performant library: pymupdf. This makes local development possible on hardware with Apple Silicon CPUs and adds support for JBIG2 images in PDF files.
License change
Because of the above dependency change, as of this release ingest-file is licensed under the terms of the AGPLv3+ license.
Integrating convert-document into ingest-file
- Merge convert-document into ingest-file by @stchris in #395
- Better logging when converting documents to pdf by @Rosencrantz in #376
Smaller changes
- Fix PDF ingest bug by @catileptic in #430
- Do full page OCR for PDF pages with Type3 fonts by @stchris in #449
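For context on #449: Type3 fonts define their glyphs as PDF drawing procedures and often lack a reliable Unicode mapping, so text extracted from them can be unusable, which makes OCR of the rendered page the safer option. A minimal sketch of how such pages could be detected with PyMuPDF, assuming the tuple layout returned by `Page.get_fonts()` (font type at index 2). The helper names `needs_full_page_ocr` and `pages_needing_ocr` are hypothetical; this is not the code merged in #449.

```python
from typing import Iterable, Sequence

# Hypothetical sketch, not ingest-file's implementation.
# PyMuPDF's Page.get_fonts() yields tuples shaped like
# (xref, ext, type, basefont, name, encoding[, referencer]);
# the font type ("Type1", "TrueType", "Type3", ...) is at index 2.
FONT_TYPE_INDEX = 2


def needs_full_page_ocr(fonts: Iterable[Sequence]) -> bool:
    """Return True if any font referenced by the page is a Type3 font."""
    return any(font[FONT_TYPE_INDEX] == "Type3" for font in fonts)


def pages_needing_ocr(pdf_path: str) -> list[int]:
    """Return zero-based indices of pages that reference a Type3 font."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    with fitz.open(pdf_path) as doc:
        return [page.number for page in doc if needs_full_page_ocr(page.get_fonts())]
```

Keeping the decision in a pure function over the font list makes the heuristic easy to test without a real PDF; the wrapper only handles opening the document and iterating its pages.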
Dependency upgrades
- Bump pikepdf from 6.2.8.post1 to 7.1.1 by @dependabot in #434
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
- Bump pillow from 9.4.0 to 9.5.0 by @dependabot in #448
- Bump google-cloud-vision from 3.4.0 to 3.4.1 by @dependabot in #447
- Bump openpyxl from 3.1.1 to 3.1.2 by @dependabot in #446
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #444
- Bump tesserocr from 2.5.2 to 2.6.0 by @dependabot in #445
Full Changelog: 3.18.2...3.18.4