Releases: alephdata/ingest-file
3.19.2
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
New Contributors
- @tillprochaska made their first contribution in #488
Full Changelog: 3.18.4...3.19.2
3.19.2-rc1
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
Full Changelog: 3.18.4...3.19.2-rc1
3.19.1
3.19.0
What's Changed
- Add support for linting with ruff by @stchris in #468
- Bump versions of FTM and servicelayer by @catileptic
- Add ingest-file version to Document by @catileptic
- Lint with black by @stchris
- Bump versions: followthemoney==3.4.3, followthemoney-store[postgresql]==3.0.5, servicelayer[google,amazon]==1.21.0 by @stchris and @catileptic
Full Changelog: 3.18.4...3.19.0
3.18.4
What's Changed
Major PDF library change
We are deprecating pdflib and replacing it with PyMuPDF, a well-maintained, performant library. This enables local development on hardware with Apple Silicon CPUs and adds support for JBIG2 images in PDF files.
License change
Because of the above dependency, as of this release ingest-file is licensed under the terms of the AGPLv3+ license.
Integrating convert-document into ingest-file
- Merge convert-document into ingest-file by @stchris in #395
- Better logging when converting documents to pdf by @Rosencrantz in #376
Smaller changes
- Fix PDF ingest bug by @catileptic in #430
- Do full page OCR for PDF pages with Type3 fonts by @stchris in #449
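As a rough illustration of the Type3 change above: Type3 fonts define glyphs as arbitrary drawing procedures and often lack a usable character mapping, so text extracted from such pages can be unreliable, which is a plausible reason to OCR the whole page instead. The sketch below is not the ingest-file implementation. The helper names `needs_full_page_ocr` and `pages_to_ocr` are hypothetical, and the font tuple layout `(xref, ext, type, basefont, name, encoding)` is assumed to match what PyMuPDF's `Page.get_fonts()` returns.

```python
from typing import Dict, Iterable, List, Sequence

# Assumed position of the font "type" field in tuples shaped like
# (xref, ext, type, basefont, name, encoding). This layout is an assumption
# based on PyMuPDF's Page.get_fonts(); verify it against the installed version.
FONT_TYPE_INDEX = 2


def needs_full_page_ocr(fonts: Iterable[Sequence]) -> bool:
    """Return True if any font entry on the page is a Type3 font (hypothetical helper)."""
    return any(
        len(font) > FONT_TYPE_INDEX and font[FONT_TYPE_INDEX] == "Type3"
        for font in fonts
    )


def pages_to_ocr(fonts_by_page: Dict[int, List[Sequence]]) -> List[int]:
    """Return the sorted page numbers that would be sent to full-page OCR."""
    return sorted(
        page for page, fonts in fonts_by_page.items() if needs_full_page_ocr(fonts)
    )
```

With PyMuPDF installed, the per-page font lists could presumably be gathered with something like `{page.number: page.get_fonts() for page in doc}` and passed to `pages_to_ocr`. Pages not returned would keep using regular text extraction.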
Dependency upgrades
- Bump pikepdf from 6.2.8.post1 to 7.1.1 by @dependabot in #434
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
- Bump pillow from 9.4.0 to 9.5.0 by @dependabot in #448
- Bump google-cloud-vision from 3.4.0 to 3.4.1 by @dependabot in #447
- Bump openpyxl from 3.1.1 to 3.1.2 by @dependabot in #446
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #444
- Bump tesserocr from 2.5.2 to 2.6.0 by @dependabot in #445
Full Changelog: 3.18.2...3.18.4
3.18.4-rc4
- Hotfix for the path that full-page images are extracted to (when ingesting PDFs with Type3 fonts)
Full Changelog: 3.18.4-rc3...3.18.4-rc4
3.18.4-rc3
What's Changed
Dependency upgrades
- Bump pillow from 9.4.0 to 9.5.0 by @dependabot in #448
- Bump google-cloud-vision from 3.4.0 to 3.4.1 by @dependabot in #447
- Bump openpyxl from 3.1.1 to 3.1.2 by @dependabot in #446
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #444
- Bump tesserocr from 2.5.2 to 2.6.0 by @dependabot in #445
Full Changelog: 3.18.4-rc1...3.18.4-rc3
3.18.4-rc1
What's Changed
- Use PyMuPDF instead of pikepdf + pdfminer.six for PDF ingestion (text and image extraction). #441
Dependency upgrades
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
Full Changelog: 3.18.2...3.18.4-rc1
3.18.3-rc2
What's Changed
- Fix PDF ingest bug by @catileptic in #430
Dependency upgrades
- Bump pikepdf from 6.2.8.post1 to 7.1.1 by @dependabot in #434
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
Full Changelog: 3.18.2...3.18.3-rc2
3.18.2
IMPORTANT NOTE: this release was pulled. At this time, 3.17.1 is the latest release.
What's Changed
- Update public error message for password protected PDFs by @catileptic in #422
Dependency upgrades
- Bump requests[security] from 2.28.1 to 2.28.2 by @dependabot in #413
- Bump google-cloud-vision from 3.2.0 to 3.3.0 by @dependabot in #412
- Bump pikepdf from 6.2.7 to 6.2.8.post1 by @dependabot in #411
New Contributors
- @catileptic made their first contribution in #422
Full Changelog: 3.18.0...3.18.2