
Releases: alephdata/ingest-file

3.18.1

18 Jan 08:23
762845f

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Handle TIFFs in PDFs by converting to PNG by @stchris in #419

  • PDF ingest: ignore unsupported image file formats

  • PDF ingest: normalize text using unicodedata.normalize

  • Change dependabot schedules to monthly by @stchris in #414
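The TIFF and "unsupported formats" entries describe a per-image decision made while ingesting a PDF. A minimal stdlib-only sketch of that decision follows. It is not ingest-file's code: the `plan_image` and `sniff_format` helpers, the magic-byte detection, and the choice of JPEG/PNG as pass-through formats are all assumptions. The actual TIFF-to-PNG conversion would need an imaging library and is left out.

```python
"""Hypothetical sketch: decide what to do with an image extracted from a PDF.

Assumptions (not taken from ingest-file): formats are identified by their
magic bytes, JPEG/PNG pass through unchanged, TIFF is flagged for PNG
conversion, and anything else is skipped.
"""
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"
    CONVERT_TO_PNG = "convert_to_png"
    SKIP = "skip"


# Leading byte signatures for the formats this sketch recognises.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def plan_image(data: bytes) -> Action:
    """Map an image payload to the action a PDF ingestor might take."""
    fmt = sniff_format(data)
    if fmt in ("png", "jpeg"):
        return Action.KEEP
    if fmt == "tiff":
        return Action.CONVERT_TO_PNG
    # Unsupported or unrecognised formats are ignored instead of raising.
    return Action.SKIP


if __name__ == "__main__":
    print(plan_image(b"II*\x00" + b"\x00" * 8))  # Action.CONVERT_TO_PNG
```

Skipping, rather than raising, keeps one unreadable image from failing the ingest of the whole PDF.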

Full Changelog: 3.18.0...3.18.1

3.18.1-rc3

17 Jan 11:13
f60f7de
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • PDF ingest: ignore unsupported image file formats
  • PDF ingest: normalize text using unicodedata.normalize
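Unicode normalization makes visually identical text compare equal, which matters for text extracted from PDFs. A minimal sketch with Python's standard `unicodedata.normalize` follows. The release note does not say which normalization form is applied; NFKC is an assumption here, chosen because it also folds compatibility characters such as typographic ligatures. The `normalize_text` wrapper is hypothetical.

```python
"""Sketch: normalising extracted PDF text with Python's unicodedata module.

Assumption: NFKC is used here because it folds compatibility characters such
as typographic ligatures; the normalisation form ingest-file applies is not
stated in these release notes.
"""
import unicodedata


def normalize_text(text: str, form: str = "NFKC") -> str:
    """Return text in the given Unicode normalisation form."""
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    # "\ufb01" is the single-codepoint "fi" ligature common in PDF output.
    print(normalize_text("\ufb01le"))  # file
    # "e" + combining acute accent is composed into one code point.
    print(len(normalize_text("e\u0301")))  # 1
```

NFC would compose accents without touching ligatures. NFKC is more aggressive and may change characters some users expect to keep, so the form is a real design choice.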

Full Changelog: 3.18.0...3.18.1-rc3

3.18.1-rc2

17 Jan 08:14
f0b705d
Compare
Choose a tag to compare
3.18.1-rc2 Pre-release
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Handle TIFFs in PDFs by converting to PNG by @stchris in #419
  • Change dependabot schedules to monthly by @stchris in #414

Full Changelog: 3.18.0...3.18.1-rc2

3.18.0

09 Jan 14:20
08f5533

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

Major PDF library change

We are hereby deprecating pdflib, replacing it with well-maintained, performant libraries. This enables local development on hardware with Apple Silicon CPUs. This also enables support for JBIG2 images in PDF files.

  • Replace pdflib with pdfminer.six (for text) & pikepdf (for images) by @stchris in #380
  • Properly link page entities to the Pages entity they belong to by @stchris in #410
  • Remove poppler by @stchris in #393
  • Better word recognition with large spaces between letters by @stchris in #402
  • Prefer small text over spaced-apart text by @stchris in #403
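The word-recognition entry concerns letter-spaced text such as "H E L L O". pdfminer.six decides where words break by comparing the gap between characters with a threshold relative to character size (its `LAParams.word_margin`). The stdlib-only sketch below illustrates that idea with hypothetical `group_words` and `letter_spaced` helpers on made-up character boxes. It is not the change made in #402.

```python
"""Hypothetical sketch of gap-based word segmentation for letter-spaced text.

Each character is (glyph, x0, x1) on a single line. A space is inserted
when the gap to the previous glyph exceeds `margin` times the glyph width,
similar in spirit to pdfminer.six's relative word_margin.
"""
from typing import List, Optional, Tuple

Char = Tuple[str, float, float]  # (glyph, left edge, right edge)


def group_words(chars: List[Char], margin: float) -> str:
    out: List[str] = []
    prev_x1: Optional[float] = None
    for glyph, x0, x1 in chars:
        width = x1 - x0
        # Break the word only when the gap is large relative to glyph width.
        if prev_x1 is not None and x0 - prev_x1 > margin * width:
            out.append(" ")
        out.append(glyph)
        prev_x1 = x1
    return "".join(out)


def letter_spaced(word: str, start: float, width: float, gap: float) -> List[Char]:
    """Build evenly spaced character boxes for a word."""
    chars: List[Char] = []
    x = start
    for glyph in word:
        chars.append((glyph, x, x + width))
        x += width + gap
    return chars


if __name__ == "__main__":
    # "HELLO" with 3-unit gaps between 5-unit glyphs, then "WORLD" 20 units later.
    line = letter_spaced("HELLO", 0, 5, 3) + letter_spaced("WORLD", 57, 5, 3)
    print(group_words(line, margin=0.1))  # H E L L O W O R L D
    print(group_words(line, margin=1.0))  # HELLO WORLD
```

A wider margin rejoins widely spaced letters, at the cost of merging genuinely separate words that happen to be set close together.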

Integrating convert-document into ingest-file

Smaller changes

Dependency updates

Full Changelog: 3.17.1...3.18.0

3.18.0-rc4

06 Jan 13:38
f9a3a64
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Properly link page entities to the Pages entity they belong to (which fixes #398) by @stchris in #410
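Linking page entities to their parent means every Page entity carries a reference to the Pages entity's ID. The sketch below builds such entities as plain dicts shaped like followthemoney entity data, so that it runs without the library. The property names (`document`, `index`, `bodyText`), the 1-based numbering, and the ID scheme are assumptions; check them against the followthemoney version you run. `make_page_entities` is a hypothetical helper, not ingest-file code.

```python
"""Hypothetical sketch: building Page entities that point back to their Pages
document, using plain dicts shaped like followthemoney entity data.

Assumption: the Page schema links to its parent through a `document`
property and stores its position in `index`; verify against the
followthemoney version you run.
"""
from typing import Dict, List


def make_page_entities(pages_id: str, texts: List[str]) -> List[Dict]:
    pages = []
    for number, text in enumerate(texts, start=1):
        pages.append(
            {
                # Hypothetical ID scheme derived from the parent and page number.
                "id": f"{pages_id}.page-{number}",
                "schema": "Page",
                "properties": {
                    "document": [pages_id],  # link to the parent Pages entity
                    "index": [str(number)],
                    "bodyText": [text],
                },
            }
        )
    return pages


if __name__ == "__main__":
    for page in make_page_entities("doc-123", ["first page", "second page"]):
        print(page["id"], page["properties"]["document"])
```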

Dependency updates

Full Changelog: 3.18.0-rc3...3.18.0-rc4

3.18.0-rc3

02 Jan 12:27
01940ed
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Better word recognition with large spaces between letters (#401) by @stchris in #402

Version bumps

Full Changelog: 3.18.0-rc2...3.18.0-rc3

3.18.0-rc2

08 Dec 16:57
cccbd60
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

(includes all changes from https://github.com/alephdata/ingest-file/releases/tag/3.18.0-rc1)

What's Changed

Full Changelog: 3.18.0-rc1...3.18.0-rc2

3.18.0-rc1

30 Nov 09:52
2347c59
Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Replace pdflib with pdfminer.six (text) & pikepdf (images) by @stchris in #380.
    We are hereby deprecating pdflib, replacing it with well-maintained, performant libraries. This enables local development on hardware with Apple Silicon CPUs.
  • Document JSON logging format option in the docker-compose file by @stchris in #392
  • Improved logging output when converting documents to PDF, to highlight cases with a high number of retry attempts, by @Rosencrantz in #376
  • Replace nosetests with pytest by @stchris in #381
  • Updated bump2version config to allow rc releases, aligning with Aleph, by @stchris in #388
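The logging change in #376 is about making repeated conversion retries visible. A minimal sketch of that pattern with the standard `logging` module follows: failed attempts log at DEBUG until a threshold, then escalate to WARNING. `convert_with_retries`, `convert`, `max_attempts`, and `warn_after` are hypothetical names with illustrative defaults, not ingest-file's implementation.

```python
"""Hypothetical sketch: retrying a document-to-PDF conversion and escalating
the log level once the number of attempts becomes unusually high.

`convert` is a stand-in for whatever performs the conversion; the
threshold and attempt limit are illustrative values, not ingest-file's.
"""
import logging
from typing import Callable, Optional

log = logging.getLogger("convert")


def convert_with_retries(
    convert: Callable[[], bytes],
    max_attempts: int = 10,
    warn_after: int = 3,
) -> bytes:
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return convert()
        except Exception as exc:  # demo: treat every failure as retryable
            last_error = exc
            # Quiet for the first few failures, loud once retries pile up.
            level = logging.WARNING if attempt >= warn_after else logging.DEBUG
            log.log(level, "PDF conversion failed (attempt %d/%d): %s",
                    attempt, max_attempts, exc)
    raise RuntimeError(f"conversion failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    calls = {"n": 0}

    def flaky() -> bytes:
        calls["n"] += 1
        if calls["n"] < 4:
            raise OSError("converter not ready")
        return b"%PDF-1.7"

    print(convert_with_retries(flaky))
```

Keeping early failures at DEBUG avoids noise from transient errors, while the WARNING level makes documents that need many attempts easy to spot in the logs.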

Version bumps

Full Changelog: 3.17.1...3.18.0-rc1

3.17.1

04 Nov 08:50

What's Changed

  • Bump followthemoney to 3.1.0

Full Changelog: 3.17.0...3.17.1

3.17.0

03 Nov 10:40

What's Changed

  • Updated release procedure and dev env docs by @stchris in #369
  • Bumped followthemoney to version 3.0.10

New Contributors

Full Changelog: 3.16.4...3.17.0