Releases: jorisschellekens/borb

v2.0.8

14 Aug 17:50

📣 borb release 2.0.8

This release features:

  • Small bugfix in Page.pop_page function; /Count was not being updated properly

  • Convenient method to rotate Page clockwise and counterclockwise 90 degrees

  • Refactor of all EventListener implementations

    • All methods that get something after an EventListener processes a Document now follow the naming convention get_xxx_for_page
    • e.g. get_text_for_page, get_images_for_page, get_colors_for_page, etc.
  • Refactor of the Color API

    • All Color instances are now constructed with values 0..1

      • Except HexColor, Pantone and X11Color (which are constructed with str objects)
    • Extra utility methods in HSVColor:

      • complementary : produces the complementary Color of the given Color
      • analogous : produces 2 Color objects that are similar to the given Color
      • split_complementary: produces 2 Color objects that are similar to the complementary of the given Color, thus forming a split complementary color group
      • triadic : produces 2 Color objects that form a triad with a given Color
      • tetradic_rectangle: produces 3 Color objects: a Color that is analogous to the given Color, and the complementary Colors of that pair
      • tetradic_square: produces 3 Color objects that form a tetradic square with the given Color
    • More tests (for aforementioned functionality)

  • Improved testing

    • Tests can now visually compare their output to a ground truth PNG

      • Output PDF of the test is converted to PNG (using GhostScript)
      • If a ground_truth.png file is present, its pixels are compared to the test output.png
      • Almost full automation of the entire test-suite
  • (the start of) Forms

    • FormField class represents a common base class for anything you might find on a form (checkbox, textfield, dropdown, etc)

      • Implementation of TextField
      • Implementation of DropDownList
      • Convenience implementation of CountryDropDownList
      • Implementation of CheckBox
    • Further releases will improve the way the layout algorithm handles these LayoutElement implementations

      • margin
      • padding
      • font_color
      • font
      • background_color
      • font_size
    • Once FormField objects can be added:

      • Retrieve fields (and in particular their values) from Page
      • Set value for each field (using Page)
      • "Flatten" (remove field, keep value) FormField
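
The HSVColor utility methods listed above boil down to rotating the hue of a color. The following is a minimal sketch of that idea using only the Python standard library; it is not borb's implementation. The function names mirror the release notes, but the signatures, the plain (r, g, b) tuples with components in 0..1, and the 30 degree step are assumptions made for illustration. tetradic_rectangle is left out because its exact angles are not stated here.

```python
import colorsys


def _rotate_hue(rgb, degrees):
    """Return rgb (components in 0..1) with its hue rotated by `degrees`."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + degrees / 360.0) % 1.0, s, v)


def complementary(rgb):
    # The complement sits on the opposite side of the color wheel.
    return _rotate_hue(rgb, 180)


def analogous(rgb, step=30):
    # Two neighbours of the given color; the 30 degree step is an assumption.
    return _rotate_hue(rgb, -step), _rotate_hue(rgb, step)


def split_complementary(rgb, step=30):
    # Two neighbours of the complement; the 30 degree step is an assumption.
    return _rotate_hue(rgb, 180 - step), _rotate_hue(rgb, 180 + step)


def triadic(rgb):
    # Two colors that, with the given one, are spaced 120 degrees apart.
    return _rotate_hue(rgb, 120), _rotate_hue(rgb, 240)


def tetradic_square(rgb):
    # Three colors that, with the given one, are spaced 90 degrees apart.
    return _rotate_hue(rgb, 90), _rotate_hue(rgb, 180), _rotate_hue(rgb, 270)
```

For example, complementary((1.0, 0.0, 0.0)) rotates pure red to cyan, and triadic of the same input yields green and blue.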

v2.0.7

01 Aug 19:27

📣 borb release 2.0.7

This release features:

  • Table detection
  • Fixes for quite a few mypy warnings
  • Rebranding of all examples
  • Rename BaseTable to Table
  • Chart objects now have horizontal_alignment and vertical_alignment
  • Update README.md

Table Detection

This feature enables you to scan a Page (using the TableDetectionByLines implementation of EventListener) for content that is likely to be a Table.
You can then retrieve:

  • the coordinates of the bounding box of the Table
  • the coordinates of each cell (including those cells that may have row_span and/or column_span)
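
To give a feel for how a bounding box and cell coordinates can be derived from ruled lines, here is a simplified sketch. It is not the algorithm used by TableDetectionByLines: it assumes a fully ruled, axis-aligned grid, ignores row_span and column_span, and the function name grid_from_lines and the tuple layouts are hypothetical.

```python
from typing import List, Tuple

Line = Tuple[float, float, float, float]  # x0, y0, x1, y1
Box = Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max


def grid_from_lines(lines: List[Line]) -> Tuple[Box, List[Box]]:
    """Derive a table bounding box and its cells from axis-aligned lines.

    Assumes every vertical and horizontal line spans the whole table,
    so merged cells (row_span / column_span) are NOT handled.
    """
    # Vertical lines contribute column boundaries, horizontal lines row boundaries.
    xs = sorted({x0 for x0, y0, x1, y1 in lines if x0 == x1})
    ys = sorted({y0 for x0, y0, x1, y1 in lines if y0 == y1})
    if len(xs) < 2 or len(ys) < 2:
        raise ValueError("need at least two vertical and two horizontal lines")

    bounding_box = (xs[0], ys[0], xs[-1], ys[-1])
    # One cell per pair of neighbouring boundaries, row by row.
    cells = [
        (xs[i], ys[j], xs[i + 1], ys[j + 1])
        for j in range(len(ys) - 1)
        for i in range(len(xs) - 1)
    ]
    return bounding_box, cells
```

Handling spanning cells would additionally require checking which boundary segments are actually drawn between neighbouring cells, which this sketch skips.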

Rename BaseTable to Table

I noticed this inconsistency while writing a tutorial.
List has two implementations (OrderedList and UnorderedList) and carries no Base prefix, so it makes sense to rename BaseTable to Table.
More of these renames may follow as I try to make the entire library consistent.

Rebranding of all examples

Now that borb has a logo and theme-colors it makes sense to ensure every example uses these colors.

v2.0.6

24 Jul 19:36
d8b9384
Pre-release

📣 borb release 2.0.6

This release features:

  • Rename: pText has become borb
  • Support for more kinds of TrueType fonts (previously only TrueType fonts with at most 256 glyphs were supported)
  • More tests (for aforementioned TrueType fonts)

2.0.0

28 Jun 19:20

📣 pText release 2.0.0

This release features:

  • Small bugfixes in the setup.py script (ensuring some dependencies that are present by default on Linux get installed on Windows)
  • Refactor of the LayoutElement implementations
  • Allowing users access to previously internal parameters of PageLayout implementations (such as margins)
  • Improvements to ChunksOfText (now HeterogeneousParagraph, representing a heterogeneous paragraph)
  • New text-layout class Span (similar to HeterogeneousParagraph, without default top/bottom margin)
  • LayoutElement implementations now have margins (which was needed for HTML). You may expect some layout differences between this version of pText and earlier versions.
  • A new PageLayout mechanism: BrowserLayout
  • Two new implementations of BaseTable:
    • FlexibleColumnWidthTable (which behaves more like tables in HTML)
    • FixedColumnWidthTable (which assigns a fixed width to every column)
  • HTMLToPDF supports a lot more tags:
    • body
    • head
    • meta
    • title
    • h1 to h6
    • hr
    • img
    • ul, ol, li
    • address
    • main
    • section
    • table, tbody, td, th, tr
    • b, strong
    • i, em
    • a
    • abbr
    • br
    • code
    • mark
    • p

Check the examples and tests for more information.
A dozen or so documents have been provided as examples for HTMLToPDF.

1.9.0

14 Jun 06:40

📣 pText release 1.9.0

This release features quite a few new functionalities:

  • OCR
  • Pantone colors
  • Markdown to PDF conversion

It also features some minor improvements to general layout logic:

  • Tables are now automatically completed (with empty Paragraph objects)
  • Support for heterogeneous paragraphs (see the ChunksOfText object)
  • Refactor of the layout package into separate classes

OCR

Using Tesseract (or rather pytesseract), pText is now able to handle scanned images in a PDF.
Typically, a scanned document is a PDF that contains nothing other than an image of each page.
pText can now restore text to such PDF documents.

The OCR capabilities have been integrated nicely with the existing EventListener framework. New events have been added to represent scanned text being recognized.
Two extra implementations of EventListener deal with OCR:

  • OCRImageRenderEventListener : is triggered whenever an image is detected in the PDF, scans the image, and produces OCREvent objects
  • OCRAsOptionalContentGroup : extends OCRImageRenderEventListener and adds optional (invisible) content to the PDF, representing the recognized text

pytesseract is not added as a dependency in the setup script.
If you do choose to use OCR, you should install pytesseract and download the Tesseract data directories.

Pantone colors

Pantone colors are now supported. Similar to X11Color, Pantone has a dictionary of names mapped to hexadecimal strings.
When constructing a Pantone object, simply pass a valid color-name, and you'll receive its corresponding HexColor object.
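
The name-to-hex lookup described above can be sketched as follows. This is an illustration of the idea only, not pText's Pantone class: the table entries are placeholders rather than real Pantone names, and hex_to_rgb and color_by_name are hypothetical helpers.

```python
# Placeholder table: these are NOT real Pantone names, just stand-ins
# for a dictionary that maps color names to hexadecimal strings.
PLACEHOLDER_NAMES = {
    "example-red": "#FF0000",
    "example-teal": "#008080",
}


def hex_to_rgb(hex_string):
    """Convert '#RRGGBB' to an (r, g, b) tuple with components in 0..1."""
    digits = hex_string.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_string!r}")
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


def color_by_name(name, table=PLACEHOLDER_NAMES):
    """Look up a color name (case-insensitive) and return its RGB tuple."""
    try:
        return hex_to_rgb(table[name.lower()])
    except KeyError:
        raise ValueError(f"unknown color name: {name!r}") from None
```

Passing an unknown name raises a ValueError in this sketch; how the real class reacts to invalid names is not covered by these notes.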

Markdown to PDF

pText can now convert (simple) Markdown to PDF.
It does not (yet) support HTML, since that would require an entire HTML engine.

pText supports:

  • Headers
  • Tables
  • Ordered lists (not nested)
  • Unordered lists (not nested)
  • Code snippet (by indent, and fenced)
  • Blockquote
  • Images
  • Paragraphs
  • Horizontal rules

Check the examples and tests to get a better idea of what is supported, and find a demo-document and its matching output.

1.8.9

05 Jun 23:03

📣 pText release 1.8.9

This release features a few non-essential updates to the pText codebase that are mostly related to testing.
This includes:

  • All tests have been refactored to follow the same format, with a small table atop the resulting PDF describing the test, when the test was run, etc
  • All tests (attempt to) follow the same color-scheme (making them look more professional and consistent)
  • Tests against the entire corpus have been limited to the essentials, with extensive reporting

⬆️ Performance Boost

There are a few minor tweaks that have boosted the performance of pText as a whole.
These include a change to the copy behaviour of Font objects in the CanvasGraphicsState, which caused a speed-up of nearly 33%.

📄 Fonts

I have also implemented some minor fixes to the whole Font logic, ensuring font-sizes are now handled properly,
regardless of whether they are passed as an argument to the Tf operator or via the text-matrix in the CanvasGraphicsState.
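
As a rough illustration of why both sources matter: in PDF, the rendered size of text depends on the size given to the Tf operator and on the vertical scaling in the text matrix (six numbers a b c d e f). The sketch below combines the two. It is not pText's code, the helper name effective_font_size is hypothetical, and it deliberately ignores the current transformation matrix.

```python
import math


def effective_font_size(tf_size, text_matrix=(1, 0, 0, 1, 0, 0)):
    """Combine the Tf operand with the vertical scale of the text matrix.

    The text matrix maps the unit vector (0, 1) to (c, d), so the length
    of (c, d) is the factor by which glyph height is scaled.
    Simplification: the current transformation matrix is ignored.
    """
    a, b, c, d, e, f = text_matrix
    return tf_size * math.hypot(c, d)
```

With this helper, a font set to size 12 with an identity matrix and a font set to size 1 with a matrix that scales by 12 both come out at the same effective size.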

I have also started implementing OCR. But more on that in a future release.

🔒 Redaction

Finally, this release includes everything needed to perform redaction.
This is the process of:

  • marking content to be removed (but not removing it, enabling review by a third party)
  • removing content that has been marked
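
The two-phase nature of redaction (mark first, remove later) can be sketched with a small stand-in data model. None of the names below (TextItem, SketchPage, mark_for_redaction, apply_redactions) are pText API; they are hypothetical and only illustrate that marking leaves the content untouched until the redactions are applied.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


@dataclass
class TextItem:
    text: str
    box: Box


@dataclass
class SketchPage:
    items: List[TextItem]
    redaction_marks: List[Box] = field(default_factory=list)


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def mark_for_redaction(page: SketchPage, box: Box) -> None:
    # Phase 1: only record the area; content stays in place for review.
    page.redaction_marks.append(box)


def apply_redactions(page: SketchPage) -> None:
    # Phase 2: drop every item that touches a marked area, then clear marks.
    page.items = [
        item for item in page.items
        if not any(overlaps(item.box, mark) for mark in page.redaction_marks)
    ]
    page.redaction_marks.clear()
```

Keeping the two phases separate is what allows a third party to review the marks before anything is actually removed.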

This functionality integrates nicely in the existing pText framework of Page annotations.
Check the examples for more details (look for "adding redaction annotations to a PDF")

1.8.8

15 May 19:47

pText 1.8.8

Major overhaul of all font-related functionality.
This release features:

  • a speedup in copying fonts (often performed when processing pages)
  • the ability to use custom (ttf) fonts

Furthermore, all public methods have been documented.

1.8.7

25 Apr 08:04

pText 1.8.7

This release features:

  • more documentation
  • setup.py and requirements.txt have changed to ensure pText can easily be installed
  • support for embedded files in PDF

1.8.6

13 Apr 21:34

pText version 1.8.6

This is a documentation release.
Documentation coverage is now 90%.

1.8.3

27 Mar 22:29
Pre-release

1.8.3

  • Bugfix release
  • Improvements to layout algorithm
  • Improvements to IO (enabling a Document to be saved multiple times)