Releases: jorisschellekens/borb
v2.0.8
📣 borb release 2.0.8
This release features:
- Small bugfix in Page.pop_page function; /Count was not being updated properly
- Convenient method to rotate Page clockwise and counterclockwise 90 degrees
- Refactor of all EventListener implementations
  - All methods that get something after an EventListener processes a Document now follow the naming convention get_xxx_for_page
    - e.g. get_text_for_page, get_images_for_page, get_colors_for_page, etc.
- Refactor of the Color API
  - All Color instances are now constructed with values 0..1
    - Except HexColor, Pantone and X11Color (which are constructed with str objects)
  - Extra utility methods in HSVColor:
    - complementary: produces the complementary Color of the given Color
    - analogous: produces 2 Color objects that are similar to the given Color
    - split_complementary: produces 2 Color objects that are similar to the complementary of the given Color, thus forming a split complementary color group
    - triadic: produces 2 Color objects that form a triad with a given Color
    - tetradic_rectangle: produces 3 Color objects; a Color that is analogous to the given Color and the complementary pair of this pair
    - tetradic_square: produces 3 Color objects that form a tetradic square with the given Color
  - More tests (for aforementioned functionality)
- Improved testing
  - Tests can now visually compare their output to a ground truth PNG
    - Output PDF of the test is converted to PNG (using GhostScript)
    - If a ground_truth.png file is present, its pixels are compared to the test output.png
  - Almost full automation of the entire test-suite
- (the start of) Forms
  - FormField class represents a common base class for anything you might find on a form (checkbox, textfield, dropdown, etc)
    - Implementation of TextField
    - Implementation of DropDownList
    - Convenience implementation of CountryDropDownList
    - Implementation of CheckBox
  - Further releases will improve the way the layout algorithm handles these LayoutElement implementations
    - margin
    - padding
    - font_color
    - font
    - background_color
    - font_size
  - Once FormField objects can be added:
    - Retrieve fields (and in particular their values) from Page
    - Set value for each field (using Page)
    - "Flatten" (remove field, keep value) FormField
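The HSVColor utility methods listed above all amount to rotating a hue around the color wheel. The following is a minimal sketch using Python's standard colorsys module, not borb's own implementation: the function names mirror the release notes, but the signatures, the tuple-based RGB representation (channels in 0..1) and the 30-degree default angle are assumptions made for illustration.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[float, float, float]  # each channel in 0..1 (assumed representation)


def rotate_hue(rgb: RGB, degrees: float) -> RGB:
    """Return rgb with its hue rotated by `degrees` on the HSV color wheel."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + degrees / 360.0) % 1.0
    return colorsys.hsv_to_rgb(h, s, v)


def complementary(rgb: RGB) -> RGB:
    """The color directly opposite on the wheel."""
    return rotate_hue(rgb, 180)


def analogous(rgb: RGB, angle: float = 30) -> List[RGB]:
    """Two neighbours of the given color (angle is an assumed default)."""
    return [rotate_hue(rgb, -angle), rotate_hue(rgb, angle)]


def split_complementary(rgb: RGB, angle: float = 30) -> List[RGB]:
    """Two neighbours of the complementary color."""
    return [rotate_hue(rgb, 180 - angle), rotate_hue(rgb, 180 + angle)]


def triadic(rgb: RGB) -> List[RGB]:
    """Two colors that form an evenly spaced triad with the given color."""
    return [rotate_hue(rgb, 120), rotate_hue(rgb, 240)]


def tetradic_rectangle(rgb: RGB, angle: float = 30) -> List[RGB]:
    """An analogous color, plus the complements of the given color and of that analogous color."""
    return [rotate_hue(rgb, angle), rotate_hue(rgb, 180), rotate_hue(rgb, 180 + angle)]


def tetradic_square(rgb: RGB) -> List[RGB]:
    """Three colors spaced 90 degrees apart from the given color."""
    return [rotate_hue(rgb, 90), rotate_hue(rgb, 180), rotate_hue(rgb, 270)]
```

For pure red, (1.0, 0.0, 0.0), complementary yields cyan and triadic yields green and blue. Working in 0..1 channel values matches the convention this release adopts for Color construction.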
v2.0.7
📣 borb release 2.0.7
This release features:
- Table detection
- fix quite a few mypy warnings
- rebranding of all examples
- rename BaseTable to Table
- Chart objects now have horizontal_alignment and vertical_alignment
- Update README.md
Table Detection
This feature enables you to scan a Page (using the TableDetectionByLines implementation of EventListener) for content that is likely to be a Table.
You can then retrieve:
- the coordinates of the bounding box of the Table
- the coordinates of each cell (including those cells that may have row_span and/or column_span)
Rename BaseTable to Table
I saw this inconsistency when I was writing a tutorial.
List has two implementations, OrderedList and UnorderedList, so it makes sense to rename BaseTable to Table.
Perhaps in future, more of these renames will occur as I try to achieve consistency over the entire library.
Rebranding of all examples
Now that borb has a logo and theme-colors it makes sense to ensure every example uses these colors.
v2.0.6
📣 borb release 2.0.6
This release features:
- Rename: pText has become borb
- Support for more kinds of TrueType fonts (previously only TrueType fonts with max. 256 glyphs)
- More tests (for aforementioned TrueType fonts)
2.0.0
📣 pText release 2.0.0
This release features:
- Small bugfixes in the setup.py script (ensuring some dependencies that are present by default on Linux get installed on Windows)
- Refactor of the LayoutElement implementations
- Allowing users access to previously internal parameters of PageLayout implementations (such as margins)
- Improvements to ChunksOfText (now HeterogeneousParagraph, representing a heterogeneous paragraph)
- New text-layout class Span (similar to HeterogeneousParagraph, without default top/bottom margin)
- LayoutElement implementations have margins now (which was needed for HTML), so you may expect some layout differences between this version of pText and former versions.
- A new PageLayout mechanism: BrowserLayout
- New implementations of BaseTable:
  - FlexibleColumnWidthTable (which behaves more like tables in HTML)
  - FixedColumnWidthTable (which assigns a fixed width to every column)
- HTMLToPDF supports a lot more tags:
  - body
  - head
  - meta
  - title
  - h1 to h6
  - hr
  - img
  - ul, ol, li
  - address
  - main
  - section
  - table, tbody, td, th, tr
  - b, strong
  - i, em
  - a
  - abbr
  - br
  - code
  - mark
  - p
Check the examples and tests for more information.
A dozen or so documents have been provided as examples for HTMLToPDF.
1.9.0
📣 pText release 1.9.0
This release features quite a few new functionalities:
- OCR
- Pantone colors
- Markdown to PDF conversion
It also features some minor improvements to general layout logic:
- Tables are now automatically completed (with empty Paragraph objects)
- support for heterogeneous paragraphs (see ChunksOfText object)
- layout package refactor to separate classes
OCR
Using Tesseract (or rather pytesseract), pText is now able to handle scanned images in a PDF.
Typically, a scanned document will present itself as a PDF, without containing any content other than the image of the page.
pText can now restore text to such PDF documents.
The OCR capabilities have been integrated nicely with the existing EventListener framework. New events have been added to represent scanned text being recognized.
Two extra implementations of EventListener deal with OCR:
- OCRImageRenderEventListener: is triggered whenever an image is detected in the PDF, scans the image, and produces OCREvent objects
- OCRAsOptionalContentGroup: extends OCRImageRenderEventListener and adds optional (invisible) content to the PDF, representing the recognized text
pytesseract is not added as a dependency in the setup script.
If you do choose to use OCR, you should install pytesseract and download the Tesseract data directories.
Pantone colors
Pantone colors are now supported. Similar to X11Color, Pantone has a dictionary of names, mapped to hexadecimal strings.
When constructing a Pantone object, simply pass a valid color-name, and you'll receive its corresponding HexColor object.
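The name-to-hexadecimal lookup described above can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not pText's code: NAMED_COLORS, hex_to_rgb and color_from_name are hypothetical names, and the table holds two well-known X11 colors as stand-ins rather than actual Pantone entries.

```python
from typing import Dict, Tuple

# Hypothetical lookup table (assumption): two X11 color names stand in
# for the much larger dictionary a named-color class would carry.
NAMED_COLORS: Dict[str, str] = {
    "tomato": "#FF6347",
    "steelblue": "#4682B4",
}


def hex_to_rgb(hex_string: str) -> Tuple[float, float, float]:
    """Convert '#RRGGBB' to an (r, g, b) tuple with channels in 0..1."""
    digits = hex_string.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_string!r}")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (r, g, b)


def color_from_name(name: str) -> Tuple[float, float, float]:
    """Look up a color by (case-insensitive) name and return its RGB values."""
    try:
        return hex_to_rgb(NAMED_COLORS[name.lower()])
    except KeyError:
        raise ValueError(f"unknown color name: {name!r}") from None
```

Calling color_from_name("Tomato") resolves the name to "#FF6347" and then to its three channel values; an unknown name raises ValueError instead of silently returning a default.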
Markdown to PDF
pText can now convert (simple) Markdown to PDF.
It does not (yet) support HTML, since that would require an entire HTML engine.
pText supports:
- Headers
- Tables
- Ordered lists (not nested)
- Unordered lists (not nested)
- Code snippets (indented and fenced)
- Blockquote
- Images
- Paragraphs
- Horizontal rules
Check the examples and tests to get a better idea of what is supported, and find a demo-document and its matching output.
1.8.9
📣 pText release 1.8.9
This release features a few non-essential updates to the pText codebase that are mostly related to testing.
This includes:
- All tests have been refactored to follow the same format, with a small table atop the resulting PDF describing the test, when the test was run, etc.
- All tests (attempt to) follow the same color-scheme (making them look more professional and consistent)
- Tests against the entire corpus have been limited to the essentials, with extensive reporting
⬆️ Performance Boost
There are a few minor tweaks that have boosted the performance of pText as a whole.
This includes the copy-behaviour of Font objects in the CanvasGraphicsState. This has caused a speed-up of nearly 33%.
📄 Fonts
I have also implemented some minor fixes to the whole Font logic, ensuring font-sizes are now handled properly, regardless of whether they are passed as an argument to the Tf operator or via the text-matrix in the CanvasGraphicsState.
I have also started implementing OCR. But more on that in a future release.
🔒 Redaction
Finally, this release includes everything needed to perform redaction.
This is the process of:
- marking content to be removed (but not removing it, enabling review by a third party)
- removing content that has been marked
This functionality integrates nicely in the existing pText framework of Page annotations.
Check the examples for more details (look for "adding redaction annotations to a PDF")
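The two-phase workflow described above (mark first, remove later) can be sketched with a plain-string stand-in for a page. This is not pText's API: RedactablePage, mark and apply_redactions are hypothetical names, and a real PDF redaction marks rectangular regions through Page annotations rather than character offsets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RedactablePage:
    """Hypothetical stand-in for a page: text plus pending redaction marks."""

    text: str
    marks: List[Tuple[int, int]] = field(default_factory=list)

    def mark(self, start: int, end: int) -> None:
        """Phase 1: record a span to redact; the text itself is untouched."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span out of range")
        self.marks.append((start, end))

    def apply_redactions(self, filler: str = "#") -> None:
        """Phase 2: overwrite every marked span, then clear the marks."""
        chars = list(self.text)
        for start, end in self.marks:
            # filler is assumed to be a single character, so lengths match
            chars[start:end] = filler * (end - start)
        self.text = "".join(chars)
        self.marks.clear()
```

After page = RedactablePage("Call 555-0100 now") and page.mark(5, 13), the text is still intact and the pending mark can be reviewed by a third party; only page.apply_redactions() replaces the phone number with filler characters.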
1.8.8
pText 1.8.8
Major overhaul of all font-related functionality.
This release features:
- a speedup in copying fonts (often performed when processing pages)
- the ability to use custom (ttf) fonts
Furthermore, all public methods have been documented.
1.8.7
pText 1.8.7
This release features:
- more documentation
- setup.py and requirements.txt have changed to ensure pText can easily be installed
- support for embedded files in PDF
1.8.6
pText version 1.8.6
This is a documentation release.
The documentation percentage is now 90%
1.8.3
1.8.3
- Bugfix release
- Improvements to layout algorithm
- Improvements to IO (enabling a Document to be saved multiple times)