font refactoring
jorisschellekens committed May 15, 2021
1 parent 9387bc7 commit 2416179
Showing 413 changed files with 414,883 additions and 7,526 deletions.
130 changes: 106 additions & 24 deletions EXAMPLES.md
Expand Up @@ -18,9 +18,23 @@ Getting started with `pText` is easy.

`pip install ptext-joris-schellekens`

If you have installed `pText` before and want to be sure you are getting the latest version, run the following commands:

`pip uninstall ptext-joris-schellekens`
`pip install --no-cache-dir ptext-joris-schellekens`

4. Done :tada: You are all ready to go.
Try out some of the examples to get to know `pText`.

**Note**: if you have used `pText` in the past, it's best to ensure that pip is not serving
you a version of `pText` from its cache. Uninstall your previous version using:

`pip uninstall ptext-joris-schellekens`

and install the latest version using:

`pip install --no-cache-dir ptext-joris-schellekens`
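
To confirm which version pip actually installed, the standard library can be queried directly. The following is a minimal sketch, not part of `pText`; the helper name `installed_version` is made up for illustration, and it assumes Python 3.8 or later (for `importlib.metadata`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# the distribution name is the one used in the pip commands above;
# this prints None when the package is not installed
print(installed_version("ptext-joris-schellekens"))
```

Running this before and after the reinstall shows whether the version really changed.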

### 0.2 About AGPLv3

The AGPL license differs from the other GNU licenses in that it was built for network software.
Expand Down Expand Up @@ -359,13 +373,43 @@ Once the `Document` is parsed, we can extract all `Image` objects from the `Simp
    with open(output_file, "wb") as image_file_handle:
        img.save(image_file_handle)

### 1.7 Annotations
### 1.7 Embedded Files

If a PDF file contains file specifications that refer to an external file and the PDF file is archived or transmitted,
some provision should be made to ensure that the external references will remain valid. One way to do this is to
arrange for copies of the external files to accompany the PDF file. Embedded file streams (PDF 1.3) address
this problem by allowing the contents of referenced files to be embedded directly within the body of the PDF
file. This makes the PDF file a self-contained unit that can be stored or transmitted as a single entity. (The
embedded files are included purely for convenience and need not be directly processed by any conforming
reader.)

#### 1.7.1 Listing all embedded files in a PDF

    doc = None
    with open("input.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # extract all embedded files
    # this returns a dictionary of str -> bytes
    embedded_files = doc.get_embedded_files()
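
Since the result is described as a dictionary of `str -> bytes`, writing every embedded file to disk needs nothing beyond the standard library. The sketch below is an illustration under that assumption; `save_embedded_files` is a hypothetical helper, not a `pText` function.

```python
from pathlib import Path
from typing import Dict, List


def save_embedded_files(embedded_files: Dict[str, bytes], output_dir: Path) -> List[Path]:
    """Write each embedded file into output_dir and return the paths written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, content in embedded_files.items():
        # keep only the final path component, so that a name such as
        # "../notes.txt" cannot end up outside output_dir
        safe_name = Path(name).name
        if safe_name in ("", ".", ".."):
            continue  # nothing usable to name the file with
        target = output_dir / safe_name
        target.write_bytes(content)
        written.append(target)
    return written
```

With the snippet above, this could be called as `save_embedded_files(embedded_files, Path("attachments"))`. Stripping directory components is a deliberate choice here, because file names stored inside a PDF come from whoever produced the document.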

#### 1.7.2 Extracting an embedded file from a PDF

    doc = None
    with open("input.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # extract an embedded file named "file_001.txt"
    # assuming it exists
    embedded_file = doc.get_embedded_file("file_001.txt")

### 1.8 Annotations

An annotation associates an object such as a note, sound, or movie with a location on a page of a PDF
document, or provides a way to interact with the user by means of the mouse and keyboard. PDF includes a
wide variety of standard annotation types, described in detail in 12.5.6, “Annotation Types.”

#### 1.7.1 Adding a rubber stamp annotation to an existing PDF
#### 1.8.1 Adding a rubber stamp annotation to an existing PDF

A rubber stamp annotation (PDF 1.3) displays text or graphics intended to look as if they were stamped on the
page with a rubber stamp. When opened, it shall display a pop-up window containing the text of the associated
Expand Down Expand Up @@ -407,7 +451,7 @@ The result should be something like this (keep in mind the rendering of the rubb

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.2 Adding all possible rubber stamp annotations to an existing PDF
#### 1.8.2 Adding all possible rubber stamp annotations to an existing PDF

A rubber stamp annotation (PDF 1.3) displays text or graphics intended to look as if they were stamped on the
page with a rubber stamp. When opened, it shall display a pop-up window containing the text of the associated
Expand Down Expand Up @@ -454,7 +498,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.3 Adding a circle annotation to an existing PDF
#### 1.8.3 Adding a circle annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -482,7 +526,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.4 Adding a square annotation to an existing PDF
#### 1.8.4 Adding a square annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -510,7 +554,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.5 Adding a polygon annotation to an existing PDF
#### 1.8.5 Adding a polygon annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -540,7 +584,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.6 Adding a polyline annotation to an existing PDF
#### 1.8.6 Adding a polyline annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -570,7 +614,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.7 Adding an annotation using a shape from the `LineArtFactory` to an existing PDF
#### 1.8.7 Adding an annotation using a shape from the `LineArtFactory` to an existing PDF

The `LineArtFactory` class allows you to easily create shapes (defined as `List[Tuple[Decimal,Decimal]]` ), it contains everything you need to render:
- triangles (right-angled triangles, isosceles triangles)
Expand Down Expand Up @@ -611,7 +655,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.8 Adding a highlight annotation to an existing PDF
#### 1.8.8 Adding a highlight annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -641,7 +685,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.9 Adding a link annotation to an existing PDF
#### 1.8.9 Adding a link annotation to an existing PDF

We start by reading the PDF:

Expand Down Expand Up @@ -673,7 +717,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.10 Adding a text annotation to an existing PDF
#### 1.8.10 Adding a text annotation to an existing PDF

We start by reading the PDF:

Expand All @@ -699,7 +743,7 @@ Finally, we need to store the resulting PDF `Document`.

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.11 Adding a square annotation around a regular expression match to an existing PDF
#### 1.8.11 Adding a square annotation around a regular expression match to an existing PDF

Let's combine what we saw earlier,
about finding the coordinates of a regular expression with our new understanding of annotations.
Expand Down Expand Up @@ -738,7 +782,7 @@ The end result (at least the annotations) should look something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.12 Adding a square annotation in the free space of a page to an existing PDF
#### 1.8.12 Adding a square annotation in the free space of a page to an existing PDF

Sometimes the position of the annotation does not matter that much,
as long as it does not block any other visible content.
Expand Down Expand Up @@ -796,7 +840,7 @@ Notice how our use of `FreeSpaceFinder` meant that the annotation did not collid

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.13 Getting all annotations from a PDF
#### 1.8.13 Getting all annotations from a PDF

Getting all annotations from a PDF is easy, if you know where to look.
Let's start by opening the PDF `Document`:
Expand All @@ -813,7 +857,7 @@ Let's check the first `Page`.

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.7.14 Showcase : Adding a collection of annotations shaped like super mario to an existing PDF
#### 1.8.14 Showcase : Adding a collection of annotations shaped like super mario to an existing PDF

From the spec:

Expand Down Expand Up @@ -907,9 +951,9 @@ The result should be something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

### 1.8 Exporting a PDF
### 1.9 Exporting a PDF

#### 1.8.1 Exporting a PDF as JSON
#### 1.9.1 Exporting a PDF as JSON

This scenario is particularly useful when debugging. It enables you to see the PDF `Document` in the same way `pText` sees it.

Expand All @@ -925,7 +969,7 @@ which will give you access to a `json` like structure.
# export to json
with open("output.json", "w") as json_file_handle:
json_file_handle.write(
json.dumps(doc.to_json_serializable(doc), indent=4)
json.dumps(doc.to_json_serializable(), indent=4)
)
On my example input document, this yielded the following output:
Expand Down Expand Up @@ -959,7 +1003,7 @@ On my example input document, this yielded the following output:
Here we can clearly see the xref table being persisted.
This table acts as the starting point of the document. It contains references to other data structures that hold meta-information, information about each page, etc.
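
When the exported file is large, a quick summary can be easier to read than the raw dump. The sketch below counts the node types in any JSON-like structure using only the standard library. The `sample` document is invented for illustration and is not the structure `pText` produces; `count_node_types` is likewise a hypothetical helper.

```python
import json
from collections import Counter
from typing import Any


def count_node_types(node: Any) -> Counter:
    """Count how many dicts, lists and scalar values a JSON-like structure holds."""
    counts: Counter = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        counts[type(current).__name__] += 1
        if isinstance(current, dict):
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
    return counts


# invented stand-in for an exported document (assumption, not real pText output)
sample = json.loads(
    '{"XRef": {"Trailer": {"Root": {"Pages": {"Count": 1, "Kids": [{"Type": "Page"}]}}}}}'
)
print(count_node_types(sample))
```

For a real export, the same function can be applied to the result of `json.load` on `output.json`.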

#### 1.8.2 Exporting a PDF as SVG
#### 1.9.2 Exporting a PDF as SVG

Sometimes, all you need is an image. With `pText` you can easily convert any `Page` of a `Document` into an SVG image.

Expand All @@ -986,7 +1030,7 @@ This was the input document:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

#### 1.8.2 Exporting a PDF as MP3
#### 1.9.2 Exporting a PDF as MP3

For those with visual impairments, it can be very useful to convert a PDF `Document` to an MP3 file.
This is perfectly possible with `pText`.
Expand All @@ -1006,12 +1050,12 @@ The constructor of `PDFToMP3` has some arguments that allow us to tweak the expo
- `language` : This is the 2-letter abbreviation of the language you expect the text to be in. Default is `en`
- `slow`: This indicates whether you want the speaking-voice to go (extra) slow, or not

### 1.9 Concatenating PDFs, and other page-manipulations
### 1.10 Concatenating PDFs, and other page-manipulations

A common scenario when working with existing PDF `Document` objects is concatenation.
Let's look at how you can concatenate two or more existing `Document` objects:

#### 1.9.1 Concatenating entire PDF `Documents`
#### 1.10.1 Concatenating entire PDF `Documents`

# attempt to read PDF
doc_a = None
Expand All @@ -1036,7 +1080,7 @@ And finally store the merged PDF:
    with open("output.pdf", "wb") as out_file_handle:
        PDF.dumps(out_file_handle, doc_c)

#### 1.9.2 Concatenating parts of a `Document`
#### 1.10.2 Concatenating parts of a `Document`

# attempt to read PDF
doc_a = None
Expand Down Expand Up @@ -1064,7 +1108,7 @@ And finally we can store the merged PDF:
    with open("output.pdf", "wb") as out_file_handle:
        PDF.dumps(out_file_handle, doc_c)

#### 1.9.3 Removing a `Page` from a `Document`
#### 1.10.3 Removing a `Page` from a `Document`

First, we open the `Document`

Expand Down Expand Up @@ -1554,6 +1598,44 @@ The result should be something like this:

Check out the `tests` directory to find more tests like this one, and discover what you can do with `pText`.

##### 2.1.3.7 Setting the Font

`Font` objects in `pText` stay as close to the PDF level as possible while remaining user-friendly.
The PDF spec distinguishes 2 kinds of `Font`. A `SimpleFont` (in general) represents a `Font` that maps only the bytes `0..255` to Unicode characters.

Among these `SimpleFonts` are the so-called 'standard 14 fonts'. These are fonts that any conforming reader should have available.
By using one of the standard 14, your `Document` should look the same regardless of the viewing software.

A `CompositeFont`, on the other hand, represents a more generic `Font` that can map an arbitrary byte (range) to an arbitrary collection of Unicode characters.
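
The difference between the two kinds of mapping can be pictured with plain lookup tables. This is a conceptual sketch only and does not use any `pText` classes: latin-1 stands in for a real PDF encoding, and the two-byte codes in `COMPOSITE_TABLE` are invented for illustration.

```python
from typing import Dict

# a simple font behaves like a lookup table with at most 256 entries,
# one per byte value (latin-1 is only a stand-in for a real PDF encoding)
SIMPLE_TABLE: Dict[int, str] = {
    code: bytes([code]).decode("latin-1") for code in range(256)
}

# a composite font can map multi-byte codes to one or more characters
# (these two-byte codes are hypothetical)
COMPOSITE_TABLE: Dict[bytes, str] = {
    b"\x00\x01": "ffi",     # one code, three Unicode characters
    b"\x00\x02": "\u20ac",  # one code, the euro sign
}


def decode_simple(data: bytes) -> str:
    """Decode one byte at a time using the 256-entry table."""
    return "".join(SIMPLE_TABLE[byte] for byte in data)


def decode_composite(data: bytes, code_length: int = 2) -> str:
    """Decode fixed-length multi-byte codes using the composite table."""
    codes = [data[i:i + code_length] for i in range(0, len(data), code_length)]
    return "".join(COMPOSITE_TABLE[code] for code in codes)


print(decode_simple(b"caf\xe9"))
print(decode_composite(b"\x00\x01\x00\x02"))
```

The simple table can never describe more than 256 distinct codes, which is why scripts with larger character sets need the composite approach.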

###### 2.1.3.7.1 Using one of the standard 14

font: Font = StandardType1Font('Helvetica')

The names of the standard 14 are:

- "Courier",
- "Courier-Bold",
- "Courier-Bold-Oblique",
- "Courier-Oblique",
- "Helvetica",
- "Helvetica-Bold",
- "Helvetica-Bold-Oblique",
- "Helvetica-Oblique",
- "Symbol",
- "Times-Bold",
- "Times-Bold-Italic",
- "Times-Italic",
- "Times-Roman",
- "ZapfDingbats",
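
If font names come from user input or configuration, it can help to check them against this list before constructing a `Font`. The helper below is a small sketch that simply copies the names listed above; `is_standard_14` is a hypothetical function, not part of `pText`, and whether `pText` itself treats names case-sensitively is not something this example assumes.

```python
# names copied verbatim from the list above
STANDARD_14_NAMES = frozenset(
    [
        "Courier",
        "Courier-Bold",
        "Courier-Bold-Oblique",
        "Courier-Oblique",
        "Helvetica",
        "Helvetica-Bold",
        "Helvetica-Bold-Oblique",
        "Helvetica-Oblique",
        "Symbol",
        "Times-Bold",
        "Times-Bold-Italic",
        "Times-Italic",
        "Times-Roman",
        "ZapfDingbats",
    ]
)


def is_standard_14(font_name: str) -> bool:
    """Return True if font_name exactly matches one of the listed names."""
    return font_name in STANDARD_14_NAMES


print(is_standard_14("Helvetica"))
print(is_standard_14("Pacifico"))
```

A name that fails this check, such as a custom typeface, would need to be loaded from a font file instead.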

###### 2.1.3.7.2 Using a TrueTypeFont

Alternatively, you can specify a `Font` by providing a path to its `.ttf` file.

font: Font = TrueTypeFont.true_type_font_from_file(Path("/home/user/Pacifico.ttf"))


#### 2.1.4 Adding text to a `Document` using `Heading`

A `Heading` acts like any other `Paragraph` object, at least visually it does.
5 changes: 3 additions & 2 deletions README.md
Expand Up @@ -4,7 +4,7 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Corpus Coverage : 98.2%](https://img.shields.io/badge/corpus%20coverage-98.2%25-green)]()
[![Text Extraction : 70.7%](https://img.shields.io/badge/text%20extraction-70.7%25-orange)]()
[![Public Method Documentation : 93.0%](https://img.shields.io/badge/public%20method%20documentation-93.0%25-green)]()
[![Public Method Documentation : 100%](https://img.shields.io/badge/public%20method%20documentation-100%25-green)]()


pText is a library for creating and manipulating PDF files in python.
Expand Down Expand Up @@ -50,4 +50,5 @@ Contact sales for more info.
## 3. Acknowledgements

I would like to thank the following people for their contributions and advice with regard to developing `pText`:
- Michael Klink
- Benoît Lagae
- Michael Klink