Allow embedding CFF fonts #1282
For reference, this is reported as prawnpdf/ttfunk#98
BaseFont: basename.to_sym,
FontDescriptor: descriptor,
FirstChar: 32,
LastChar: 255,
Widths: @document.ref!(widths),
ToUnicode: cmap
)

if font.cff.exists?
Would it be useful/more readable to have this extracted into a separate method and have an OTF subclass that overrides it?
That's a great idea 👍
On second thought, I think this is correct as-is. It appears Prawn looks at the file extension to determine which Font subclass to use. That's not wrong necessarily, but nothing prevents fonts with a .ttf file extension from having a CFF table. I've also seen this in practice. Checking for the existence of the table is, IMHO, less error-prone.
On that note, I've also (weirdly) come across fonts with both glyf and CFF tables. No idea how that's supposed to work.
You're right that file extension is used. It's an easy heuristic. Do you think we should change that to make it more sophisticated and closer to "technically correct"?
I do. OTF and TTF aren't really different formats, they just contain different tables. Prawn would only have to read the directory structure at the beginning of the font to know if it contains a CFF table. Is that something you'd like to see in this PR?
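To make the idea concrete, here is a minimal sketch of reading the table directory at the start of a font file and checking for a CFF table. It uses only core Ruby; `sfnt_table_tags` and `cff_outlines?` are hypothetical names for illustration, not Prawn or TTFunk APIs, and the sketch assumes a single-font file (it does not handle font collections).

```ruby
# Hypothetical helpers, not part of Prawn or TTFunk.
# Lists the table tags found in an sfnt (TTF/OTF) table directory.
def sfnt_table_tags(io)
  header = io.read(12)
  raise ArgumentError, 'truncated sfnt header' if header.nil? || header.bytesize < 12

  # Offset table (big-endian): uint32 version, then uint16 numTables,
  # searchRange, entrySelector and rangeShift. Only numTables is needed here.
  _version, num_tables = header.unpack('Nn')

  Array.new(num_tables) do
    record = io.read(16)
    raise ArgumentError, 'truncated table record' if record.nil? || record.bytesize < 16

    # Each record: 4-byte tag, uint32 checksum, uint32 offset, uint32 length.
    record.byteslice(0, 4)
  end
end

# True when the directory lists a 'CFF ' table, whatever the file extension.
def cff_outlines?(io)
  sfnt_table_tags(io).include?('CFF ')
end
```

Usage would look something like `File.open(path, 'rb') { |f| cff_outlines?(f) }`, where `path` is whatever font file the user passed in. Only the first few hundred bytes are read, so the check is cheap.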
I've also (weirdly) come across fonts with both glyf and CFF tables. No idea how that's supposed to work.
Maybe it's a hybrid font? You can treat it as TTF or OTF and still get the same result? We should look into a few examples to verify if that's the case at least for the majority of cases. If that's so we should probably treat those as OTF fonts.
Is that something you'd like to see in this PR?
Yes, please.
And maybe add a changelog entry?
af299dc
to
09a9840
Compare
Wow, thanks for taking a look at this so quickly @pointlessone! I also have a ttfunk PR that I'm working on submitting that prawn will need to depend on to get everything working. Sadly it turns out I made a few mistakes in my implementation of OpenType support 😓 This PR shouldn't be merged until the corresponding ttfunk PR is reviewed, merged, and released. I will link to it here as soon as it's ready.
@camertron Random question. By any chance would it be easier to not support subsetting?
@pointlessone yes, I think it would be much easier, although I haven't tested embedding an entire font file. Most fonts are small and could be embedded without a large increase in file size. However, Noto and other fonts that contain very large character sets could add megabytes to file size, so subsetting them is more important. Another consideration is that, while most font authors allow their fonts to be subset and embedded in (for example) PDF files, some font licenses restrict you from distributing the entire font program, which is effectively what we'd be doing if we didn't subset in prawn.
So, to give a bit of context for the question… I was looking into Middle Eastern and Indic language support. It's much more complex than the naive text layout that we currently have. For proper support of those languages we'd have to take into account at the very least the GSUB and GPOS tables. They're probably not used by the PDF renderer, but TTFunk subsetting is not at all ready for that complexity. Likewise, ligatures, bitmap fonts, variable fonts, the list goes on. So I think instead we're going to deprecate TTFunk subsetting (current iteration) and embed full fonts. That way we'd at least be sure we're not corrupting fonts. Font serialization is not easy. Subsetting is even more complex. Instead let's focus on making TTFunk properly read fonts so that we can properly lay out text.
@pointlessone hmm that's really interesting and would definitely prevent a lot of complexity. Are you concerned with the potential copyright/licensing issues I mentioned in my last comment? For what it's worth, I actually have a branch for subsetting the GPOS/GSUB tables 😅 It would need to be dusted off but a significant amount of the work is already done. The thing is... it's extremely complex.
BTW, I mean, ideally we should honour all flags, licenses, and user wishes. But I'd rather we embed full fonts and not have bugs than have bugs and subset fonts. If we have to choose and full embedding is easier then let's do it instead.
A little. Maybe. I don't have any data on how many fonts actually prohibit full embedding. I also don't know what kind of subsetting they require. Maybe we can throw away a random table that is not used by PDF. Like,
I feel you. Sunk cost in full swing, right? 😅 But you'd probably breathe a sigh of relief knowing you don't have to support all that complexity. I would. 🙂
Just my 2c from what I remember: Subsetting an OTF font should not be that different from subsetting a TrueType font in the context of embedding it in a PDF file. The reason for this is that the PDF viewer shouldn't need to do any layout calculations at all. So GSUB/GPOS et al should generally not be needed. I think that this could easily be tested by using a GPOS/GSUB aware application like LibreOffice Writer, write a complex text there with ligatures et al and inspect the resulting PDF file for what has been done, e.g. inspect the content stream to see how horizontal/vertical offsets were done and the embedded OTF font.
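As an illustration of why the viewer can get away without layout tables: the producer can write glyph positioning straight into the content stream, for example with the `TJ` operator, whose array mixes strings with numeric adjustments. The numbers are in thousandths of a text space unit and are subtracted from the current position, so for horizontal text a positive value pulls the next glyph to the left. A minimal sketch, assuming the adjustments have already been computed by the layout step; `tj_operator` is a hypothetical helper, not a Prawn method.

```ruby
# Hypothetical helper: build a PDF `TJ` operator from text runs and the
# adjustment (in thousandths of a text space unit) to apply after each run.
def tj_operator(runs)
  parts = runs.flat_map do |text, adjustment|
    # Escape the characters that are special inside PDF literal strings.
    escaped = text.gsub(/[\\()]/) { |ch| "\\#{ch}" }
    literal = "(#{escaped})"
    adjustment.nil? || adjustment.zero? ? [literal] : [literal, adjustment.to_s]
  end
  "[#{parts.join(' ')}] TJ"
end

puts tj_operator([['A', 80], ['V', 60], ['A', nil]])
# prints: [(A) 80 (V) 60 (A)] TJ
```

Seeing arrays like this in a PDF produced by a GPOS-aware application would suggest the kerning was resolved at generation time rather than left to the viewer.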
This is definitely a conversation worth having, since I believe all of us agree that subsetting is extremely complex and easy to get wrong. The question we have to answer is whether the benefits of subsetting outweigh the cost of maintaining the subsetting code. As @pointlessone pointed out, I'm not exactly impartial since I contributed the original OTF parsing/subsetting code 😅, and I'd be lying if I said it doesn't hurt a little to consider discarding a good deal of it. In the end, I'm interested in doing what's right for the project and the maintainers. To that end, there are a few key points I'd like to make.
Against subsetting
For subsetting
@gettalong yes I think you're right about GSUB/GPOS. That's probably why ignoring them hasn't been a problem for prawn. For other use-cases like the one I mentioned in my last comment, it could be.
I'd add the most obvious point to the "against subsetting" section: it's much simpler. We're left with a much simpler problem to solve and that's good. A few comments on "for subsetting":
Definitely not arguing with you on this point. 100% correct. I didn't include it because we'd already discussed it.
I'm not suggesting it is or is not fair use, I'm simply suggesting it's more morally defensible to include a subset than to include the entire font program anyone could strip out of the PDF and use on their own. Sort of like how the GNU license lets you distribute compiled versions of their source code, even for commercial purposes, but not the source code itself.
That's certainly true for some fonts, but a good deal of the fonts out there can be subset without breaking them in non-PDF contexts.
I don't think we want to be in the business of enforcing copy protection or DRM. As you said, "EULA is external to fonts from TTFunk's point of view." I agree with that. It's just that embedding the entire font program feels immoral to me for some reason.
I disagree. It's useful enough that it worked for subsetting Noto at my previous company, and has been used in prawn for a long time as well.
ttfunk actually supports full Unicode subsets internally but doesn't offer an API to create them and instead creates MacRoman subsets for some reason. I assume this is for backwards compatibility. MacRoman is one of the baked-in encodings in the PDF spec, probably because in 1993 Unicode was only ~2 years old. I actually looked into changing ttfunk to create Unicode subsets instead. It should be pretty easy to get working (🤞), but will also require a small change to prawn. CFF also supports the full Unicode codespace. Although CFF charsets and encodings are limited to 255 characters, the font dictionaries feature was intended to support an almost unlimited number of characters in a single font. Noto uses this feature, for example.
Yes, that's true, but I don't think it needs to support every font table to be useful. For example, GPOS and GSUB are important for certain scripts and kinds of fonts, but entirely unimportant for others. It seems to work just fine for the fonts I tend to use it for. If at some point I or another developer need it to consider other tables, then we can always add that support later.
Yeah, I've used fonttools quite a bit to validate my work on ttfunk. The code is pretty easy to read. It's not particularly easy to use it from Ruby, however.
Alright, I'm on board with that. As I've said numerous times now, not subsetting does make everything simpler, particularly from prawn's point of view. I still worry about the morality of it, but I'm not the decision maker.
Yeah, that makes a lot of sense.
That's interesting. I'd love to know more. Do you feel the same way about embedded images, for example? Especially since it's virtually impossible to embed an image in a way that would make it impossible to extract. Since this is a non-technical part of the discussion feel free to either move it to a private channel (e.g. team discussions or email) or not answer at all.
Using a subset that is not restricted to 255 characters would mean changes to Prawn because simple PDF fonts like Prawn uses are single-byte fonts and can therefore only support 255 characters. If you want to use a single subset font with more than 255 characters in PDF, you would need to use PDF's CID font support (this is actually what I'm doing with HexaPDF).
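To illustrate what the single-byte limit means in practice: text that uses more distinct characters than one subset can hold has to be spread over several subset fonts, and the text has to be split into runs per subset. Below is a minimal sketch of that bookkeeping. `SubsetAllocator` is a hypothetical class written for this comment, not the TTFunk or Prawn implementation, and the capacity of 255 simply follows the figure quoted above.

```ruby
# Hypothetical sketch: assign each distinct character to a
# (subset index, single-byte code) pair, opening a new subset when full.
class SubsetAllocator
  CAPACITY = 255 # per-subset limit, as quoted in the discussion

  def initialize
    @assignments = {}   # character => [subset_index, code]
    @subset_sizes = [0] # number of characters allocated in each subset
  end

  # Returns [subset_index, code] for the character, allocating on first use.
  def assign(char)
    @assignments[char] ||= begin
      @subset_sizes << 0 if @subset_sizes.last >= CAPACITY
      @subset_sizes[-1] += 1
      [@subset_sizes.size - 1, @subset_sizes.last]
    end
  end

  # Splits text into runs that can each be shown with one subset font.
  # Returns an array of [subset_index, [code, ...]] pairs.
  def encode(text)
    text.each_char
        .map { |char| assign(char) }
        .chunk_while { |a, b| a[0] == b[0] }
        .map { |run| [run.first[0], run.map { |_, code| code }] }
  end
end
```

With CID fonts the codes are no longer limited to one byte, so a single subset could cover all the characters and the run splitting above would not be needed.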
No... maybe it's because fonts are essentially pieces of software, and software is what I do. I suppose ultimately it doesn't outweigh the cost of maintaining subsetting code. Also, since a bunch of PDF tools like Distiller, Illustrator, etc, often embed entire fonts, it's probably ok.
Oh that's true, I forgot how (needlessly) complicated the PDF spec is for CID fonts.
@camertron I tried it and it doesn't seem to work. I believe From the spec:
I believe it means a naked CFF font, not an OpenType font with a CFF table. I'm almost certain of this because the same table describes a different font subtype:
I used this in #1322 and it seems to work. Thank you for setting me on the right path. That said, I think it's better to close this PR as #1322, among other things, fixes this particular issue, too. If you have some time, could you please take a look at that other PR?
@camertron I took another look at this and the spec and I'm convinced these values are about stand-alone CFF fonts. Since TTFunk doesn't support stand-alone CFF fonts I'll close this PR. Thank you for setting me on the right track.
Summary
The original work I did to support OpenType fonts in ttfunk assumed embedding them in PDF files would work the same way as it does for TrueType fonts. Sadly this is not the case. I'm sorry to say I haven't been very active around here since OpenType support was released, largely because I haven't been getting notifications when new issues are filed. A few weeks ago I was alerted to weird OpenType font issues on Twitter and started digging in. This PR is a result of that investigation.
What changed?
The PDF docs require OpenType fonts with a CFF table to have a Subtype of Type1C and a reference entry with a Subtype of Type1. They also require the font descriptor to include the font program under the FontFile3 key instead of FontFile2, as is necessary for non-CFF fonts.
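For illustration, the selection described in this summary could be sketched as follows. This is a minimal sketch using plain hashes in place of Prawn's PDF objects, and `font_program_entries` is a hypothetical helper, not code from this PR. It assumes the reading that Type1C is the Subtype of the embedded font program stream and Type1 is the Subtype of the font dictionary that references it. Note that the review comments in this conversation conclude these values apply to stand-alone CFF fonts rather than to OpenType fonts with a CFF table.

```ruby
# Hypothetical helper: which dictionary entries the summary calls for.
# Plain hashes stand in for Prawn's PDF objects.
def font_program_entries(has_cff:)
  if has_cff
    {
      font_subtype: :Type1,       # Subtype of the font dictionary
      descriptor_key: :FontFile3, # descriptor key holding the font program
      stream_subtype: :Type1C     # Subtype of the font program stream
    }
  else
    {
      font_subtype: :TrueType,
      descriptor_key: :FontFile2,
      stream_subtype: nil         # FontFile2 streams carry no Subtype entry
    }
  end
end
```

The `has_cff` flag would come from a table check such as `font.cff.exists?` in the diff under review.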