foliojs / fontkit

An advanced font engine for Node and the browser

full unicode display and copy/paste support

mintty opened this issue · comments

As an attempt to fix foliojs/pdfkit#1251, I came up with the test program below.
It produces PDF output which looks like the second section below.
Selecting all text in the PDF, then copying and pasting it into a text file, yields the result in the third section below.
Problems are:

  • The program needs to care about switching font according to different glyph coverage. I'd hope for some automatic font choice/fallback mechanism to cover all characters as needed.
  • Missing glyphs are displayed as a hollow box replacement symbol ▯, a consequence of the previous point. However, even when they are selected and copied, they are not reproduced as the original characters; they all paste as U+100000. Copy/paste should round-trip transparently, whether or not the glyph can be displayed.
  • Note the "Hællœ" pasted as "Hælloe": the œ ligature is pasted as two separate characters oe, unlike the æ ligature. This happened when I ran the program on Windows. The same program on Linux pastes œ back correctly; both with pdfkit 0.12.1.

const PDFDocument = require('pdfkit')
const fs = require('fs')

let doc = new PDFDocument
doc.pipe(fs.createWriteStream('pdfkit.pdf'))
doc.registerFont('normal', './NotoSans-Regular.ttf')
doc.registerFont('emojis', './NotoEmoji-Regular.ttf')
// this one does not work:
doc.registerFont('NotoColorEmoji', './NotoColorEmoji_WindowsCompatible.ttf')

doc.font('normal')
doc.text('Hællœ 1€')
doc.text('Greek, Cyrillic: αγΩЭ')
doc.text('CJK: 啕')
doc.text('4 BMP emojis:')

doc.font('emojis')
doc.text('⛔⛱⛲✅')

doc.font('normal')
doc.text('5 non-BMP characters:')
doc.text('𐌸𐐀𑁍𝄞𝔸')
doc.text('3 non-BMP emojis:')

doc.font('emojis')
doc.text('🌛🍅😀')

doc.end()


Hællœ 1€
Greek, Cyrillic: αγΩЭ
CJK: ▯
4 BMP emojis:
⛔▯⛲✅
5 non-BMP characters:
▯▯▯▯▯
3 non-BMP emojis:
🌛🍅😀


Hælloe 1€
Greek, Cyrillic: αγΩЭ
CJK: 􀀀
4 BMP emojis:
⛔􀀀⛲✅
5 non-BMP characters:
􀀀􀀀􀀀􀀀􀀀
3 non-BMP emojis:
🌛🍅😀

* The program needs to care about switching font according to different glyph coverage. I'd hope for some automatic font choice/fallback mechanism to cover all characters as needed.

There's a feature request: foliojs/pdfkit#201
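
Until something like that exists, the font switching could be done by the caller. Below is a minimal sketch, assuming an ordered fallback list of objects with a `hasGlyph(codePoint)` predicate; `splitByCoverage` and that object shape are hypothetical names for illustration, not a pdfkit or fontkit API. The coverage tables are toy stand-ins for real fonts.

```javascript
// Hypothetical helper: split text into runs, each tagged with the first font
// in `fonts` that covers the code point. `fonts` is an ordered fallback list
// of { name, hasGlyph(codePoint) } objects (an assumed shape).
function splitByCoverage(text, fonts) {
  const runs = [];
  for (const ch of text) { // iterates by code point, so surrogate pairs stay intact
    const cp = ch.codePointAt(0);
    const match = fonts.find(f => f.hasGlyph(cp));
    // No font covers the character: fall back to the first font in the list.
    const name = match ? match.name : fonts[0].name;
    const last = runs[runs.length - 1];
    if (last && last.font === name) last.text += ch;
    else runs.push({ font: name, text: ch });
  }
  return runs;
}

// Toy coverage tables standing in for real fonts.
const fonts = [
  { name: 'normal', hasGlyph: cp => cp < 0x2600 },
  { name: 'emojis', hasGlyph: cp => cp >= 0x2600 },
];
const runs = splitByCoverage('4 BMP emojis: ⛔✅', fonts);
```

With real fonts, the predicate could probably be backed by fontkit's `hasGlyphForCodePoint`, and each run emitted with `doc.font(run.font).text(run.text, { continued: true })`. Note that this per-code-point approach ignores combining marks and variation selectors, which may end up in a different run than their base character.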

commented

The "Hællœ" pasted as "Hælloe" issue seems to depend on the PDF viewer, so forget about this one here.
However, transparent pasting of all characters, whether displayable or not, is essential for certain applications.

This sounds like a pdfkit problem not a fontkit one

The fontkit issue was closed, so we're back here...
Problem: Glyphs not available in the font are not displayed (they show as ▯), and they cannot be copied and pasted back transparently either (which is, however, an important feature in certain applications).
The generated PDF contains the following in the affected cases:

[<000d00160017001100050000> 0] TJ
The last character output is 0000

1 beginbfrange
<0000> <0006> [<0000> <26d4> <26f2> <2705> <d83c df1b> <d83c df45> <d83d de00>]
endbfrange

The 0000 is mapped to <0000> for copy/paste.
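
To make the two excerpts easier to read, here is a small sketch that decodes them. It assumes the text string uses 2-byte glyph codes and that the bfrange destinations are UTF-16BE code units; `glyphIds` and `decodeUtf16Hex` are hypothetical helper names. The two excerpts may come from different font subsets, so they are decoded independently.

```javascript
// Split a PDF hex string into 2-byte glyph IDs (assumes 2-byte codes).
function glyphIds(hex) {
  return hex.match(/.{4}/g).map(h => parseInt(h, 16));
}

// Decode one ToUnicode destination such as "d83c df1b"
// (UTF-16BE code units, possibly a surrogate pair) into a string.
function decodeUtf16Hex(hex) {
  const units = hex.replace(/\s+/g, '').match(/.{4}/g).map(h => parseInt(h, 16));
  return String.fromCharCode(...units);
}

// The TJ operand from the excerpt above.
const ids = glyphIds('000d00160017001100050000');
// The last ID is 0, the glyph used for characters missing from the font.

// The destination array of the bfrange <0000>..<0006>.
const targets = ['0000', '26d4', '26f2', '2705', 'd83c df1b', 'd83c df45', 'd83d de00'];
const toUnicode = targets.map(decodeUtf16Hex);
// Glyph IDs 1..6 map back to real characters; glyph ID 0 maps to U+0000,
// so the original character cannot be recovered on copy/paste.
```

In other words, every missing character collapses onto the same glyph ID, and the mapping table holds only one destination for that ID, which is why the original text is lost.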

commented

Okay, but that's still a pdfkit issue, no? Fontkit has nothing to do with whether or not you can copy text, or how it's presented. It just shapes Unicode sequences. If the error is "fontkit isn't rendering .notdef for unknown glyphs" then that's a good issue for here, but otherwise this has nothing to do with fontkit itself?

Actually my comment should have gone to the pdfkit issue, sorry. Fixed that.