Fixing a Complicated PDF Bug
UPDATE: I have a new and improved solution described here.
I am generating PDF files using the Linux Libertine fonts and XeTeX. When I view the files with an ordinary PDF reader, they appear fine. However, when I open them with the PDF.js viewer built into Firefox, the ligatures appear as odd foreign characters.
This problem appears to be known, as it is discussed in this bug report. However, that thread shows little progress toward a fix. I haven’t exactly pinned down the cause, but I did at least find a workaround.
The bug is incredibly specific (though, thankfully, easily reproducible). It occurs only when I compile a document on OS X and view it in Firefox’s built-in PDF.js viewer, also on OS X. The bug does not show up in any of these situations I’ve tested:
- Compiling the document on Linux
- Viewing the document in Firefox on Windows
- Viewing the document on PDF.js in Safari on a Mac
As an example, the “Th” ligature is at Unicode code point 0xe049, in the Private Use Area. However, in the Linux Libertine Roman font, it is the 0x095f’th glyph listed, not counting blank slots. Unicode character 0x095f is the Devanagari letter Yya: य़. That is exactly the wrongly displayed character shown in the bug report in place of the “Th” of the word “The.”
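The numbers above suggest that the viewer treats the glyph’s position in the font as if it were a Unicode code point. The following Perl sketch only checks the arithmetic of that hypothesis for the “Th” example; the glyph index 0x095f is taken from the text, and the helper name `misread_as` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: which character would a viewer show if it mistook a glyph's
# position in the font for a Unicode code point? (A hypothesis only.)
use strict;
use warnings;
use charnames ();
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: Unicode name of the character at code point $cp
sub misread_as {
    my ($cp) = @_;
    return charnames::viacode($cp) // 'unnamed code point';
}

my $th_glyph = 0x095F;   # position of the "Th" ligature (U+E049) in the font

printf "glyph 0x%04X read as U+%04X: %s (%s)\n",
    $th_glyph, $th_glyph, misread_as($th_glyph), chr($th_glyph);
```

The name it reports is the standard Unicode name for U+095F, the same Devanagari letter that appears in the bug report.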
Strangely, though, the problem appears to affect only ligatures. The Linux Libertine character at Unicode 0xe042 is the 0x0958’th glyph, only a few slots away from the problematic Th ligature. There is also a Devanagari character at that position (Qa, 0x0958), but PDF.js displays the Linux Libertine character correctly.
After some testing, I discovered that simply changing the glyph names for the ligatures would solve the problem. The ligatures in Linux Libertine are named with underscores between letters (e.g., f_f_i or T_h). Merely deleting the underscores corrected the problem entirely. It’s not clear why that is so, but I have noticed that OS X seems to have some special cases for handling ligature characters, and perhaps that is related.
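The renaming rule itself is small enough to state in a couple of lines of Perl. This is a minimal sketch of the rule in isolation (the sub name `rename_ligature` is hypothetical); the full script below applies the same rule to every ligature glyph in a font.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop every underscore from a ligature glyph name
sub rename_ligature {
    my ($name) = @_;
    (my $renamed = $name) =~ tr/_//d;   # tr///d deletes the listed characters
    return $renamed;
}

print "$_ -> ", rename_ligature($_), "\n" for qw(f_f_i T_h);
```

Note that the rule does not check whether the shortened name is already used by another glyph in the font.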
The following Perl script automatically removes the underscores from the ligature glyph names in the fonts. It requires the ttx command-line program (part of fontTools).

```perl
#!/usr/bin/perl
# Rename underscored ligature glyphs (e.g. f_f_i -> ffi) in OTF fonts.
use strict;
use warnings;

# Make directories
for my $dir (qw(old-ttx new-ttx new-otf)) {
    mkdir $dir or die "mkdir $dir: $!";
}

# Convert fonts to ttx
system("ttx", "-d", "old-ttx", @ARGV) == 0 or die "ttx failed: $?";

# Fix each file
for my $old_ttx (glob("old-ttx/*.ttx")) {
    (my $new_ttx = $old_ttx) =~ s/^old/new/;
    print "Fixing $old_ttx\n";
    fixfile($old_ttx, $new_ttx);
}

# Convert the fixed ttx files back into fonts
system("ttx", "-d", "new-otf", glob("new-ttx/*.ttx")) == 0
    or die "ttx failed: $?";

sub fixfile {
    my ($oldttx, $newttx) = @_;

    # First pass: map each underscored ligature glyph name to its new name
    open my $old, '<', $oldttx or die "open $oldttx: $!";
    my %ligatures;
    while (<$old>) {
        if (/^ *<Ligature .* glyph="([^"]*_[^"]*)"\/>$/) {
            my $lig = $1;
            (my $modlig = $lig) =~ s/_//g;
            $ligatures{$lig} = $modlig;
        }
    }
    # Handle longer names before shorter ones
    my @ligatures = sort { length($b) <=> length($a) } keys %ligatures;

    # Second pass: rename the glyphs everywhere they appear in the file
    seek $old, 0, 0 or die "seek $oldttx: $!";
    open my $new, '>', $newttx or die "open $newttx: $!";
    while (<$old>) {
        for my $lig (@ligatures) {
            s/\b\Q$lig\E\b/$ligatures{$lig}/g;
        }
        print $new $_;
    }
    close $new or die "close $newttx: $!";
    close $old;
}
```
To run the script, save it to a file, make it executable, and pass all the Linux Libertine OTF files as arguments. The script creates three new directories and stops if any of them already exists; the one called “new-otf” is the one of interest, as it contains the new, corrected font files.
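To confirm that the renaming caught every ligature, a short check can scan the rewritten .ttx files for ligature glyph names that still contain an underscore. This is a sketch that assumes it runs in the same directory as the fix script, so that “new-ttx” exists; the sub name `leftover_ligatures` is hypothetical, and the pattern is the same one the fix script uses.

```perl
#!/usr/bin/perl
# Sketch: report ligature glyph names in new-ttx/*.ttx that still
# contain an underscore after the fix script has run.
use strict;
use warnings;

# Hypothetical helper: underscored ligature glyph names in one ttx file
sub leftover_ligatures {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my @left;
    while (<$fh>) {
        push @left, $1 if /<Ligature .* glyph="([^"]*_[^"]*)"\/>/;
    }
    close $fh;
    return @left;
}

for my $file (glob("new-ttx/*.ttx")) {
    my @left = leftover_ligatures($file);
    print @left ? "$file: still has @left\n" : "$file: ok\n";
}
```

An “ok” for every file means no underscored ligature names remain in the GSUB ligature entries.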
I hope that someone eventually determines the source of this bug, so that we need not rely on this admittedly hackish solution.