输入:包含嵌入字体的(例如14个)PDF / A-1b文件列表.
处理:与Apache PDFBOX进行简单合并.
结果:1个文件大小(太大)的PDF / A-1b文件. (它几乎是所有源文件大小的总和).
问题:有没有办法减少生成的PDF的文件大小?
想法:删除冗余的嵌入字体.但是怎么样?这是正确的方法吗?
不幸的是,以下代码没有完成这项工作,但突出了明显的问题.
try (PDDocument document = PDDocument.load(new File("E:/tmp/16189_ZU_20181121195111_5544_2008-12-31_Standardauswertung.pdf"))) {
List<COSName> collectedFonts = new ArrayList<>();
PDPageTree pages = document.getDocumentCatalog().getPages();
int pageNr = 0;
for (PDPage page : pages) {
pageNr++;
Iterable<COSName> names = page.getResources().getFontNames();
System.out.println("Page " + pageNr);
for (COSName name : names) {
collectedFonts.add(name);
System.out.print("\t" + name + " - ");
PDFont font = page.getResources().getFont(name);
System.out.println(font + ", embedded: " + font.isEmbedded());
page.getCOSObject().removeItem(COSName.F);
page.getResources().getCOSObject().removeItem(name);
}
}
document.save("E:/tmp/output.pdf");
}
代码产生如下输出:
Page 1
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 2
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 3
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 4
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 5
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 6
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 7
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 8
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 9
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 10
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 11
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 12
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 13
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
Page 14
COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
任何帮助赞赏…
解决方法:
在文件中调试时,我发现相同字体的字体文件被多次引用.因此,使用已查看的字体文件项替换字典中的实际字体文件项,删除了引用并可以进行压缩.通过这种方式,我能够将30 MB的文件缩小到大约6 MB.
File file = new File("test.pdf");
PDDocument doc = PDDocument.load(file);
Map<String, COSBase> fontFileCache = new HashMap<>();
for (int pageNumber = 0; pageNumber < doc.getNumberOfPages(); pageNumber++) {
final PDPage page = doc.getPage(pageNumber);
COSDictionary pageDictionary = (COSDictionary) page.getResources().getCOSObject().getDictionaryObject(COSName.FONT);
for (COSName currentFont : pageDictionary.keySet()) {
COSDictionary fontDictionary = (COSDictionary) pageDictionary.getDictionaryObject(currentFont);
for (COSName actualFont : fontDictionary.keySet()) {
COSBase actualFontDictionaryObject = fontDictionary.getDictionaryObject(actualFont);
if (actualFontDictionaryObject instanceof COSDictionary) {
COSDictionary fontFile = (COSDictionary) actualFontDictionaryObject;
if (fontFile.getItem(COSName.FONT_NAME) instanceof COSName) {
COSName fontName = (COSName) fontFile.getItem(COSName.FONT_NAME);
fontFileCache.computeIfAbsent(fontName.getName(), key -> fontFile.getItem(COSName.FONT_FILE2));
fontFile.setItem(COSName.FONT_FILE2, fontFileCache.get(fontName.getName()));
}
}
}
}
}
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.save(baos);
final File compressed = new File("test_compressed.pdf");
baos.writeTo(new FileOutputStream(compressed));
也许这不是最优雅的方式,但它可以工作并保持PDF / A-1b的兼容性.