PDF, PS and DjVu

From the Arch Linux Chinese wiki

This article covers software for viewing, editing and converting PDF, PostScript (PS), DjVu ("déjà vu") and XPS files.

Engines

  • DjVuLibre — Suite for creating, manipulating and viewing DjVu documents.
https://djvu.sourceforge.net/ || djvulibre
  • Ghostscript — Interpreter for PostScript and PDF. Provides the gs(1) command-line interface (see also /usr/share/doc/ghostscript/*/Use.htm, also readable online) and many wrapper scripts such as ps2pdf and pdf2ps.
https://ghostscript.com/ || ghostscript
  • libgxps — GObject-based library for handling and rendering XPS documents.
https://wiki.gnome.org/Projects/libgxps || libgxps
  • libspectre — Small library for rendering PostScript documents.
https://www.freedesktop.org/wiki/Software/libspectre || libspectre
  • MuPDF — Lightweight PDF, XPS and EPUB viewer consisting of a software library, command-line tools and viewers.
https://mupdf.com/ || libmupdf
  • Poppler — PDF rendering library based on Xpdf. For CJK (Chinese, Japanese, Korean) support in Poppler, install poppler-data.
https://poppler.freedesktop.org/ || poppler

Viewers

Framebuffer

  • fbgs — Poor man's PostScript/PDF viewer for the Linux framebuffer console.
https://www.kraxel.org/blog/linux/fbida/ || fbida
  • fbpdf — Small framebuffer PDF and DjVu viewer based on MuPDF, with Vim keybindings, written in C.
https://repo.or.cz/w/fbpdf.git || fbpdf-gitAUR
  • jfbview — Framebuffer PDF and image viewer. Features include Vim-like controls, zoom-to-fit, TOC (outline) view and fast multi-threaded rendering.
https://github.com/jichu4n/jfbview || jfbviewAUR

Graphical

Note: Some web browsers can display PDF files, for example using PDF.js.
  • apvlv — Lightweight document viewer using the GTK library with Vim keybindings. Supports PDF, DjVu, EPUB, HTML and TXT.
https://naihe2010.github.io/apvlv/ || apvlvAUR
  • Atril — Simple multi-page document viewer for MATE. Supports DjVu, DVI, EPS, EPUB, PDF, PostScript, TIFF, XPS and Comicbook.
https://github.com/mate-desktop/atril || atril
  • CorePDF — Simple, lightweight PDF viewer based on Qt and poppler. Part of C-Suite.
https://cubocore.gitlab.io/ || corepdfAUR
  • Deepin Document Viewer — Simple PDF and DjVu reader supporting bookmarks, highlights and annotations.
https://github.com/linuxdeepin/deepin-reader || deepin-reader
  • DjView — Viewer for DjVu documents.
https://djvu.sourceforge.net/djview4.html || djview
  • Emacs
https://www.gnu.org/software/emacs/ || emacs
  • ePDFView — Lightweight PDF document viewer using the Poppler and GTK libraries. Development has stopped.
http://freecode.com/projects/epdfview || epdfview-gitAUR
  • Foxit Reader
https://www.foxitsoftware.com/pdf-reader/ || foxitreaderAUR
  • GNOME Document Viewer — Document viewer for GNOME using GTK. Supports DjVu, DVI, EPS, PDF, PostScript, TIFF, XPS and Comicbook. Part of the gnome group.
https://apps.gnome.org/Evince/ || evince
  • gv — Graphical user interface for the Ghostscript interpreter that allows viewing and navigating PostScript and PDF documents.
https://www.gnu.org/software/gv/ || gvAUR
  • llpp — Fast PDF reader based on MuPDF, supporting continuous page scrolling, bookmarks and full-text search.
https://repo.or.cz/w/llpp.git || llppAUR
  • MuPDF — Fast EPUB, FictionBook, PDF, XPS and Comicbook viewer written in portable C. Supports CJK fonts and has Vim-like bindings.
https://mupdf.com/ || mupdf
  • Okular — Universal document viewer for KDE. Supports CHM, Comicbook, DjVu, DVI, EPUB, FictionBook, Mobipocket, ODT, PDF, Plucker, PostScript, TIFF and XPS. Part of the kde-graphics group.
https://okular.kde.org/ || okular
  • Papers — Document viewer for GNOME using GTK. Supports DjVu, PDF, TIFF and Comicbook.
https://apps.gnome.org/Papers/ || papers
  • pdfpc — Presenter console with multi-monitor support for PDF files.
https://pdfpc.github.io/ || pdfpc
  • qpdfview — Tabbed document viewer. It uses Poppler for PDF support, libspectre for PS support, DjVuLibre for DjVu support, CUPS for printing support and the Qt toolkit for its interface.
https://launchpad.net/qpdfview || qpdfviewAUR
  • Sioyek — Lightweight PDF reader based on MuPDF with features designed for reading research papers and technical books, such as marks, bookmarks, highlights, a searchable command palette and jumping to references.
https://sioyek.info/ || sioyekAUR
  • Xpdf — Viewer that can decode LZW and read encrypted PDFs.
https://www.xpdfreader.com/ || xpdf
  • Xreader — Document viewer of the X-Apps project. Supports DjVu, DVI, EPUB, PDF, PostScript, TIFF, XPS and Comicbook.
https://github.com/linuxmint/xreader/ || xreader
  • Zathura — Highly customizable and functional document viewer (plugin based). Supports PDF, DjVu, PostScript and Comicbook.
https://pwmt.org/projects/zathura/ || zathura

Comparison

The factual accuracy of this article or section is disputed.

Reason: Filling in PDF forms appears not to be possible in MuPDF and llpp. (Discuss in Talk:PDF, PS and DjVu)

Name PDF PostScript DjVu XPS PDF forms PDF annotations Non-rectangular selection License
Adobe Reader Custom proprietary
apvlv Poppler DjVuLibre No (at least not by default) GPLv2
Atril Poppler libspectre DjVuLibre libgxps GPLv2
DjView DjVuLibre GPLv2
Emacs Ghostscript¹ DjVuLibre¹ GPLv3
Emacs pdf-tools Poppler GPLv3
ePDFView Poppler GPLv2
Foxit Reader Custom proprietary
GNOME Document Viewer Poppler libspectre DjVuLibre libgxps GPLv2
gv Ghostscript GPLv3
llpp libmupdf libmupdf GPLv3
MuPDF Custom Custom Yes (mupdf-gl) Yes (mupdf-gl) Yes (mupdf-gl) AGPLv3
Okular Poppler libspectre DjVuLibre Custom GPL, LGPL
PDF4QT Custom LGPLv3
pdfpc Poppler GPLv2
qpdfview Poppler libspectre¹ DjVuLibre¹ GPLv2
Xpdf Custom GPLv3
Xreader Poppler libspectre¹ DjVuLibre¹ libgxps¹ GPLv2
Zathura libmupdf¹ / Poppler¹ libspectre¹ DjVuLibre¹ libmupdf¹ zlib
  1. Requires installing optional dependencies.
PDF forms

The PDF forms column in the above table refers to AcroForms support. If you do not need your input to be directly extractable from the PDF, you can also use the applications in #Graphical PDF editing to put text on top of a PDF. PDF forms can be created with LibreOffice Writer (View > Toolbars > Form Controls) and the advanced PDF editors.

The proprietary and deprecated XFA format for forms is not fully supported by Poppler[1][2] and only supported by Adobe Reader and Master PDF Editor.

Alternatively, web browsers such as Firefox or Chromium feature a built-in PDF viewer capable of filling out forms.

Graphical PDF editing

Editors that can import PDF files

  • Scribus can import and export PDF; text is imported as polygons.[3]
  • LibreOffice Draw can import and export PDF; text is imported as text; embedded fonts are substituted.[4][5]
  • Inkscape can import and export PDF; text is imported as cloned glyphs or text; with the latter embedded fonts are substituted.
  • Graphics editors like GIMP and Krita can also import and export PDFs at the cost of rasterization.

Basic editors

  • flpsed — A PostScript and PDF annotator, only supports text boxes.
https://flpsed.org/flpsed.html || flpsedAUR
  • HandyOutliner for DjVu / PDF — Makes creating bookmarks for DjVu and PDF documents easier and faster.
https://handyoutlinerfo.sourceforge.net || handyoutliner-binAUR
  • jPDF Tweak — Java Swing application that can combine, split, rotate, reorder, watermark, encrypt, sign, and otherwise tweak PDF files.
https://jpdftweak.sourceforge.net/ || jpdftweakAUR
  • Paper Clip — PDF document metadata editor to edit the title, author, keywords and more details.
https://apps.gnome.org/PdfMetadataEditor/ || paper-clip
  • PDF Arranger — Helps merge or split PDF documents and rotate, crop and rearrange pages. It is a maintained fork of PDF-Shuffler.
https://github.com/jeromerobert/pdfarranger || pdfarranger
  • PDF Chain — GTK front-end for PDFtk, written in C++, supporting concatenation, burst, watermarks, attaching files and more.
https://pdfchain.sourceforge.net/ || pdfchainAUR
  • PdfJumbler — Simple tool to rearrange, merge, delete and rotate pages in PDF files.
https://github.com/mgropp/pdfjumbler || pdfjumblerAUR
  • PDF Mix Tool — Qt front-end for PoDoFo, written in C++, supports splitting, merging, rotating and mixing PDF files.
https://scarpetta.eu/pdfmixtool/ || pdfmixtool
  • PDFsam — Open source application, written in Java, supports merging, splitting and rotating.
https://pdfsam.org/ || pdfsamAUR
  • PDF Slicer — Simple application to extract, merge, rotate and reorder pages of PDF documents.
https://junrrein.github.io/pdfslicer/ || pdfslicer
  • PDF Tricks — Simple, efficient application for small manipulations in PDF files using Ghostscript.
https://github.com/muriloventuroso/pdftricks || pdftricks

Cropping tools

  • briss — Java GUI to crop pages of PDF documents to one or more regions selected.
https://sourceforge.net/projects/briss/ || brissAUR
  • krop — Simple graphical tool to crop the pages of PDF files.
https://arminstraub.com/software/krop || kropAUR
  • pdfCropMargins — Automatically crops the margins of PDF files.
https://github.com/abarker/pdfCropMargins || pdfcropmarginsAUR
  • PdfHandoutCrop — Tool to crop PDF handouts with multiple pages per sheet.
https://cges30901.github.io/pdfhandoutcrop/ || pdfhandoutcropAUR

Advanced editors

  • Master PDF Editor — Functional proprietary PDF editor. Latest version free for non-commercial use. The -free package is outdated but lacks a watermark.
https://code-industry.net/free-pdf-editor/ || masterpdfeditorAUR, masterpdfeditor-freeAUR
  • PDF Studio — All-in-one proprietary PDF editor similar to Adobe Acrobat.
https://www.qoppa.com/pdfstudio/ || pdfstudio-binAUR
  • PDF4QT — Open source PDF editor.
https://jakubmelka.github.io/ || pdf4qtAUR

Comparison of advanced editors

Name Cost (USD, lifetime) Page Labels Form Designer Content Editing (Text and Images) Optimize PDFs Digitally Sign PDFs License
Master PDF Editor 85.34 proprietary
Qoppa PDF Studio Standard 99 proprietary
Qoppa PDF Studio Pro 139 proprietary

PDF tools

See also Ghostscript.

  • Camelot — PDF table extraction for humans.
https://github.com/atlanhq/camelot || python-camelotAUR, python-camelot-gitAUR
  • Coherent PDF — Proprietary, non-free command-line tool for manipulating PDF files, including merging, encrypting, decrypting, scaling, cropping, rotating, bookmarks, stamps, logos and page numbers.
https://community.coherentpdf.com/ || cpdfAUR
  • DiffPDF — Compares the text or the visual appearance of each page in two PDF files.
https://gitlab.com/eang/diffpdf || diffpdf
  • mupdf-tools — Tools developed as part of MuPDF, containing mutool(1) and muraster.
https://mupdf.com || mupdf-tools
  • pdfcpu — Command-line tool for creating and modifying PDFs.
https://github.com/pdfcpu/pdfcpu || pdfcpu-binAUR
  • pdf_extbook — Extracts the pages covered by bookmarks in a PDF.
https://github.com/raffaem/pdf_extbook || pdf_extbook-gitAUR
  • pdfgrep — Command-line utility to search for text in PDF files.
https://pdfgrep.org/ || pdfgrep
  • pdfjam — Can be used to enlarge, join, rotate and flip PDF files, and to arrange them in a layout suitable for book binding.
https://github.com/DavidFirth/pdfjam || texlive-binextra
  • pdfminer.six — Community-maintained fork of pdfminer, a tool for extracting text from PDF documents.
https://github.com/pdfminer/pdfminer.six || python-pdfminer
  • pdf2svg — Converts PDF files to SVG files.
http://www.cityinthesky.co.uk/opensource/pdf2svg/ || pdf2svg
  • PDFtk — Simple tool for doing everyday things with PDF documents.
https://gitlab.com/pdftk-java/pdftk || pdftk
  • QPDF — Content-preserving PDF transformation system.
https://github.com/qpdf/qpdf || qpdf
  • Stapler — Lightweight alternative to PDFtk using the PyPDF2 library.
https://github.com/hellerbarde/stapler || staplerAUR, stapler-gitAUR
  • Tabula — Tool for liberating data tables trapped inside PDF files.
https://tabula.technology || tabulaAUR, tabula-javaAUR
  • Vector Slicer — Exports multi-page PDFs from an SVG.
https://gitlab.gnome.org/World/design/vector-slicer || vector-slicer
  • verapdf — Purpose-built, open-source file-format validator covering all PDF/A and PDF/UA parts and conformance levels.
https://verapdf.org || verapdfAUR

Command snippets

Create a PDF from images

With GraphicsMagick:

$ gm convert 1.jpg 2.jpg 3.jpg out.pdf

With ImageMagick:

$ magick 1.jpg 2.jpg 3.jpg out.pdf

Note that ImageMagick's output is lossy. For lossless PDF creation from JPEG files, use img2pdf.

Concatenate PDFs

With Ghostscript:

$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf -dBATCH 1.pdf 2.pdf 3.pdf

With PDFtk:

$ pdftk 1.pdf 2.pdf 3.pdf cat output out.pdf

With Poppler:

$ pdfunite 1.pdf 2.pdf 3.pdf out.pdf

With QPDF:

$ qpdf --empty --pages 1.pdf 2.pdf 3.pdf -- out.pdf

Extract text from PDF

With Poppler and maintaining the layout:

$ pdftotext -layout in.pdf out.txt

See also pdftotext(1).

With calibre:

$ ebook-convert in.pdf out.txt

Results vary between applications, depending on the PDF file.

Decrypt a PDF

This section lists commands to decrypt a PDF to an unencrypted file. Note that most PDF viewers also support encrypted PDFs.

With PDFtk:

$ pdftk in.pdf input_pw password output out.pdf

With Poppler to PostScript:

$ pdftops -upw password in.pdf out.ps

With QPDF:

$ qpdf --decrypt --password=password in.pdf out.pdf
Tip: Forgotten passwords might be recovered with pdfcrack, see pdfcrack(1).

Encrypt a PDF

The user password is used to encrypt the document; the owner password restricts operations once the document is decrypted. For more information, see Wikipedia:PDF#Encryption and signatures.

With PDFtk:

$ pdftk in.pdf output out.pdf user_pw password

With PoDoFo:

$ podofoencrypt -u user_password -o owner_password in.pdf out.pdf

With QPDF:

$ qpdf --encrypt user_password owner_password key_length -- in.pdf out.pdf

where key_length can be 40, 128 or 256.

Extract images from a PDF

With Poppler, saving JPEG-encoded images as JPEG files (other images are written in PPM or PBM format):

$ pdfimages -j infile.pdf outfileroot

Extract page range from PDF, split multipage PDF document

With Ghostscript as a single file:[6]

$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=first -dLastPage=last -sOutputFile=outfile.pdf infile.pdf

With PDFtk as a single file:

$ pdftk infile.pdf cat first-last output outfile.pdf

With Poppler as separate files:

$ pdfseparate -f first -l last infile.pdf outfileroot-%d.pdf

With QPDF as a single file:

$ qpdf --empty --pages infile.pdf first-last -- outfile.pdf

With mutool as a single file:

$ mutool clean -g infile.pdf outfile.pdf first-last
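
The single-range commands above extend naturally to splitting a document into fixed-size chunks. The following is a minimal sketch rather than an established recipe: chunk_ranges is a hypothetical helper name introduced here for illustration, and it only computes the first-last ranges; the actual extraction is left to one of the tools above.

```shell
#!/bin/sh
# chunk_ranges TOTAL SIZE  (hypothetical helper, for illustration only)
# Print one "first-last" page range per line, covering pages 1..TOTAL
# in chunks of at most SIZE pages.
chunk_ranges() {
    total=$1
    size=$2
    # Refuse a chunk size below 1, which would never make progress
    if [ "$size" -lt 1 ]; then
        return 1
    fi
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        # The final chunk may be shorter than SIZE
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        echo "$first-$last"
        first=$((last + 1))
    done
}
```

For example, chunk_ranges 25 10 prints the ranges 1-10, 11-20 and 21-25, one per line. Assuming qpdf is installed, each range can then be fed to the QPDF command shown above, e.g. for r in $(chunk_ranges 25 10); do qpdf --empty --pages infile.pdf "$r" -- "outfile-$r.pdf"; done.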

Impose a PDF (nup)

PDF imposition is the process by which multiple input pages are combined into one output page, laid out in a grid of rows and columns.

It can be done with pdfjam (note that wrapper scripts such as pdfnup and pdfbook are deprecated); the --nup argument gives the number of pages across, then the number of pages down:

$ pdfjam --nup columnsxrows input.pdf --outfile output.pdf

or with pdfsak:

$ pdfsak --input-file input.pdf --output output.pdf --nup rows columns

Inspect metadata

With ExifTool:

$ exiftool -All file.pdf

With Poppler:

$ pdfinfo file.pdf

Remove metadata

Using ExifTool

With ExifTool:

$ exiftool -All= -overwrite_original input.pdf
$ mv input.pdf /tmp/temp.pdf
$ qpdf --linearize /tmp/temp.pdf input.pdf

The linearize step is needed to prevent recovery of deleted metadata. See this SuperUser question and the related ExifTool forum thread.

Using pdftk

Many PDFs store document metadata using both an Info dictionary (old school) and an XMP stream (new school). This pdftk command removes the XMP stream from the PDF altogether. It does not remove the Info dictionary.

Note that objects inside the PDF might have their own, separate XMP metadata streams, and that this command does not remove those. It only removes the PDF’s document‐level XMP stream.

$ pdftk input.pdf drop_xmp output output.pdf

Reduce size of a PDF

PDF size can be reduced by setting an appropriate optimization or compression level.

With Ghostscript, use one of:

$ ps2pdf -dPDFSETTINGS=/screen in.pdf out.pdf

or

$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=out.pdf in.pdf

For different settings see the documentation.

There is also shrinkpdfAUR, a script wrapping gs.
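
How much a given setting helps depends on the document, so it is worth comparing file sizes afterwards. A small sketch, assuming only POSIX sh and wc; size_saved is a hypothetical helper name, not part of Ghostscript or shrinkpdf.

```shell
#!/bin/sh
# size_saved ORIGINAL REDUCED  (hypothetical helper, for illustration only)
# Print by how many percent REDUCED is smaller than ORIGINAL, truncated
# to an integer. A negative number means the output file actually grew.
size_saved() {
    # Both files must exist, and the original must not be empty
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        return 1
    fi
    # $(( ... )) strips any padding some wc implementations add
    orig=$(($(wc -c < "$1")))
    new=$(($(wc -c < "$2")))
    if [ "$orig" -le 0 ]; then
        return 1
    fi
    echo $(( (orig - new) * 100 / orig ))
}
```

After running one of the commands above, size_saved in.pdf out.pdf prints the percentage saved; if the number is negative, the original file was already smaller.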

Rasterize a PDF

These commands will convert your PDF into images.

With GraphicsMagick to convert a specific page into an image file:

$ gm convert -density dpi infile.pdf[page] outfile.jpg

With ImageMagick to convert a specific page into an image file:

$ magick -density dpi infile.pdf[page] outfile.jpg

With ImageMagick to convert all pages into another PDF file composed of one image per page:

$ magick -density dpi infile.pdf outfile.pdf
Warning: This will increase the file size of your PDF substantially. Use it, for example, if your printer is not able to print your PDF correctly.

With Poppler to convert all pages into one image file per page:

$ pdftoppm -jpeg -r dpi infile.pdf outfileroot

With Poppler to convert a specific page into an image file:

$ pdftoppm -jpeg -r dpi -f page -singlefile infile.pdf outfileroot

Split PDF pages

With mupdf-tools to split every page vertically into two pages:

$ mutool poster -y 2 in.pdf out.pdf

Can be used to undo simple imposition.

Add an image

Adding an image to any location in a PDF can be done

Details on these and other solutions can be found on StackExchange.

Add digital signature to PDF

jsignpdfAUR can digitally sign PDF files with X.509 certificates in GUI and CLI.

Readers such as Okular and MuPDF can sign PDFs with digital signatures. This requires a PFX certificate, which can be created with an OpenSSL command:

$ openssl req -x509 -days 365 -newkey rsa:2048 -keyout cert.pem -out cert.pem
$ openssl pkcs12 -export -in cert.pem -out cert.pfx

MuPDF users can then sign PDFs with the cert.pfx using the graphical interface, or its mutool-sign tool.

Okular users must import cert.pfx into a certificate store such as the one in the default Firefox profile.[7][dead link 2024-01-13] With Firefox this is done through Settings > Privacy & Security > View Certificates > Your Certificates > Import and selecting cert.pfx. Afterwards Okular will offer this certificate to be used when signing PDFs.

LibreOffice can also sign PDFs.[8]

Removing annotations from a PDF

With pdftk [9]:

$ pdftk in.pdf output - uncompress | sed '/^\/Annots/d' | pdftk - output out.pdf compress

With perl-cam-pdfAUR:

$ rewritepdf.pl -C in.pdf out.pdf

See https://superuser.com/a/1051543 for more information.

Add page numbers

With pdfsak:

$ pdfsak --input-file input.pdf --output output.pdf --text "\large \$page/\$pages" br 0.99 0.99 --latex-engine xelatex --font "Noto Regular"

Add page labels

Page labels are logical page numbers shown in the navigation bar of your PDF reader. They are useful, for example, when the first pages of the PDF are indices numbered with Roman numerals (I, II, etc.), so that the page numbered "1" is not the first page of the PDF, and you want the page number shown in the navigation bar to correspond to the page number printed on the physical page.

This should not be confused with adding page numbers into a physical page. See section 12.4.2 of PDF reference to better understand page labels.

  1. Using pagelabels-py, let's say we have a PDF named my_document.pdf, that has 12 pages.
    • Pages 1 to 4 should be labelled Intro I to Intro IV.
    • Pages 5 to 9 should be labelled 2 to 6.
    • Pages 10 to 12 should be labelled Appendix A to Appendix C
    • We can issue the following list of commands:
      $ python3 -m pagelabels --delete "my_document.pdf"
      $ python3 -m pagelabels --startpage 1 --prefix "Intro " --type "roman uppercase" "my_document.pdf"
      $ python3 -m pagelabels --startpage 5 --firstpagenum 2 "my_document.pdf"
      $ python3 -m pagelabels --startpage 10 --prefix "Appendix " --type "letters uppercase" "my_document.pdf" 
    • Note: pagelabels-py converts your file to the PDF 1.3 specification.
  2. Using pdftk, create a metadata.txt file with labels:
    PageLabelBegin
    PageLabelNewIndex: 1
    PageLabelStart: 1
    PageLabelPrefix: Cover
    PageLabelNumStyle: NoNumber
    PageLabelBegin
    PageLabelNewIndex: 2
    PageLabelStart: 1
    PageLabelPrefix: Back Cover
    PageLabelNumStyle: NoNumber
    PageLabelBegin
    PageLabelNewIndex: 3
    PageLabelStart: 1
    PageLabelNumStyle: LowercaseRomanNumerals
    PageLabelBegin
    PageLabelNewIndex: 27
    PageLabelStart: 1
    PageLabelNumStyle: DecimalArabicNumerals 
    • Where:
      PageLabelBegin
      signals that a new page label definition follows
      PageLabelNewIndex
      is the PDF page index from which the numbering style applies, counting from one. The numbering style will continue until the next page label or, if there are no more page labels, until the end of the document.
      PageLabelStart
      is the starting number. For example, if you specify 5 here, the pages will be numbered 5, 6, 7, ...
      PageLabelPrefix
      a text to put before the number in page labels.
      PageLabelNumStyle
      can be DecimalArabicNumerals, UppercaseRomanNumerals, LowercaseRomanNumerals, UppercaseLetters, LowercaseLetters or NoNumber.
    • Then use:
      $ pdftk book.pdf update_info_utf8 metadata.txt output book-with-metadata.pdf

See this SuperUser question for more details.
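
Writing metadata.txt by hand gets tedious for documents with many sections. As a sketch, the record format described above can be generated with a small shell function. page_label is a hypothetical helper name introduced here, and it emits only the fields described in this section.

```shell
#!/bin/sh
# page_label INDEX START STYLE [PREFIX]  (hypothetical helper, for illustration only)
# Print one pdftk page label record in the format described above.
page_label() {
    printf 'PageLabelBegin\n'
    printf 'PageLabelNewIndex: %s\n' "$1"
    printf 'PageLabelStart: %s\n' "$2"
    # The prefix is optional, so the line is only emitted when one was given
    if [ -n "${4:-}" ]; then
        printf 'PageLabelPrefix: %s\n' "$4"
    fi
    printf 'PageLabelNumStyle: %s\n' "$3"
}
```

The example file above could then be produced with { page_label 1 1 NoNumber "Cover"; page_label 2 1 NoNumber "Back Cover"; page_label 3 1 LowercaseRomanNumerals; page_label 27 1 DecimalArabicNumerals; } > metadata.txt and applied with the pdftk update_info_utf8 command shown above.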

Extract bookmarks

With pdftk:

$ pdftk file.pdf dump_data_utf8 | grep '^Bookmark'

With qpdf:

$ qpdf --json --json-key=outlines file.pdf

See https://unix.stackexchange.com/questions/143886/how-to-extract-bookmarks-from-a-pdf-file for more information.
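
The raw pdftk dump is easier to read as an indented outline. The following sketch post-processes it with awk; outline_from_dump is a hypothetical helper name, and the code assumes that each record lists BookmarkTitle, BookmarkLevel and BookmarkPageNumber in that order, as in the bookmark format used by pdftk.

```shell
#!/bin/sh
# outline_from_dump  (hypothetical helper, for illustration only)
# Read "pdftk ... dump_data_utf8" output on stdin and print one line per
# bookmark, indented by two spaces per level, followed by its page number.
outline_from_dump() {
    awk '
        # "BookmarkTitle: " is 15 characters long, so the title starts at 16
        /^BookmarkTitle: /      { title = substr($0, 16) }
        /^BookmarkLevel: /      { level = $2 }
        # The page number is the last field of a record: print the entry now
        /^BookmarkPageNumber: / {
            indent = ""
            for (i = 1; i < level; i++) indent = indent "  "
            printf "%s%s (p. %s)\n", indent, title, $2
        }
    '
}
```

Usage: pdftk file.pdf dump_data_utf8 | outline_from_dump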

Add bookmarks

With pdftk

Create a text file bookmark_definitions.txt with bookmark definitions in the following format:

BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: Chapter 1.1
BookmarkLevel: 2
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: Chapter 1.2
BookmarkLevel: 2
BookmarkPageNumber: 3
BookmarkBegin
BookmarkTitle: Chapter 1.3
BookmarkLevel: 2
BookmarkPageNumber: 4
BookmarkBegin
BookmarkTitle: Chapter 1.3.1
BookmarkLevel: 3
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: Chapter 2
BookmarkLevel: 1
BookmarkPageNumber: 6

Where

BookmarkBegin
signals a new bookmark definition
BookmarkTitle
the title of the bookmark
BookmarkLevel
the level of the bookmark in the hierarchy
BookmarkPageNumber
the page number the bookmark redirects to

In this example, the above file will create the following bookmark structure:

  • Chapter 1
    • Chapter 1.1
    • Chapter 1.2
    • Chapter 1.3
      • Chapter 1.3.1
  • Chapter 2

Apply the bookmarks with the following command:

$ pdftk input.pdf update_info_utf8 bookmark_definitions.txt output output.pdf
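
For long outlines, the four-line records can be generated from a more compact list. A minimal sketch, assuming an input format invented here for illustration (one "LEVEL PAGE TITLE" entry per line); bookmarks_to_pdftk is a hypothetical helper name.

```shell
#!/bin/sh
# bookmarks_to_pdftk  (hypothetical helper, for illustration only)
# Read lines of the form "LEVEL PAGE TITLE" on stdin and print the
# corresponding pdftk bookmark records. Blank lines are skipped.
bookmarks_to_pdftk() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
    while read -r level page title || [ -n "$level" ]; do
        if [ -z "$level" ]; then
            continue
        fi
        printf 'BookmarkBegin\n'
        printf 'BookmarkTitle: %s\n' "$title"
        printf 'BookmarkLevel: %s\n' "$level"
        printf 'BookmarkPageNumber: %s\n' "$page"
    done
}
```

With a file outline.txt containing lines such as "1 1 Chapter 1" and "2 2 Chapter 1.1", running bookmarks_to_pdftk < outline.txt > bookmark_definitions.txt produces a definitions file in the format shown above.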

Extract pages contained within a bookmark

To extract the pages contained within a bookmark, you can use pdf_extbook-gitAUR.

Running pdf_extbook file prompts you for the bookmark whose pages you want to extract and for where to save them. To extract all bookmarks of a given hierarchical level:

$ pdf_extbook file -a level output_file_stem

Remove blank pages

One can use the following script to remove blank pages from a PDF file (credit: SuperUser post):

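#!/bin/sh
# Remove blank pages from a PDF; writes <name>_noblanks.pdf to the current directory.

IN="$1"
filename=$(basename "$IN")
filename="${filename%.*}"
# Total page count as reported by pdfinfo
PAGES=$(pdfinfo "$IN" | grep '^Pages:' | tr -dc '0-9')

# Print the number of every page whose ink coverage exceeds the threshold
non_blank() {
	for i in $(seq 1 "$PAGES"); do
		# Sum the C, M, Y and K coverage values reported by the ink_cov device
		PERCENT=$(gs -o - -dFirstPage="$i" -dLastPage="$i" -sDEVICE=ink_cov "$IN" | grep CMYK | nawk 'BEGIN { sum=0; } {sum += $1 + $2 + $3 + $4;} END { printf "%.5f\n", sum } ')
		if [ "$(echo "$PERCENT > 0.001" | bc)" -eq 1 ]; then
			echo "$i"
		fi
		# Progress indicator on stderr (printf, since "echo -n" is not portable in sh)
		printf . 1>&2
	done | tee "$filename.tmp"
	echo 1>&2
}

# The page list is intentionally unquoted so that each page number is a separate argument
pdftk "$IN" cat $(non_blank) output "${filename}_noblanks.pdf"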

Use it like pdf_remove_blank_pages input.pdf.

The script needs pdftk, nawk, bc, ghostscript and poppler (for pdfinfo).

Find fonts used in a PDF

The pdffonts(1) command (from poppler) can be used to find which fonts a PDF uses and whether they are embedded in it:

$ pdffonts file.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Times-Roman                          Type 1            Custom           no  no  no       8  0
Times-Italic                         Type 1            Standard         no  no  no       9  0
Times-Bold                           Type 1            Standard         no  no  no       7  0
Helvetica                            Type 1            Standard         no  no  no      34  0
Helvetica-Bold                       Type 1            Standard         no  no  no      35  0

This is useful when the text in a PDF is not displayed properly, to determine whether missing fonts or their metric-compatible equivalents need to be installed.
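
To list only the fonts that are not embedded, the table can be filtered with awk. This is a sketch under the assumption that font names contain no spaces, which holds for the sample output above; unembedded_fonts is a hypothetical helper name.

```shell
#!/bin/sh
# unembedded_fonts  (hypothetical helper, for illustration only)
# Read pdffonts output on stdin and print the name of every font whose
# "emb" column is "no".
unembedded_fonts() {
    # Skip the two header lines. Counting from the right, the columns are
    # object ID (two fields), uni, sub and emb, so emb is field NF-4.
    # This stays correct even though the "type" column may contain spaces.
    awk 'NR > 2 && NF >= 6 && $(NF - 4) == "no" { print $1 }'
}
```

Usage: pdffonts file.pdf | unembedded_fonts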

Repair broken PDF file

With Ghostscript:

$ gs -o repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress corrupted.pdf

With Poppler:

$ pdftocairo -pdf corrupted.pdf repaired.pdf

With mupdf-tools:

$ mutool clean corrupted.pdf repaired.pdf

Reference: https://superuser.com/q/278562

Convert PDF to PDF/A standard

With Ghostscript:

$ gs -dPDFA -dBATCH -dNOPAUSE -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -sOutputFile=document_pdfa.pdf document.pdf

Reference: https://stackoverflow.com/a/56459053

Validate PDF/A compliance

Using verapdfAUR you can validate the compliance of your PDF to different flavours of the PDF/A standard:

$ verapdf --flavour 1a --format text document.pdf

DjVu tools

  • DjVuLibre provides many command-line tools, like ddjvu(1) for example.
  • img2djvu — Single-pass DjVu encoder based on DjVu Libre and ImageMagick.
https://github.com/ashipunov/img2djvu || img2djvu-gitAUR
  • pdf2djvu — Creates DjVu files from PDF files.
https://jwilk.net/software/pdf2djvu || pdf2djvuAUR

Convert DjVu to images

Break DjVu into separate pages:

$ djvmcvt -i input.djvu /path/to/out/dir output-index.djvu

Convert DjVu pages into images:

$ ddjvu --format=tiff page.djvu page.tiff

Convert DjVu pages into PDF:

$ ddjvu --format=pdf inputfile.djvu outputfile.pdf

You can also use --page to export specific pages:

$ ddjvu --format=tiff --page=1-10 input.djvu output.tiff

This converts pages 1 to 10 into a single TIFF file.

Processing images

You can use scantailor-advanced to:

  • fix orientation
  • split pages
  • deskew
  • crop
  • adjust margins

Make DjVu from images

The img2djvu-gitAUR script is useful for this:

$ img2djvu -c1 -d600 -v1 ./out

It creates a 600 DPI out.djvu from all files in the ./out directory.

Alternatively, you can try didjvuAUR, which seems to create smaller files especially on images with well defined background.

PostScript tools

  • pstotext — Converts PostScript files to text.
https://www.cs.wisc.edu/~ghost/doc/pstotext.htm || pstotextAUR

ps2pdf

ps2pdf is a wrapper around Ghostscript to convert PostScript to PDF:

$ ps2pdf -sPAPERSIZE=a4 -dOptimize=true -dEmbedAllFonts=true YourPSFile.ps

Explanation:

  • With -sPAPERSIZE=something you define the paper size. For valid PAPERSIZE values, see [10][dead link 2022-09-22].
  • -dOptimize=true optimises the created PDF for loading.
  • -dEmbedAllFonts=true embeds all fonts in the PDF, so that text renders consistently.
Note: You cannot choose the paper orientation in ps2pdf. If your input PS file is well-formed, it already contains the orientation information. Encapsulated PS (EPS) files usually do not contain paper orientation information, so an EPS file that does not fit the -sPAPERSIZE you specified will cause problems. A workaround is to create a new paper size in the Ghostscript settings (call it e.g. "slide") and use it as -sPAPERSIZE=slide.

Libraries

C/C++

  • libharu — C library for generating PDF documents.
https://github.com/libharu/libharu || libharu, Lua binding: lua-hpdfAUR
  • PoDoFo — A C++ library to work with the PDF file format.
https://podofo.sourceforge.net || podofo

Python

  • borb — borb is a library for reading, creating and manipulating PDF files in python.
https://borbpdf.com/, https://github.com/jorisschellekens/borb || not packaged? search in AUR
  • pdfrw — A pure Python library that reads and writes PDFs.
https://github.com/pmaupin/pdfrw || python-pdfrw
  • PyPDF — A pure-Python library built as a PDF toolkit.
https://github.com/py-pdf/pypdf || python-pypdf
  • PyX — Python library for the creation of PostScript and PDF files.
https://pyx.sourceforge.net || python-pyx
  • ReportLab — A proven industry-strength PDF generating solution
https://www.reportlab.com/ || python-reportlab

Java

  • iText Core — iText is a more versatile, programmable and enterprise-grade PDF solution that allows you to embed its functionalities within your own software for digital transformation.
https://itextpdf.com/products/itext-core || itext-rups-binAUR
  • OpenPDF — OpenPDF is a free Java library for creating and editing PDF files with a LGPL and MPL open source license. OpenPDF is based on a fork of iText.
https://github.com/LibrePDF/OpenPDF || not packaged? search in AUR

See also