Add Bookmarks for PDF File
7 Oct 2019Bookmarks in PDF files provide a convenient way to jump between pages. But the bookmarks are not always included in a PDF file. Especially on smartphone or e-reader, it could be a disaster without bookmarks.
Then I looked for methods of adding bookmarks for PDF.
There are many tools and programming libraries which could do this job,
since PDF is an
open format.
But I could not find a suitable one.
What I found are either too complicated or insufficient for my
requirements.
For example, ghostscript
could add bookmarks through pdfmark
,
but it needs a complicated pdfmark
file, and title in Chinese
must be converted to UTF-16BE
format beforehand.
Therefore, I wrote the
pdf-bookmark
command to implement this feature.
Meanwhile, the bmk
file format is also defined for simplifying
the description of bookmarks.
Installation of pdf-bookmark
pdf-bookmark
must run under Python3.
Install pdf-bookmark
with pip
:
$ pip install pdf-bookmark
It is a good practice to run the above command under venv in order to avoid the conflicts with system Python packages.
Installation of Ghostscript
pdf-bookmark
utilizes Ghostscript
to generate bookmarks for PDF files, which must be installed before
running pdf-bookmark
.
Ghostscript
is included in many Linux distributions. Take Arch Linux
as an example:
$ sudo pacman -S ghostscript
$ gs --version
Installation of PDFtk
PDFtk is used for
exporting bookmarks from PDF file. If you need this feature, PDFtk
must be installed.
pdftk-java is a java port of PDFtk. In principle it could replace the original PDFtk. But the current version 3.0.8 has a bug, which will lead to wrong page number. Thus the original pdftk is still needed before that bug is fixed.
bmk file format
The bmk
file is used to describe bookmark of a PDF file,
including title, page number, level, etc.
序................1
Chapter 1................4
Chapter 2................5
2.1 Section 1................6
2.1.1 SubSection 1................6
2.1.2 SubSection 2................8
2.2 Section 2................12
Chapter 3................20
Appendix................36
This is the a simple bmk
file, which looks quite like the
content of a book, hence you can copy the content and modify from it.
Each line represents a bookmark item. The title and the page number are
separated by at least 4 dots ”.
“.
The level of a bookmark is specified by the indentation of spaces. The default indentation is 2 spaces, and the number of spaces could be configured with inline command.
Import the bookmark and create a new pdf file:
$ pdf-bookmark -p input.pdf -b bookmark.bmk -o new-with-bookmark.pdf
Inline command
bmk
format also defines some inline commands to do more controls on the
bookmarks. These commands start with !!!
and modify some properties of
bookmark. The new property will affect bookmarks after the line until it
is changed again.
The page number in PDF file does not always start from the first page. And the PDF file could be divided into several pieces, like preface, content, main body, appendix, etc. Page numbers are not always Arabic. Roman and letters could also be used. Some level of bookmarks could also be collapsed and hidden, which could be expanded with mouse click. These are what inline commands could do.
!!! collapse_level = 2
!!! num_style = Roman
Preface................I
Content................IV
!!! new_index = 12
!!! num_style = Arabic
Introduction................1
Chapter 1................4
Chapter 2................5
2.1 Section 1................6
2.2 Section 2................7
Chapter 3................10
Appendix................11
With these inline commands, you do not need to recalculate the index number for each page.
Here are all supported inline commands:
new_index
. Default:1
. The following bookmark index will be recalculated from the new index number (new_index + page - 1
).num_start
. Default:1
. Specify the number of first page if it does not start from 1 (new_index + page - num_start
).num_style
. Default:Arabic
. The page number style. Could beArabic
,Roman
andLetters
.collapse_level
. Default:0
. On which level the bookmarks are collapsed. 0 means expanding all.level_indent
. Default:2
. Number of indentation spaces for a new level.
Export bookmarks from PDF file
PDFtk
must be installed for exporting bookmarks.
Export bookmarks:
$ pdf-bookmark -p input.pdf
The default format if bmk
, which could be changed by -f
arguments.