\(\LaTeX\) #
Creating PDF/A Documents #
This is a condensation of the information provided in the pdfx
package documentation
and by Peter Selinger:
A PDF/A document is a special kind of PDF document that has been optimized for long-term archiving. […] Some of the main features of PDF/A documents are:
- Self-containedness: all resources that are required to reproduce the document’s visual appearance, such as fonts, color spaces, etc., are embedded within the document itself. […]
- Unicode mapping: all of the document’s content has been mapped to machine readable Unicode text. Such a mapping makes the document searchable, allows text to be copied and pasted, and allows text to be displayed in other forms (such as via a screen reader for the blind).
- Metadata: PDF/A specifies a standardized format for including metadata, […] which helps to ensure that the document can be found and correctly indexed by search engines, libraries, etc.
It is best to produce the correct format from LaTeX sources directly and not to convert an existing document with third-party tools. Otherwise information is lost in the process.
You’ll need at least pdfTeX
version 1.40.15. At the time of this writing my version was 1.40.21. Check the
version with pdflatex --version
. And make sure you have at least version 1.5.8 of the pdfx
package; it
is probably bundled already if you have a sufficiently recent distribution, however.
Add the necessary pdfx
and hyperref
packages to the document’s preamble. It is best to place them
high up in the order – if possible directly below the \documentclass
.
The pdfx
package must be included first because it patches a few elements of the hyperref
package for compliance.
In case you want to specify any options to the hyperref
package use \hypersetup
:
\documentclass[a4paper, 11pt, openright, twoside, ngerman]{report}
\usepackage[a-1b]{pdfx}
\usepackage{hyperref}
\hypersetup{hidelinks}
...
The document metadata is included from an *.xmpdata
file. It must have the same basename as your main LaTeX
file. For example a report.tex
needs a report.xmpdata
. The format of this file and a list of possible
options is described in Section 2.2 of the pdfx
documentation. Peter Sellinger also provides a
sample file to get going quickly.
A particularly useful option to provide this data file is by using a {filecontents*}
environment before the
\documentclass
at the very top of the main source file:
\begin{filecontents*}{\jobname.xmpdata}
\Title{My Report}
\Author{Anton Semjonov}
\Language{de-DE}
\Keywords{report\sep university}
\Subject{A short description.}
\end{filecontents*}
\documentclass[...
If your section titles contain formulas you may need to fix the PDF outline links by providing an alternative
string with \texorpdfstring{$<formula>$}{<string>}
. You can also use UTF-8 strings for that if your document uses
the appropriate input encoding:
\usepackage{inputenc}
\hypersetup{pdfencoding=unicode}
\inputencoding{utf8}
\makeatother
To test your document you can use pdfinfo report.pdf
. It should output PDF subtype: PDF/A-1b:2005
among
with your configured metadata.