PDF Table of Contents Generator (and Projects)

Published: 2025/11/26

Last Updated: 2025/11/26

For the past 10 months, I’ve been engrossed in learning Basic/Expert D&D and Advanced D&D First Edition in order to referee some online games. Part of that has meant reading every bit of relevant material I can get my hands on (including retroclones, predecessor systems, and derivatives), most of which comes as PDFs. Not all of these PDFs are as tidy as I’d like: many either omit a table of contents entirely or have a faulty or otherwise sub-optimal one. This bothered me enough to actually do something about it, and I was very pleased to run into a magnificent project that makes fixing them either dead simple or at least far more feasible, depending on the PDF.

pdf.tocgen

The simply-named pdf.tocgen is a Python project that ultimately consists of three single-purpose utilities: pdfxmeta (a programmable search tool that pulls out the attributes of headers into a recipe file), pdftocgen (uses a recipe file to generate a table of contents from a pre-OCR’d PDF), and pdftocio (exports the existing table of contents from a PDF, or imports one into a copy of the PDF).

The basic workflow depends on how the target PDF is structured. If the existing table of contents just needs a bit of alteration, you use pdftocio to copy it out into a simple text file, edit that file, and then use pdftocio again to write the result into a fresh copy of the PDF. If there is no pre-existing table of contents, or a fresh start is needed, pdfxmeta can be used to create a recipe file that describes what you count as a “header”, which can then be fed to pdftocgen to generate the aforementioned text file. The project’s site has excellent examples of how all of this works.

Perhaps most importantly, the structure of the text file is wonderfully simple: each heading is a quoted string followed immediately by its PDF page number, and each level of indentation nests the heading one layer deeper. As a simple example, here’s an excerpt from the AD&D Players Handbook file:

"Players Handbook" 1
[...]
    "CHARACTER CLASSES" 19
        "Cleric" 21
            "Druid" 21
        "Fighter" 23
            "Paladin" 23
            "Ranger" 25
        "Magic-User" 26
            "Illusionist" 27
        "Thief" 27
            "Assassin" 29
        "Monk" 31
        "Multi-Classed Character" 33
        "The Character with Two Classes" 34
[...]

Couldn’t be simpler.
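To make the format concrete, here is a rough Python sketch that reads such a file into a tree. It is not part of pdf.tocgen: the names `Entry` and `parse_toc` are made up for illustration, and it assumes four spaces per nesting level (as in the excerpt above) and skips any line that isn't a quoted title followed by a page number. The sample text is a trimmed copy of the excerpt.

```python
import re
from dataclasses import dataclass, field

# One entry per line: optional indentation, a quoted title, then a page number.
LINE = re.compile(r'^(?P<indent> *)"(?P<title>.*)"\s+(?P<page>\d+)')
INDENT = 4  # assumption: four spaces per nesting level, as in the excerpt


@dataclass
class Entry:
    """Hypothetical container for one heading; not a pdf.tocgen type."""
    title: str
    page: int
    children: list["Entry"] = field(default_factory=list)


def parse_toc(text: str) -> list[Entry]:
    """Build a tree of entries from indentation-nested ToC lines."""
    roots: list[Entry] = []
    stack: list[Entry] = []  # chain of ancestors leading to the latest entry
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:
            continue  # skip blank lines and anything that isn't an entry
        depth = len(m["indent"]) // INDENT
        entry = Entry(m["title"], int(m["page"]))
        del stack[depth:]  # forget entries at this depth or deeper
        (stack[-1].children if stack else roots).append(entry)
        stack.append(entry)
    return roots


SAMPLE = '''\
"Players Handbook" 1
    "CHARACTER CLASSES" 19
        "Cleric" 21
            "Druid" 21
        "Fighter" 23
            "Paladin" 23
            "Ranger" 25
'''

toc = parse_toc(SAMPLE)
classes = toc[0].children[0]
print([c.title for c in classes.children])  # ['Cleric', 'Fighter']
```

Something like this is handy for sanity-checking a hand-edited file (for instance, spotting a sub-class that ended up at the wrong level) before handing it back to pdftocio.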

My Output

I’ve used this to create, edit, and/or expand the table of contents in a number of works now, and for lack of anywhere else to put them, will dump them here. These are of course useless without the corresponding PDF they were designed for, other than as reference material. I may add to this pile in the future if and when I do more of these.

As a reminder or quickstart, if you didn’t read their documentation, these can be used as follows, which will automatically create a new, renamed copy of the PDF:

$ pdftocio file.pdf < toc.txt

