Automatically numbering figures, schemes, and charts in pandoc

Chemists may be unique in the number of different types of floats we use when writing: at a minimum Figures, Schemes, and Tables, and often Charts. Keeping track of all of these when writing can be a nuisance. I tend to write a lot and then revise heavily, especially when working on proposals. Inevitably the numbering gets screwed up after deleting or moving things around.

In 2015, we shouldn’t be numbering these objects manually. It should be possible to cross-reference figure numbers easily. In LaTeX, this is straightforward using the default Figure numbering and the chemstyle package. But this is awkward to do in pandoc (or Word, for that matter).

To help solve this problem, I recently wrote a little pandoc filter called pandoc-figref (hosted at this GitHub repo). There’s not a whole lot to it:

#! /usr/bin/env python3
"""Pandoc filter that replaces labels of format {#?:???}, where ? is a
single lower case character defining the type and ??? is an alphanumeric
label, with numbers. Different types are counted separately.

"""

from pandocfilters import toJSONFilter, Str
import re

# Raw string so that \{ and \w reach the regex engine unchanged
REF_PAT = re.compile(r'(.*)\{#([a-z]):(\w*)\}(.*)')

known_labels = {}

def figref(key, val, fmt, meta):
    """Replace a {#kind:label} reference inside a Str element with its number."""
    if key != 'Str':
        return None  # leave every other element untouched
    match = REF_PAT.match(val)
    if match is None:
        return None
    start, kind, label, end = match.groups()
    # Each kind (f, s, ...) has its own counter; a label is numbered the
    # first time it is seen and reuses that number afterwards.
    labels = known_labels.setdefault(kind, {})
    if label not in labels:
        labels[label] = str(len(labels) + 1)
    return [Str(start), Str(labels[label]), Str(end)]

if __name__ == '__main__':
    toJSONFilter(figref)

This simple little filter is very handy. Basically, it lets one define labels of the form {#?:label}, where “?” is a letter that defines the type (e.g., “f” for Figure, “s” for Scheme, etc.) and “label” is some unique identifier without spaces. So, for example, the caption for a scheme could be written in pandoc as “Scheme {#s:synth1}. Synthesis of some random compound.” The filter replaces all such instances of the label with the appropriate number, tallying each class of label in the order in which they appear. References to the scheme just use the same label.

Here’s a quick example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc dictum non leo eget eleifend. Donec et tempor sem. Quisque id faucibus tellus. Nunc venenatis urna vel purus consequat tristique.

![**Figure {#f:helix}.** A picture of a helix](Figures/helix.png)

As shown in Figure {#f:helix}, vestibulum sapien quam, gravida sit amet eleifend quis, bibendum vitae augue. Quisque at dolor et tortor consequat consectetur. Nullam non congue arcu.

When the filter is used to compile with pandoc, the following will be generated:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc dictum non leo eget eleifend. Donec et tempor sem. Quisque id faucibus tellus. Nunc venenatis urna vel purus consequat tristique.

![**Figure 1.** A picture of a helix](Figures/helix.png)

As shown in Figure 1, vestibulum sapien quam, gravida sit amet eleifend quis, bibendum vitae augue. Quisque at dolor et tortor consequat consectetur. Nullam non congue arcu.
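The numbering behaviour can also be tried out without pandoc. The following is only a rough sketch of the same idea applied to a plain string, not the filter itself: the function name `number_labels` and the second figure label `sheet` are made up for illustration, and the pattern here matches a label anywhere in the text and requires a non-empty label, which differs slightly from the filter's per-word pattern.

```python
import re

# Label syntax from the post: {#<kind>:<label>} with a one-letter kind.
# Unlike the filter's pattern, this one matches anywhere in a string
# and requires the label to be non-empty (an assumption of this sketch).
REF_PAT = re.compile(r'\{#([a-z]):(\w+)\}')


def number_labels(text):
    """Replace every {#kind:label} in text with a per-kind running number."""
    known_labels = {}  # kind -> {label: number as a string}

    def replace(match):
        kind, label = match.groups()
        labels = known_labels.setdefault(kind, {})
        # The first sighting of a label assigns the next number for its kind;
        # later sightings reuse that number.
        if label not in labels:
            labels[label] = str(len(labels) + 1)
        return labels[label]

    return REF_PAT.sub(replace, text)


source = (
    "**Figure {#f:helix}.** A picture of a helix\n"
    "**Scheme {#s:synth1}.** Synthesis of some random compound.\n"
    "As shown in Figure {#f:helix} and Scheme {#s:synth1}, ...\n"
    "**Figure {#f:sheet}.** A second figure\n"
)
print(number_labels(source))
```

Figures and Schemes are counted independently here, so the first of each is numbered 1, and a repeated label resolves to the number it was given on first appearance.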

4 thoughts on “Automatically numbering figures, schemes, and charts in pandoc”

  1. Tom Duck

    For a similar approach, you might be interested to see the pandoc-fignos filter:

    https://github.com/tomduck/pandoc-fignos

    There are a few features that you may find useful:

    1) It uses LaTeX \label and \ref macros, and hard-coded numbers are used for other formats (as you have done);

    2) It works with the new figure attributes introduced with pandoc 1.16; and

    3) The syntax is that worked out in pandoc Issue #813. It seems to me that there is a fair chance this syntax is what pandoc will ultimately end up using.

    There are also pandoc-eqnos and pandoc-tablenos filters.

  2. Scott Post author

    Indeed—I believe I checked out your pandoc-fignos back when I was first thinking about this. It’s much more sophisticated than what I’ve got here. The issue is that I need to be able to number different kinds of figures separately: “Figures”, “Schemes”, and “Charts”. They’re all just images with captions, but they have different meanings in chemistry writing. I didn’t think this was possible with pandoc-fignos?

  3. Scott Post author

    Very cool—thank you for bringing your project to my attention. Much more sophisticated than my little filter here!
