An improved filter for handling schemes, charts, and graphs in pandoc

About a year and a half ago, I wrote a series of pandoc filters designed to facilitate writing chemistry in pandoc. Specifically, I was looking to solve a few problems:

  • Not surprisingly, there is no native support for alternate figure types (schemes, charts, etc.) in pandoc.
  • Pandoc takes advantage of LaTeX’s native captioning. This has the side effect of adding labels (“Figure X”) to captions in LaTeX/pdf output, but not in other formats (e.g., html, Word). I always found this mildly irritating since the same markdown is rendered differently in different formats.
  • Pandoc doesn’t have native cross-referencing support (although I suspect that will change down the road).
  • Pandoc doesn’t natively support the “wrapfig” LaTeX package, so text does not flow around figures in LaTeX/pdf output. I don’t mind this in principle, but if you’re writing with a hard page limit it’s a huge problem.

The cross-referencing issue is a big one, and there are already excellent filters that provide this functionality (e.g., pandoc-fignos, pandoc-numbering). However, for the specific combination of problems faced by a chemist trying to write reasonable-looking chemistry, I didn’t find solutions that exactly met my needs.

The solutions that I came up with worked reasonably well, but honestly were never all that satisfying. To solve the first two issues, I used a LaTeX template that suppressed the native figure labels so that I could always state them explicitly in the caption. Cross-referencing was solved with a very rudimentary filter (pandoc-figref) that has only a fraction of the functionality of other solutions but is very flexible (it just sets up separate lists of sequential numbers that can be referenced). Text wrapping was similarly handled by a rudimentary filter (pandoc-wrapfig) that read a flag at the end of a caption to set the figure width. All figures were always flush with the right margin.

All of this worked reasonably well for a workflow that depended only on html, pdf, and Word output (html for previewing with Marked, pdf/Word for the final product). Beyond the obvious limitations of these filters, though, there were other problems. In particular, because LaTeX output relied on various hacks in the template, I got passable pdfs but ugly LaTeX code.

There is one scenario where being able to produce decent LaTeX is very useful: manuscript submission. Most journals accept either .doc(x) or LaTeX submissions. Until quite recently, I had always just used pandoc to dump my writing into Word and then pasted it into the journal’s template. This works, but it means (1) having two places to make changes if any final corrections are noticed before submission, and (2) dealing with Word’s truly awful handling of figures.

So, I recently worked out an improved filter, pandoc-chemfig, that takes care of most of these problems and makes for cleaner pandoc markdown to boot. I’ll forgo posting the code here: if you’d like to give it a try, your best bet is to grab it straight from GitHub.

The script takes advantage of pandoc’s support for image attributes, which, if I remember correctly, was added after I wrote my original filters. For example, a figure is declared as follows:

![Image caption](path/to/image_file.pdf){#img:id .class wwidth=0in wpos=l}

The id (“img:id”) is used as the handle for cross-referencing, the class (“.class”) specifies the type of figure (“.figure”, “.scheme”, etc.), and the variables “wwidth” and “wpos” are passed to the LaTeX wrapfig package, if they’re included (a width of “0in” has the box fit the size of the figure).
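To make the attribute handling a bit more concrete, here is a minimal Python sketch (not the actual pandoc-chemfig code) of how a filter might turn an image’s attributes into a LaTeX float. It assumes the attributes arrive as pandoc’s JSON Attr triple (identifier, list of classes, list of key/value pairs). The function name latex_float, the default placement of “r” when wpos is missing, and the use of wrapfig’s generic wrapfloat environment for non-figure types are my own assumptions.

```python
def latex_float(attr, path, caption):
    """Build a LaTeX float from a pandoc Attr triple: [id, [classes], [[key, value], ...]]."""
    ident, classes, keyvals = attr
    kind = classes[0] if classes else "figure"  # e.g. "figure", "scheme", "chart", "graph"
    opts = dict(keyvals)

    if "wwidth" in opts:
        # Assumption: default to the right margin, as the older pandoc-wrapfig filter did.
        pos = opts.get("wpos", "r")
        width = opts["wwidth"]  # "0in" lets wrapfig use the natural width of the figure
        if kind == "figure":
            begin = f"\\begin{{wrapfigure}}{{{pos}}}{{{width}}}"
            end = "\\end{wrapfigure}"
        else:
            # Assumption: wrapfig's generic wrapfloat environment handles other float types;
            # check that your wrapfig version provides it.
            begin = f"\\begin{{wrapfloat}}{{{kind}}}{{{pos}}}{{{width}}}"
            end = "\\end{wrapfloat}"
    else:
        # Unwrapped: scheme/chart/graph environments must be defined by a loaded package.
        begin = f"\\begin{{{kind}}}"
        end = f"\\end{{{kind}}}"

    lines = [begin, "\\centering", f"\\includegraphics{{{path}}}", f"\\caption{{{caption}}}"]
    if ident:
        lines.append(f"\\label{{{ident}}}")  # the id doubles as the cross-reference handle
    lines.append(end)
    return "\n".join(lines)
```

For the declaration above with the class set to figure, latex_float(["img:id", ["figure"], [["wwidth", "0in"], ["wpos", "l"]]], "path/to/image_file.pdf", "Image caption") opens with \begin{wrapfigure}{l}{0in} and carries \label{img:id}, so LaTeX’s own caption numbering and \ref machinery do the rest.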

The figure number can be referenced throughout the document easily:

Please see Figure @img:id for more information.

The syntax is taken from the excellent pandoc-fignos, pandoc-eqnos, and pandoc-tablenos filters so that it will blend well if these packages are also used. Note that these offer many more features (e.g., automatic labels, ranges, etc.), but not, as far as I know, the alternate figure types.
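For non-LaTeX output the numbers have to be filled in by the filter itself. One way to do that, sketched here in Python, is two passes: count floats of each type in document order, then substitute the numbers into the text. This sketch works on plain strings for brevity; in pandoc’s AST a reference like @img:id is actually parsed as a citation, so a real filter would match Cite elements instead. The function names and the assumption that every handle starts with “img:” are illustrative only.

```python
import re
from collections import defaultdict


def number_floats(floats):
    """Map each id to a sequential number, counting each float type separately.

    floats: iterable of (ident, kind) pairs in document order.
    """
    counters = defaultdict(int)
    numbers = {}
    for ident, kind in floats:
        counters[kind] += 1
        numbers[ident] = counters[kind]
    return numbers


# Hypothetical reference pattern: "@img:" followed by word characters or hyphens.
REF = re.compile(r"@(img:[\w-]+)")


def resolve_refs(text, numbers):
    """Replace @img:... references with their numbers; leave unknown ones untouched."""
    def sub(match):
        ident = match.group(1)
        return str(numbers[ident]) if ident in numbers else match.group(0)

    return REF.sub(sub, text)
```

Because figures and schemes keep separate counters, numbers = number_floats([("img:a", "figure"), ("img:s", "scheme"), ("img:b", "figure")]) gives Figure 2 and Scheme 1 for the last two, and resolve_refs("See Figure @img:b and Scheme @img:s.", numbers) returns "See Figure 2 and Scheme 1."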

The filter enables the scheme, chart, and graph environments in LaTeX output (assuming the required packages have been loaded) and provides rudimentary support for figures with wrapped text. It also takes care of the captioning problem: in LaTeX the built-in caption labels are used, and in other formats the filter adds the labels automatically. These are formatted according to ACS style by default (“Figure 1. The caption.”), but this can be changed with the “fig-abbr” metadata variable. For example, the following formats captions as “FIG 1 | The caption.” The doubled backslashes just escape the spaces: YAML reads each “\\” inside double quotes as a single backslash, and pandoc treats a backslash followed by a space as an escaped (non-breaking) space.

fig-abbr:
    figure: "**FIG\\ **"
    scheme: "**SCH\\ **"
    suffix: "\\ |\\ "
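Under the hood, this kind of override amounts to merging user settings onto a set of defaults and building the label from them. A minimal Python sketch follows; the default strings, the caption_label name, and treating fig-abbr as a plain dictionary of strings (rather than the formatted inlines, bold markup included, that pandoc actually hands a filter as metadata) are all assumptions.

```python
# Hypothetical defaults modelled on the ACS style described above ("Figure 1. The caption.").
# Formatting such as bold is ignored in this plain-string sketch.
DEFAULT_ABBR = {
    "figure": "Figure ",
    "scheme": "Scheme ",
    "chart": "Chart ",
    "graph": "Graph ",
    "suffix": ". ",
}


def caption_label(kind, number, caption, fig_abbr=None):
    """Prefix a caption with its label; any key in fig_abbr overrides the default."""
    abbr = {**DEFAULT_ABBR, **(fig_abbr or {})}
    return f"{abbr[kind]}{number}{abbr['suffix']}{caption}"
```

With the settings above (minus the bold), caption_label("figure", 1, "The caption.", {"figure": "FIG ", "suffix": " | "}) returns "FIG 1 | The caption.", while any type not overridden, such as charts, keeps its default label.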

The nice thing about this is that it gets really close to the ideal of being able to submit manuscripts directly from the pandoc source. My basic workflow right now is this: set up a LaTeX template based on the required journal format; write in markdown; compile the document to LaTeX with pandoc and then to pdf with latexmk. The compilation step is easily scripted to keep everything tidy, and it provides all the files needed for submission. And, of course, it remains straightforward to convert to Word as a fallback plan.
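As one illustration of scripting the compile step, here is a small Python helper that runs pandoc and then latexmk. The filter executable name pandoc-chemfig, the file names, and the helper names are placeholders rather than anything taken from my actual setup. Two details worth knowing: pandoc’s --template option implies a standalone document, and latexmk’s -pdf option builds the pdf with pdfLaTeX.

```python
import subprocess
from pathlib import Path


def build_commands(source, template, filter_name="pandoc-chemfig"):
    """Return the pandoc and latexmk command lines plus the directory to run latexmk in."""
    src = Path(source)
    tex = src.with_suffix(".tex")
    pandoc_cmd = [
        "pandoc", str(src),
        "--filter", filter_name,      # placeholder name for the filter executable
        "--template", str(template),  # journal-specific LaTeX template
        "-o", str(tex),
    ]
    # Run latexmk next to the .tex file so the auxiliary files stay with it.
    latexmk_cmd = ["latexmk", "-pdf", tex.name]
    return pandoc_cmd, latexmk_cmd, tex.parent


def build(source, template):
    """Markdown -> LaTeX with pandoc, then LaTeX -> pdf with latexmk; stops on the first failure."""
    pandoc_cmd, latexmk_cmd, workdir = build_commands(source, template)
    subprocess.run(pandoc_cmd, check=True)
    subprocess.run(latexmk_cmd, check=True, cwd=workdir)
```

Calling build("drafts/paper.md", "journal.latex") would leave drafts/paper.tex alongside the compiled drafts/paper.pdf; splitting out build_commands keeps the command lines easy to inspect without running anything.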
