About a year and a half ago, I wrote a series of pandoc filters designed to facilitate writing chemistry in pandoc. Specifically, I was looking to solve a few problems:
- Not surprisingly, there is no native support for alternate figure types (schemes, charts, etc.) in pandoc.
- Pandoc takes advantage of LaTeX’s native captioning. This has the side effect of adding labels (“Figure X”) to captions in LaTeX/pdf output, but not in other formats (e.g., html, Word). I always found this mildly irritating since the same markdown is rendered differently in different formats.
- Pandoc doesn’t have native cross-referencing support (although I suspect that will change down the road).
- Pandoc doesn’t natively support the “wrapfig” LaTeX package, so text does not flow around figures in LaTeX/pdf output. In principle, I don’t have a problem with this, but if you’re writing with a hard page limit it’s a huge problem.
The cross-referencing issue is a big one, and not surprisingly there are excellent filters already available that provide this functionality (e.g., pandoc-fignos, pandoc-numbering). However, for the specific combination of problems faced by a chemist trying to write reasonable looking chemistry, I didn’t find solutions that exactly met my needs.
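To make the captioning problem concrete, here is a minimal sketch of a pandoc JSON filter in Python. It is not one of the filters described in this post. It adds “Figure N.” labels to captions for every output format except LaTeX, where `\caption` already supplies them. It assumes the pre-3.0 pandoc AST, in which a figure is a paragraph holding a single image whose title starts with `fig:`. The helper names `walk` and `make_labeler` are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: label figure captions "Figure N." for non-LaTeX output."""
import json
import sys


def walk(x, action, fmt):
    """Visit the pandoc JSON AST top-down, in document order.

    action(elem, fmt) is called on every element (a dict with a "t" key) and
    may return a replacement element, or None to keep the original.
    """
    if isinstance(x, list):
        out = []
        for item in x:
            if isinstance(item, dict) and "t" in item:
                replacement = action(item, fmt)
                if replacement is not None:
                    item = replacement
            out.append(walk(item, action, fmt))
        return out
    if isinstance(x, dict):
        return {key: walk(value, action, fmt) for key, value in x.items()}
    return x


def make_labeler(prefix="Figure"):
    """Return an action that numbers figures and prepends a bold label."""
    count = 0

    def action(elem, fmt):
        nonlocal count
        if fmt in ("latex", "beamer"):
            return None  # \caption already prints "Figure N" in LaTeX output
        if elem["t"] != "Para" or len(elem["c"]) != 1 or elem["c"][0]["t"] != "Image":
            return None
        attr, caption, (url, title) = elem["c"][0]["c"]
        if not title.startswith("fig:"):  # pre-3.0 marker for a figure
            return None
        count += 1
        label = [
            {"t": "Strong", "c": [{"t": "Str", "c": prefix}, {"t": "Space"},
                                  {"t": "Str", "c": f"{count}."}]},
            {"t": "Space"},
        ]
        image = {"t": "Image", "c": [attr, label + caption, [url, title]]}
        return {"t": "Para", "c": [image]}

    return action


def main(fmt):
    doc = json.load(sys.stdin)
    json.dump(walk(doc, make_labeler(), fmt), sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc calls JSON filters with the target format as the first argument
    main(sys.argv[1])
```

Run with something like `pandoc draft.md --filter ./caption_labels.py -o draft.docx` (the file name is hypothetical). The Word and HTML output should then carry the same labels that the PDF gets from LaTeX.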
This post is a bit specific, but hopefully useful for anyone who uses pandoc and happens to use Sublime Text as their text editor of choice.
In general, I like to write my pandoc documents as close to the canonical format as possible and let LaTeX deal with positioning figures. It works pretty well, and it’s infinitely more straightforward than doing it in Word. However, LaTeX doesn’t natively wrap text around figures, and sometimes you really need to maximize the use of space. In pure LaTeX, this can be done with the wrapfig package. In pandoc, this package can be used through a template, but it’s a little tricky if you don’t want every figure treated the same way.
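One way around the all-or-nothing template problem is a filter that lets individual figures opt into wrapping. The sketch below assumes a made-up `wrap` attribute on the image, for example `![Aldol scheme](ald.png){#fig:ald wrap=0.4}`. For LaTeX output only, it replaces that figure with a `wrapfigure` environment. The environment is built from raw LaTeX placed around the original caption inlines, so pandoc still converts any formatting in the caption. It also assumes `\usepackage{wrapfig}` has been added to the preamble (for instance via `header-includes`) and uses the pre-3.0 figure representation.

```python
#!/usr/bin/env python3
"""Sketch: wrap opted-in figures in a wrapfig environment for LaTeX output."""
import json
import sys


def walk(x, action, fmt):
    """Visit the pandoc JSON AST top-down; action may return a replacement."""
    if isinstance(x, list):
        out = []
        for item in x:
            if isinstance(item, dict) and "t" in item:
                replacement = action(item, fmt)
                if replacement is not None:
                    item = replacement
            out.append(walk(item, action, fmt))
        return out
    if isinstance(x, dict):
        return {key: walk(value, action, fmt) for key, value in x.items()}
    return x


def raw(tex):
    """A raw LaTeX inline that pandoc passes through untouched."""
    return {"t": "RawInline", "c": ["latex", tex]}


def wrapfig(elem, fmt):
    """Replace a lone image carrying a (hypothetical) wrap=<fraction> attribute."""
    if fmt != "latex" or elem["t"] != "Para" or len(elem["c"]) != 1:
        return None
    image = elem["c"][0]
    if image["t"] != "Image":
        return None
    (ident, _classes, kvs), caption, (url, _title) = image["c"]
    width = dict(kvs).get("wrap")  # e.g. "0.4" -> 0.4\textwidth
    if width is None:
        return None  # figures without the attribute float as usual
    begin = (
        f"\\begin{{wrapfigure}}{{r}}{{{width}\\textwidth}}\n"
        "\\centering\n"
        f"\\includegraphics[width=\\linewidth]{{{url}}}\n"
        "\\caption{"
    )
    label = f"\\label{{{ident}}}" if ident else ""
    end = f"}}{label}\n\\end{{wrapfigure}}"
    # Keep the caption as pandoc inlines so its formatting is still converted.
    return {"t": "Plain", "c": [raw(begin)] + caption + [raw(end)]}


def main(fmt):
    doc = json.load(sys.stdin)
    json.dump(walk(doc, wrapfig, fmt), sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc calls JSON filters with the target format as the first argument
    main(sys.argv[1])
```

Every other figure is left for LaTeX to place as usual, which keeps the default behavior and reserves wrapping for the places where the page limit bites.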
Chemists may be unique in the number of different types of floats we use when writing: at a minimum Figures, Schemes, and Tables, and often Charts as well. Keeping track of all of these while drafting can be a nuisance. I tend to write a lot and then revise heavily, especially when working on proposals, and inevitably the numbering gets screwed up after I delete or move things around.
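Keeping separate counters for each float type is easy to automate. As a sketch, assume a hypothetical convention in which each float’s id carries a category prefix (`{#sch:ald}`) and the text refers to it as `#sch:ald`. A two-pass filter can then number Figures, Schemes, and Charts independently in document order and substitute “Scheme 2” for each reference, so deleting or moving a scheme renumbers everything on the next build. This sketch only resolves in-text references. Labeling the captions themselves to match is a separate step.

```python
#!/usr/bin/env python3
"""Sketch: per-category float numbering and reference resolution."""
import json
import re
import sys

# Hypothetical id prefixes and the words used when a float is referenced.
CATEGORIES = {"fig": "Figure", "sch": "Scheme", "chart": "Chart"}
REF = re.compile(r"#(" + "|".join(CATEGORIES) + r"):([-\w]+)")


def collect_numbers(x, numbers=None, counts=None):
    """First pass: number image ids per category, in document order."""
    if numbers is None:
        numbers, counts = {}, {}
    if isinstance(x, dict):
        if x.get("t") == "Image":
            ident = x["c"][0][0]
            prefix, sep, _ = ident.partition(":")
            if sep and prefix in CATEGORIES and ident not in numbers:
                counts[prefix] = counts.get(prefix, 0) + 1
                numbers[ident] = counts[prefix]
        for value in x.values():
            collect_numbers(value, numbers, counts)
    elif isinstance(x, list):
        for item in x:
            collect_numbers(item, numbers, counts)
    return numbers


def resolve_refs(x, numbers):
    """Second pass: rewrite #prefix:id tokens inside Str elements."""
    if isinstance(x, dict):
        if x.get("t") == "Str":
            def label(m):
                ident = f"{m.group(1)}:{m.group(2)}"
                if ident not in numbers:
                    return m.group(0)  # leave unknown references untouched
                # A non-breaking space keeps "Scheme" and its number together.
                return f"{CATEGORIES[m.group(1)]}\u00a0{numbers[ident]}"
            return {"t": "Str", "c": REF.sub(label, x["c"])}
        return {key: resolve_refs(value, numbers) for key, value in x.items()}
    if isinstance(x, list):
        return [resolve_refs(item, numbers) for item in x]
    return x


def main():
    doc = json.load(sys.stdin)
    json.dump(resolve_refs(doc, collect_numbers(doc)), sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc passes the target format as argv[1]; this filter ignores it
    main()
```

Unknown references are left as written rather than silently dropped, so a typo in an id stays visible in the output.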
One of the great things about pandoc is that it is very extensible through the use of filters. The best example of this is pandoc-citeproc, which processes citations written in pandoc’s native citation syntax. However, there are many other filters available, and they are fairly easy to write if you’re passingly familiar with any one of a number of programming languages (Python and Haskell, pandoc’s native language, appear to be the most common).
As a chemist, I find this sort of extensibility both tremendously useful and sometimes very necessary. There are, I’ve realized, some real idiosyncrasies to our writing (for example, our insistence on having at least two and sometimes three different categories of figures that are numbered separately). In LaTeX, these are taken care of by dedicated packages, such as mhchem for chemical formulas. I like to use LaTeX for Supporting Information files, and these sorts of packages are very useful there. In pandoc, a lot of similar functionality can be added through short filters that are applied when the files are processed.
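As a small example of that kind of filter, here is a sketch (not a port of mhchem) that assumes chemical formulas are written as inline code with a hypothetical `ce` class, for example `` `Fe2O3`{.ce} ``. For LaTeX output, it hands the formula to mhchem’s `\ce{}` command, so the preamble would need `\usepackage[version=4]{mhchem}`. For every other format, it subscripts digits that directly follow a letter or closing bracket. Charges, states, and reaction arrows are deliberately out of scope.

```python
#!/usr/bin/env python3
"""Sketch: render `formula`{.ce} as \\ce{} in LaTeX, subscripts elsewhere."""
import json
import re
import sys

# Digits directly after a letter or closing bracket become subscripts
# (Fe2O3, Ca(OH)2); a leading coefficient, as in 2H2O, is left alone.
SUBSCRIPT = re.compile(r"(?<=[A-Za-z)\]])\d+")


def walk(x, action, fmt):
    """Visit the pandoc JSON AST top-down; action may return a replacement."""
    if isinstance(x, list):
        out = []
        for item in x:
            if isinstance(item, dict) and "t" in item:
                replacement = action(item, fmt)
                if replacement is not None:
                    item = replacement
            out.append(walk(item, action, fmt))
        return out
    if isinstance(x, dict):
        return {key: walk(value, action, fmt) for key, value in x.items()}
    return x


def formula_inlines(text):
    """Split a formula string into Str and Subscript inlines."""
    inlines, pos = [], 0
    for m in SUBSCRIPT.finditer(text):
        if m.start() > pos:
            inlines.append({"t": "Str", "c": text[pos:m.start()]})
        inlines.append({"t": "Subscript", "c": [{"t": "Str", "c": m.group()}]})
        pos = m.end()
    if pos < len(text):
        inlines.append({"t": "Str", "c": text[pos:]})
    return inlines


def chem(elem, fmt):
    """Handle inline code marked with the (hypothetical) .ce class."""
    if elem["t"] != "Code":
        return None
    (_ident, classes, _kvs), text = elem["c"]
    if "ce" not in classes:
        return None
    if fmt in ("latex", "beamer"):
        return {"t": "RawInline", "c": ["latex", f"\\ce{{{text}}}"]}
    # Wrap the pieces in a Span so one element replaces the original Code.
    return {"t": "Span", "c": [["", [], []], formula_inlines(text)]}


def main(fmt):
    doc = json.load(sys.stdin)
    json.dump(walk(doc, chem, fmt), sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc calls JSON filters with the target format as the first argument
    main(sys.argv[1])
```

Marking formulas explicitly, rather than guessing which words are formulas, keeps ordinary text such as sample names from being subscripted by accident.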
Yikes, been longer than I intended between posts. Alas for the uneven free time of an academic. In my defense, I’m trying to start up a new project in my group and I have a very energetic dog.
In the last post, I discussed my use of pandoc as a tool for writing in plain text and outputting to a variety of different formats. I use it as much as I possibly can because I just prefer writing in the simplest format possible and then “compiling” my final documents for distribution. Here, I’d like to share a simple template that I use to produce letters on Miami’s letterhead. I use this primarily for letters of recommendation.
Back in the dark ages, when working on the first (now defunct) html version of the group website, I started to wonder a bit about making more use of plain text in my everyday workflows. As much as html isn’t exactly a convenient format for writing, I was tired of trying to herd figures into the right places in Word.