A while back, I wrote a little post on my fondness for the Chemistry Reference Resolver. It is a wonderful tool for quickly accessing journal articles in a not-so-hyperlinked world, especially when used via an app launcher like Alfred. However, I would occasionally run into a journal that it didn’t connect to. I’m also often looking for little scripting projects, so I decided to put together a Python program that would replicate (some of) its behavior and could be run locally.
About a year and a half ago, I wrote a series of pandoc filters designed to facilitate writing chemistry in pandoc. Specifically, I was looking to solve a few problems:
- Not surprisingly, there is no native support for alternate figure types (schemes, charts, etc.) in pandoc.
- Pandoc takes advantage of LaTeX’s native captioning. This has the side effect of adding labels (“Figure X”) to captions in LaTeX/pdf output, but not in other formats (e.g., html, Word). I always found this mildly irritating since the same markdown is rendered differently in different formats.
- Pandoc doesn’t have native cross-referencing support (although I suspect that will change down the road).
- Pandoc doesn’t natively support the “wrapfig” LaTeX package, so text does not flow around figures in LaTeX/pdf output. In principle, I don’t have a problem with this, but if you’re writing with a hard page limit it’s a huge problem.
The cross-referencing issue is a big one, and not surprisingly there are excellent filters already available that provide this functionality (e.g., pandoc-fignos, pandoc-numbering). However, for the specific combination of problems faced by a chemist trying to write reasonable looking chemistry, I didn’t find solutions that exactly met my needs.
Recently, I was working on an exam for a lab course, and wanted to ask a few questions about basic reaction setups. I had a harder time putting together simple figures than I would have expected. ChemDraw has some half-decent options but they’re useless if you need something they don’t already have and, let’s face it, those 3D flasks with the little logos are just trying too hard. After a few frustrating hours, I gave up and decided that it wouldn’t be so hard to throw together the basics on my own.
We live in a strange time. Working at a small/medium-sized university, I can access almost any scientific article published in the 20th century without leaving my office chair. In principle, each of these articles is interconnected to the rest of the literature through unique citations, and yet the state of linking between these many documents is terrible. Even the most recently published articles tend to be shamefully isolated.
For all that I like using Blender as a tool to visualize molecular structures, there are obvious tradeoffs when using general-purpose software instead of something customized for chemistry. One of those tradeoffs is that there’s no obvious way to import various kinds of computational chemistry data.
For me, this is most often molecular orbitals, which we occasionally want to show in manuscripts. So, I started to think a bit about how to render isosurfaces in combination with imported geometries. I briefly considered trying to write something that processes cube files (from Gaussian), but quickly gave up. Instead, I’ve found that Jmol does the job wonderfully, and with minimal fuss.
In general, I like to write my pandoc documents as close to the canonical format as possible and let LaTeX deal with positioning figures. It works pretty well, and it’s infinitely more straightforward than doing it in Word. However, LaTeX doesn’t natively wrap text around figures, and sometimes you really need to maximize the use of space. In pure LaTeX, this can be done with the wrapfig package. In pandoc, this package can be used through a template, but it’s a little tricky if you don’t want every figure treated the same way.
Chemists may be unique in the number of different types of floats we use when writing: at a minimum Figures, Schemes, and Tables, and often Charts. Keeping track of all of these when writing can be a nuisance. I tend to write a lot and then revise heavily, especially when working on proposals. Inevitably the numbering gets screwed up after deleting or moving things around.
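To make the bookkeeping concrete, here is a minimal Python sketch of numbering each float type separately, in order of first appearance. The label syntax (`{#fig:name}`, `{#sch:name}`, and so on) and the function name `number_floats` are invented for this illustration; they are not the syntax of any particular filter.

```python
import re
from collections import defaultdict

# Hypothetical label syntax, assumed only for this sketch:
# {#fig:name}, {#sch:name}, {#tbl:name}, {#cht:name}
LABEL = re.compile(r"\{#(fig|sch|tbl|cht):([\w-]+)\}")
NAMES = {"fig": "Figure", "sch": "Scheme", "tbl": "Table", "cht": "Chart"}


def number_floats(text):
    """Map each label to a number, keeping one counter per float type."""
    counters = defaultdict(int)
    numbers = {}
    for kind, name in LABEL.findall(text):
        key = f"{kind}:{name}"
        if key not in numbers:  # number a label only the first time it appears
            counters[kind] += 1
            numbers[key] = f"{NAMES[kind]} {counters[kind]}"
    return numbers
```

Because the numbers are derived from the order of the labels, deleting or moving a float only requires running the function again rather than renumbering by hand.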
One of the great things about pandoc is that it is very extensible through the use of filters. The best example of this is pandoc-citeproc, which is how references are processed in the native pandoc syntax. However, there are many other filters available, and they are fairly easy to write if you’re passingly familiar with any one of a number of different programming languages (although Haskell—pandoc’s native language—and Python appear to be most common).
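As a rough illustration of how small a filter can be, here is a sketch in Python that uses only the standard library. It relies on pandoc's JSON representation of a document, in which each element is an object with a type field `t` and usually a content field `c`. The helper names (`walk`, `upper_str`, `run_filter`) are made up for this example, and upper-casing text is just a stand-in for something more useful.

```python
import json
import sys


def walk(node, action):
    """Recursively visit a pandoc JSON AST, applying `action` to each element."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:  # None means "leave this element alone"
                node = replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node


def upper_str(element):
    """Toy action: upper-case the text of every Str inline."""
    if element.get("t") == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None


def run_filter(document):
    return walk(document, upper_str)


# pandoc passes the output format as the first argument to a filter,
# so this branch only runs when the script is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(run_filter(json.load(sys.stdin)), sys.stdout)
```

Saved as an executable script (say, `caps.py`, a hypothetical name), this could be applied with `pandoc --filter ./caps.py`. Libraries such as pandocfilters and panflute wrap the same idea so that you don't have to write the tree walk yourself.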
As a chemist, this sort of extensibility is both tremendously useful and sometimes very necessary. There are, I’ve realized, some real idiosyncrasies to our writing (for example, our insistence on having at least two and sometimes three different categories of figures that are numbered separately). In LaTeX, these are taken care of with different packages, like mhchem. I like to use LaTeX for Supporting Info files and these sorts of packages are very useful. In pandoc, a lot of similar functionality can be added through short filters that are applied when the files are processed.
Yikes, been longer than I intended between posts. Alas for the uneven free time of an academic. In my defense, I’m trying to start up a new project in my group and I have a very energetic dog.
In the last post, I discussed my use of pandoc as a tool for writing in plain text and outputting to a variety of different formats. I use it as much as I possibly can because I just prefer writing in the simplest format possible and then “compiling” my final documents for distribution. Here, I’d like to share a simple template that I use to produce letters on Miami’s letterhead. I use this primarily for letters of recommendation.