A while back, I wrote a little post on my fondness for the Chemistry Reference Resolver. It is a wonderful tool for quickly accessing journal articles in a not-so-hyperlinked world, especially when used via an app launcher like Alfred. However, I would occasionally run into a journal that it didn’t connect to. I’m also often looking for little scripting projects, so I decided to put together a Python program that would replicate (some of) its behavior and could be run locally.
About a year and a half ago, I wrote a series of pandoc filters designed to facilitate writing chemistry in pandoc. Specifically, I was looking to solve a few problems:
- Not surprisingly, there is no native support for alternate figure types (schemes, charts, etc.) in pandoc.
- Pandoc takes advantage of LaTeX’s native captioning. This has the side effect of adding labels (“Figure X”) to captions in LaTeX/pdf output, but not in other formats (e.g., html, Word). I always found this mildly irritating since the same markdown is rendered differently in different formats.
- Pandoc doesn’t have native cross-referencing support (although I suspect that will change down the road).
- Pandoc doesn’t natively support the “wrapfig” LaTeX package, so text does not flow around figures in LaTeX/pdf output. Most of the time that doesn’t bother me, but it becomes a huge problem if you’re writing with a hard page limit.
The cross-referencing issue is a big one, and there are already excellent filters available that provide this functionality (e.g., pandoc-fignos, pandoc-numbering). However, for the specific combination of problems faced by a chemist trying to write reasonable-looking chemistry, I didn’t find solutions that exactly met my needs.
In general, I like to write my pandoc documents as close to the canonical format as possible and let LaTeX deal with positioning figures. It works pretty well, and it’s infinitely more straightforward than doing it in Word. However, LaTeX doesn’t natively wrap text around figures, and sometimes you really need to maximize the use of space. In pure LaTeX, this can be done with the wrapfig package. In pandoc, this package can be used through a template, but it’s a little tricky if you don’t want every figure treated the same way.
Chemists may be unique in the number of different types of floats we use when writing: at a minimum Figures, Schemes, and Tables, and often Charts as well. Keeping track of all of these when writing can be a nuisance. I tend to write a lot and then revise heavily, especially when working on proposals. Inevitably the numbering gets screwed up after deleting or moving things around.
One of the great things about pandoc is that it is very extensible through the use of filters. The best example of this is pandoc-citeproc, which is how references are processed in the native pandoc syntax. However, there are many other filters available, and they are fairly easy to write if you’re passingly familiar with any one of a number of different programming languages (although Haskell—pandoc’s native language—and Python appear to be most common).
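To give a sense of how little code a filter needs, here is a minimal sketch in Python using only the standard library. It relies on the fact that pandoc hands a filter the document as a JSON syntax tree on stdin and reads the modified tree back from stdout, with each element tagged by a `"t"` key and its contents under `"c"`. The names `walk` and `arrow_action`, and the idea of turning a typed `->` into a reaction arrow, are made up for illustration and are not taken from any existing filter.

```python
import json
import sys


def walk(node, action):
    """Return a copy of a pandoc JSON AST with `action` applied to every element.

    `action` receives each element (a dict with a "t" type tag) and returns
    either a replacement element or None to leave the element alone.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node)
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node


def arrow_action(elem):
    """Hypothetical example action: turn a bare "->" word into a reaction arrow."""
    if elem["t"] == "Str" and elem.get("c") == "->":
        return {"t": "Str", "c": "\u2192"}
    return None


def main():
    """Read the JSON AST pandoc sends on stdin, write the modified AST to stdout."""
    document = json.load(sys.stdin)
    json.dump(walk(document, arrow_action), sys.stdout)


# A real filter script would end by calling main(). It is left uncalled here
# so the sketch can be tried on a hand-built fragment instead.
fragment = [{"t": "Str", "c": "A"}, {"t": "Space"}, {"t": "Str", "c": "->"}]
print(json.dumps(walk(fragment, arrow_action)))
```

Saved under a name like `arrows.py` (hypothetical) with a final call to `main()`, something along these lines would be passed to pandoc with its `--filter` option. Libraries such as pandocfilters or panflute wrap this same walk-and-replace pattern, so in practice you would probably start from one of those.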
As a chemist, this sort of extensibility is both tremendously useful and sometimes very necessary. There are, I’ve realized, some real idiosyncrasies to our writing (for example, our insistence on having at least two and sometimes three different categories of figures that are numbered separately). In LaTeX, these are taken care of with different packages, like mhchem. I like to use LaTeX for Supporting Info files and these sorts of packages are very useful. In pandoc, a lot of similar functionality can be added through short filters that are applied when the files are processed.
In the last post, I talked about my experiences using Blender to visualize different aspects of my group’s research. In this post, I’ll give a quick introduction to the script I use to import geometries into Blender along with the template I import them into. The script brings in structures in PDB format files and can generate bond-line or space-filling models.
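As a rough illustration of the first half of that job, the sketch below reads atom records from a PDB file into plain Python objects. It is not the import script itself: the names `Atom`, `parse_atom_line`, and `read_pdb_atoms` are invented here, and the Blender side (creating spheres and cylinders through the `bpy` module) is left out because it only runs inside Blender. The column positions follow the fixed-width layout of ATOM/HETATM records in the PDB format.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    element: str
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record from a PDB file."""
    name = line[12:16].strip()
    # The element symbol sits in columns 77-78. Some programs omit it, so
    # fall back to the first letter of the atom name. This fallback is a
    # rough heuristic and will misread two-letter elements such as Cl.
    element = line[76:78].strip() or name.strip("0123456789")[:1]
    return Atom(
        serial=int(line[6:11]),
        name=name,
        element=element.capitalize(),
        x=float(line[30:38]),  # coordinates are in angstroms
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_pdb_atoms(lines: Iterable[str]) -> List[Atom]:
    """Collect every ATOM/HETATM record, skipping all other record types."""
    return [
        parse_atom_line(line)
        for line in lines
        if line.startswith(("ATOM  ", "HETATM"))
    ]
```

Because `read_pdb_atoms` accepts any iterable of lines, an open file object can be passed to it directly. The resulting list of coordinates and element symbols is what a space-filling model needs; a bond-line model would additionally need connectivity, either from the file's CONECT records or from a distance cutoff.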
Here I thought I’d go through the script and explain its logic. The idea is that it could be modified to fit whatever format you get your MC output in or to change the information that it returns, hopefully even without a lot of programming experience.
Let me say at the outset that this was one of my first little Python projects. I make no claims that it’s particularly elegant or pythonic; in fact, I have plans for a newer version that will be a bit better organized. Criticisms always appreciated. That said, it does get the job done.
The script is written in Python. I decided to pick it up a few years ago as a way to get back into simple computer programming and haven’t looked back. I use it a lot for little projects like this one and for some simple applications in my group’s publications (like nonlinear curve fitting). Unfortunately, Python’s not a compiled language so you’ll have to have it installed in order to use the script, which is written in Python 3, not Python 2.[1. Right now a slow transition is taking place from Python 2 to 3. There’s still a lot done in 2, but I chose to learn 3 because my need for external packages is limited. Also, Python 3 is the scripting language for Blender, which will be the subject of many future posts.]
In an ideal world, I think we’d all prefer faculty-graded, free-response exams. Unfortunately, that’s just not realistic for a chemistry professor teaching large service classes.[1. The largest classes I’ve taught have up to about 200 students.] While we do use online homework for the classes I teach, we’re not really set up to use these sorts of systems as a primary assessment tool. That leaves, of course, machine-graded, multiple-choice (MC) as the go-to format for most (but not all) of my exam questions.