A personal article resolver in Python

A while back, I wrote a little post on my fondness for the Chemistry Reference Resolver. It is a wonderful tool for quickly accessing journal articles in a not-so-hyperlinked world, especially when used via an app launcher like Alfred. However, I would occasionally run into a journal that it didn’t connect to. I’m also often looking for little scripting projects, so I decided to put together a Python program that would replicate (some of) its behavior and could be run locally.

The result is a little script called paper_finder, hosted on my GitHub account. It is designed as a command-line tool that can be invoked as follows:

paper_finder.py abb v123 12345

Doing so locates the article in journal “abb” (where “abb” is a customizable abbreviation), volume 123, page 12345, and opens the publisher’s page in the default web browser. The “v” prefix is assumed if omitted. The script also accepts years in place of volumes (“y1224”) when that makes sense for the journal (i.e., it won’t work for journals with multiple volumes per year).
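
As a rough illustration of the argument handling described above, here is a minimal sketch. It is not the actual code from paper_finder: the function name parse_citation, the lowercasing of the abbreviation, and the choice to keep the page as a string are all assumptions made for the example.

```python
import re


def parse_citation(args):
    """Split command-line words into (journal, kind, number, page).

    Hypothetical helper: the last word is taken as the page, the one
    before it as the volume or year, and anything earlier as the
    journal abbreviation.
    """
    if len(args) < 3:
        raise ValueError("expected: <journal> <volume|year> <page>")
    *journal_words, locator, page = args
    # Join the abbreviation words without spaces (lowercasing is an assumption).
    journal = "".join(journal_words).replace(" ", "").lower()
    match = re.fullmatch(r"([vy]?)(\d+)", locator.lower())
    if match is None:
        raise ValueError(f"cannot parse volume/year: {locator!r}")
    prefix, number = match.groups()
    # A bare number is treated as a volume, i.e. the "v" is assumed.
    kind = "year" if prefix == "y" else "vol"
    return journal, kind, int(number), page


if __name__ == "__main__":
    print(parse_citation(["abb", "v123", "12345"]))
    print(parse_citation(["abb", "123", "12345"]))
```

Both calls in the usage example return the same tuple, since the “v” prefix is optional.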

The information for the journals themselves is kept in a separate YAML database that is easily customized. Simple journals are stored like this:

- fullname: "Journal of the American Chemical Society"
  names:
    - jacs
    - jamchemsoc
    - journaloftheamericanchemicalsociety
  start_year: 1879
  start_vol: 1
  url: "http://pubs.acs.org/action/quickLink?quickLinkJournal=jacsat&quickLink=true&quickLinkVolume={vol}&quickLinkPage={page}"

The “names” field provides abbreviations, with no spaces, that the script will recognize (it automatically removes spaces when parsing abbreviations). I usually list a few that seem intuitive so that I’ll have a chance at guessing them if I’ve forgotten which one to use. The “start_year” and “start_vol” fields are fairly self-explanatory. They do not both need to be specified, but the script won’t be able to convert from year to volume and vice versa if they aren’t both present.
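
To make the lookup and the year/volume conversion concrete, here is a small sketch under stated assumptions. The entry is written as a plain Python dict of the shape a YAML loader would produce for the record above (minus the url field), the helper names are hypothetical, and the conversion assumes exactly one volume per year. The real script may do this differently.

```python
def normalize(name):
    """Drop all whitespace and lowercase (lowercasing is an assumption)."""
    return "".join(name.split()).lower()


def find_journal(database, name):
    """Return the first entry whose "names" list contains the abbreviation."""
    key = normalize(name)
    for entry in database:
        if key in entry.get("names", []):
            return entry
    return None


def year_to_vol(entry, year):
    """Convert a year to a volume, or None if either anchor field is missing."""
    if "start_year" not in entry or "start_vol" not in entry:
        return None
    return entry["start_vol"] + (year - entry["start_year"])


def vol_to_year(entry, vol):
    """Convert a volume to a year, or None if either anchor field is missing."""
    if "start_year" not in entry or "start_vol" not in entry:
        return None
    return entry["start_year"] + (vol - entry["start_vol"])


# The JACS record from above as a dict; the url field is omitted for brevity.
jacs = {
    "fullname": "Journal of the American Chemical Society",
    "names": ["jacs", "jamchemsoc", "journaloftheamericanchemicalsociety"],
    "start_year": 1879,
    "start_vol": 1,
}

if __name__ == "__main__":
    entry = find_journal([jacs], "J Am Chem Soc")
    print(entry["fullname"], year_to_vol(entry, 2000), vol_to_year(entry, 122))
```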

The “url” field is the tricky bit, as it needs to provide a URL template that can be used to resolve the year/volume (“{year}” or “{vol}”) and page number (“{page}”) on the publisher’s website. Some publishers are kind enough to document their interface; most are not (or at least I couldn’t find anything). Fortunately, it’s usually possible to figure out something that works by playing around with the search or citation lookup functions on various sites, and the pattern is generally publisher-dependent, not journal-dependent.
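
Since the placeholders use the same brace syntax as Python’s str.format, filling in a template can be sketched as follows. The function names build_url and open_article are hypothetical, and the check for missing fields is an addition for the example rather than something the script is known to do.

```python
import string
import webbrowser


def build_url(template, **fields):
    """Fill {vol}, {year} and {page} placeholders in a url template."""
    # Collect the placeholder names the template actually uses.
    needed = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = sorted(name for name in needed if fields.get(name) is None)
    if missing:
        raise ValueError("template needs: " + ", ".join(missing))
    # Keyword arguments the template does not use are simply ignored.
    return template.format(**fields)


def open_article(template, **fields):
    """Open the resolved article page in the default web browser."""
    webbrowser.open(build_url(template, **fields))


JACS_URL = (
    "http://pubs.acs.org/action/quickLink?quickLinkJournal=jacsat"
    "&quickLink=true&quickLinkVolume={vol}&quickLinkPage={page}"
)

if __name__ == "__main__":
    print(build_url(JACS_URL, vol=123, page="12345"))
```

Passing both vol and year is harmless here, which makes it easy to hand every template the same set of values and let each one pick what it needs.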

For journals with more complex histories, the script supports different iterations:

- fullname: "Chemical Communications"
  names:
    - cc
    - chemcomm
    - chemcommun
    - chemicalcommunications
  iterations:
    - fullname: "chemcommun"
      start_year: 1996
      start_vol: 32
      url: "http://pubs.rsc.org/en/results?artrefjournalname=chem%20commun&artrefstartpage={page}&artrefvolumeyear={year}&fcategory=journal"
    - fullname: "jcschemcommun"
      start_year: 1972
      end_year: 1995
      url: "http://pubs.rsc.org/en/results?artrefjournalname=j.%20chem.%20soc.%2c%20chem.%20commun.&artrefstartpage={page}&artrefvolumeyear={year}&fcategory=journal"

Thus, the same abbreviations can access separate journal landing pages depending on the year/volume of publication. The “end_year” or “end_vol” fields must be set for retired versions of the journals.
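
One way to choose between iterations by year is sketched below. This is an illustration rather than the script’s actual logic: pick_iteration is a hypothetical name, the record is written as a plain dict with the url fields left out, and only the year-based fields are handled (not start_vol or end_vol).

```python
def pick_iteration(entry, year):
    """Return the iteration of a journal that covers the given year.

    An entry without an "iterations" list is treated as its own single
    iteration. Returns None when no iteration covers the year.
    """
    for iteration in entry.get("iterations", [entry]):
        start = iteration.get("start_year")
        end = iteration.get("end_year")  # absent for the current iteration
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        return iteration
    return None


# The Chemical Communications record from above; url fields omitted for brevity.
chemcomm = {
    "fullname": "Chemical Communications",
    "names": ["cc", "chemcomm", "chemcommun", "chemicalcommunications"],
    "iterations": [
        {"fullname": "chemcommun", "start_year": 1996, "start_vol": 32},
        {"fullname": "jcschemcommun", "start_year": 1972, "end_year": 1995},
    ],
}

if __name__ == "__main__":
    for year in (1990, 2005, 1960):
        match = pick_iteration(chemcomm, year)
        print(year, match["fullname"] if match else None)
```

This also shows why the end fields matter: without “end_year” on the retired iteration, a 2005 citation would match both records and the result would depend on their order in the file.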

Most people don’t live on the command line, but this tool is customizable and very handy when coupled to another system like the aforementioned Alfred. Invoke the launcher, type the keyword (e.g., “art”) and citation info, and voilà!
