Waiting for Vacuum

The PDF Problem

I read PDFs for a living, sort of. Literature reviews are essential for doing research, and most of the literature available online is in the form of PDFs. Some journals let you read articles as HTML, but these are rare, and I have yet to come across a journal that formats articles better as HTML than as PDF.

Searching

There are numerous databases for scientific articles. In fact, it is kind of bizarre how many there are. However, most researchers I know start their literature review with a Google Scholar search (or even just a regular Google search). Whatever other search engines might have going for them in features, they lack in convenience. Many of the databases are still commonly used, and the Google results are typically just a layer on top of these.

PDF libraries

For years I have kept all the PDFs I find when doing literature reviews in an app called Papers. There are numerous alternatives (e.g. Mendeley, Sente, Zotero, JabRef and BibDesk), and the only reason I have stuck with Papers is that it has felt like the most native on the Mac.

The jobs-to-be-done of these apps are to keep track of your PDFs and to make them easy to cite when writing. In my case, I always export the references I need to BibTeX, as my writing is mainly done in LaTeX. Therefore, I view my citation manager mainly as a PDF library, and I have never really appreciated it when these apps attempt integration with writing tools such as Microsoft Word or Pages. I would rather see these apps make it easier to organize and find PDFs. In fact, I think many of the citation managers have been doing themselves a disservice by adding a bunch of features without improving on their original jobs-to-be-done.

Annotating PDFs

One of the many features citation managers try to include is a way to annotate PDFs. Some even have companion apps on iOS that can sync PDFs back and forth. However, none of them is great at both annotating PDFs and organizing PDF libraries.

I skim most of my PDFs on my Mac, usually in Preview, simply because it is fast and I like the integration with the PopClip PDF highlighting extension. If for some reason Preview does not do the trick, I open the files in Skim. For those who do not know Skim, it is basically to PDFs what VLC is to video files.

However, my preferred way of reading PDFs is on the iPad. So when I know I want to sit down and read an article, I take out my iPad and open Remarks.[1] There are three reasons why I like this app:

  • It is a single click to go into highlighting mode and I can stay in highlighting mode while scrolling the PDF (I do not accidentally highlight when I am moving down the page)
  • It does not save the annotations to some proprietary extended attribute (so I do not need to export to open it in Preview on my Mac)
  • It syncs via Dropbox (I keep my Papers library in Dropbox)

There are apps that have more features and more elegant designs than Remarks, but I have yet to come across one that does these three things better.

Highlighting is one of those activities where a finger is not ideal. I prefer the height of the text to be smaller than my fingertip, which makes it difficult to select the right line of text with a finger. I therefore prefer to use a stylus for highlighting. I have a Just Mobile AluPen for this sole purpose, and it does the job pretty well.

The Problem

When I am done reading on my iPad, I am left with a carefully annotated PDF that is synced back to my Mac via Dropbox. The PDF then typically contains a lot of essential information for my research (highlighted text and notes) and also a lot of inessential information (the rest of the PDF). This is a problem. I do not want to read through the PDF every time to get the essential information; I want to extract it and save it for future reference.

People use Skim or iPad apps for extracting their notes, and some even have their own scripts for this sole purpose. However, I did not like these solutions:

  • They require editing after extraction
  • They export to plain text
  • They sort the annotations by time (the order in which you made the annotations)
  • They do not handle image selections
  • They do not contain metadata about the PDF the annotations were extracted from

This is why I developed Highlights:

  • The annotations are sorted by their placement on the page, so you can make them in the order you want
  • It translates PDF annotations to Markdown for minimal syntactic sugar
  • PDF square selections are extracted as images (PNGs)
  • It looks up metadata using DOI links and uses it as a header
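
To make the first two points concrete, here is a minimal Python sketch of the idea. It is not the code Highlights uses. The Annotation class, its field names and the Markdown layout are all assumptions made for illustration, and the annotations are assumed to have already been read out of the PDF by some other tool. The position is assumed to be measured from the top of the page; PDF coordinates normally start at the bottom-left corner, so a real extractor would have to flip the vertical axis first.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record for one PDF annotation (not a real library type)."""
    page: int    # 1-based page number
    top: float   # distance from the top of the page (assumed convention)
    left: float  # distance from the left edge of the page
    kind: str    # "highlight", "note" or "image"
    text: str    # highlighted text, note body, or path to an exported PNG


def reading_order(annotations):
    """Sort by placement on the page, not by when the annotation was made."""
    return sorted(annotations, key=lambda a: (a.page, a.top, a.left))


def to_markdown(annotations, title):
    """Render annotations as Markdown, grouped under one heading per page."""
    lines = [f"# {title}", ""]
    current_page = None
    for a in reading_order(annotations):
        if a.page != current_page:
            lines += [f"## Page {a.page}", ""]
            current_page = a.page
        if a.kind == "highlight":
            lines.append(f"> {a.text}")  # highlights become block quotes
        elif a.kind == "image":
            lines.append(f"![Selection from page {a.page}]({a.text})")
        else:
            lines.append(a.text)  # notes become plain paragraphs
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Annotations listed in the order they were made, not in reading order.
    made = [
        Annotation(2, 100.0, 50.0, "note", "Check this against my own data."),
        Annotation(1, 300.0, 50.0, "highlight", "Second highlight on page one."),
        Annotation(1, 100.0, 50.0, "highlight", "First highlight on page one."),
    ]
    print(to_markdown(made, "Example Paper"))
```

Sorting on the tuple (page, top, left) is what frees you from making the annotations in any particular order: the output follows the page, however you jumped around while reading.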

Future Reference

The great thing about extracting PDF annotations is that you get a searchable library of the parts you find important along with your own notes. There is no lock-in: you can use any application to index your library, such as Spotlight, DEVONthink or even a wiki solution like Gollum.

I have found a personal wiki solution like Gollum to be a game changer for linking my references in interesting ways.


  1. Remarks is no longer available in the App Store, but PDF Expert from the same developer is a more advanced version of the same app.

