Devilfish is an archiving utility for use with DEVONthink Pro. It simultaneously creates a single-page PDF snapshot of a web page and a local web archive of the page, and sends requests to external sites such as the Internet Archive.
Author: Michael Hucka
Repository: https://github.com/mhucka/devilfish
License: Unless otherwise noted, this content is licensed under the MIT License.
Web pages are ephemeral—here today, gone (or worse, changed) tomorrow. For researchers, this is anathema: we need to be able to document exactly what we read, when we read it, and potentially prove it at a later time. Currently the best web archiving facilities for general use are sites such as the Internet Archive, WebCite, and Archive.today, but for convenience and rapid access, keeping one's own local archives is a necessity. One of the research tools I use is DEVONthink Pro, a personal database and information management system for macOS, and I needed a convenient way to store not only an archive of a web page but also a page snapshot in PDF format. Devilfish is my solution.
Devilfish is meant to be bound to a keyboard shortcut and invoked while browsing the web in Safari or Google Chrome. When invoked, it does the following:
- Prompts the user for a destination database in DEVONthink Pro and for a list of tags
- Calls on DEVONthink Pro to create an archive of the current page in webarchive format
- Calls on DEVONthink Pro to create a single-page PDF of the current page
- Optionally, sends archiving requests over the network to the Internet Archive, WebCite, and Archive.today
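
To illustrate the last step, here is a minimal Python sketch of asking the Internet Archive to capture a page. It is not Devilfish's own code: it assumes the public "Save Page Now" URL form (`https://web.archive.org/save/` followed by the page URL), and the function names and the `User-Agent` string are hypothetical. WebCite and Archive.today are left out because their request formats differ.

```python
import urllib.request

# Assumption: the public "Save Page Now" endpoint of the Internet Archive.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def wayback_save_url(page_url: str) -> str:
    """Return the Save Page Now URL for page_url (no network access)."""
    return SAVE_ENDPOINT + page_url.strip()


def request_wayback_snapshot(page_url: str, timeout: float = 30.0) -> int:
    """Ask the Internet Archive to capture page_url; return the HTTP status.

    Raises urllib.error.URLError (or HTTPError) if the request fails.
    """
    request = urllib.request.Request(
        wayback_save_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `request_wayback_snapshot("https://example.com/")` performs a real network request, so a caller would probably want to treat a failure as non-fatal: the local PDF and web archive are the primary record, and the external request is optional.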
The web archive is not stored in DEVONthink Pro but rather in a folder in the user's home directory. The PDF is left in DEVONthink; the URL of the web page is stored in the document's URL field, and the PDF is annotated with a Spotlight comment containing the path to the (external) web archive file. This combination avoids duplication and excessive growth in the user's DEVONthink database, while still allowing the user to take advantage of DEVONthink's powerful full-text PDF search, annotation, and other capabilities, and to have a backup copy of the original page source as a precaution. The web archive storage location can be on an external drive, on an IPFS location, or elsewhere.
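
The relationship between the PDF record and the external web archive can be sketched as follows. This is an illustration only, in Python: the file naming scheme, the comment wording, and the function names are all hypothetical and are not taken from Devilfish itself. The sketch only shows the idea that the PDF carries the page URL plus a comment pointing at a file stored outside the database.

```python
import re
from datetime import datetime
from pathlib import Path


def webarchive_path(title: str, when: datetime, root: Path) -> Path:
    """Build a file path for the external web archive (hypothetical naming)."""
    # Replace runs of characters that are awkward in file names with "-".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", title).strip("-") or "untitled"
    return root / f"{when:%Y-%m-%d-%H%M%S}-{safe}.webarchive"


def pdf_metadata(page_url: str, archive_file: Path) -> dict:
    """Return the metadata attached to the PDF record.

    The URL goes in the document's URL field; the comment records where
    the external web archive lives.
    """
    return {"url": page_url, "comment": f"Web archive: {archive_file}"}


if __name__ == "__main__":
    # Hypothetical storage folder in the user's home directory.
    root = Path.home() / "WebArchives"
    path = webarchive_path("Example Domain", datetime.now(), root)
    print(pdf_metadata("https://example.com/", path))
```

Because the only link between the two files is the path in the comment, moving the storage folder later (for example to an external drive) would leave existing comments pointing at the old location.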
The name "Devilfish" is a loose combination of "DEVONthink" and "fishing", as in fishing for information. (By the way, the real devil fish, more properly known as Mobula mobular or the giant devil ray, is an endangered species due to fishing and habitat destruction. Please read more about the species and help preservation efforts before it is driven to extinction.)
If you find an issue, please submit it in the GitHub issue tracker for this repository.
I would be happy to receive your help and participation if you are interested. Everyone is asked to read and respect the code of conduct when participating in this project.
Copyright (c) 2018 by Michael Hucka and the California Institute of Technology.
The illustration of a giant devil ray used on this page came from Wikimedia. It was originally created by H. Gervais for the 1877 book Les Poissons by H. Gervais and R. Boulart.