A prototype web application for labeling/annotating PDF files stored as SVG documents (see iesl-pdf-to-text for how they're generated). It has a bare-bones UI and layout (i.e., no Bootstrap or equivalent) and no production-level features such as authentication, authorization, document owners and assignments, error handling, concurrent access, or deployment. There is one unit test, SvgFileDAOTest. Also missing is a comprehensive set of guidelines with marked-up examples covering all potentially confusing cases.
Download the Typesafe Activator and run:
$ cd <repos>/playframework_temp/
$ <activator_location>/activator-dist-1.3.5/activator run
It should compile the necessary files and then start a local server at localhost:9000.
- Front end: Fabric.js JavaScript canvas library for the rectangles.
- Back end: Play Framework for the web application, using the Java API.
- Eventually: xml_annotator for controllers.Application.getDocRectText()
Two user-facing Play routes are currently implemented: an index at '/' that lists all SVG files in SvgFileDAO.svgDirectory, and an SVG editor for individual files at 'docs/<svg_filename>'.
Editor operations currently supported:
- load and display saved annotations using color-coded rectangles
- create/select/edit/delete/filter rectangles using mouse or shortcut keys
TODO:
- abstract out annotation types and colors in edit.scala.html (currently hardcoded). See the TODOs in the code as well.
- add annotation types to all desired header fields: abstract (including "abstract" section header), address, author (including first__, middle, __last), date, editor, email, institution, keyword section (whole section including "keywords" section header), note, tech, title.
- Q: publication info?
- Q: web?
- Q: is the entire header itself to be marked?
- add shortcut keys for addLinkBetweenTwoRects() and deleteLinkBetweenTwoRects()
- shortcut keys should be enabled/disabled the same way as buttons (see updateButtonStates())
- style the interface using Bootstrap or similar
- add document paging to the index
Storage currently uses a filesystem-based scheme: the SVG files and their corresponding JSON files live in the same directory. On startup, the program creates an empty JSON file for any SVG file that lacks one.
TODO:
- controllers.Application.getDocRectText(): requires modifying iesl-pdf-to-text to expose font size information so that each character's bounding box can be calculated. A common bug at present is font-size="1px".
There are two known issues where the SVG files do not render correctly in browsers:
- Some files have a black background, such as 0158.pdf.svg in the MIT corpus.
- In Firefox 39.0, some files have overlapping text, such as 4789.pdf.svg in the MIT corpus. Chrome 44 renders correctly.
The UI is a straightforward direct-manipulation interface in which users work with rectangle objects: click to select, drag to move, drag the resize handles to resize, click the delete rectangle button to remove, etc. The only non-obvious feature is how to add and remove links between rectangles. To add a link, select exactly two rectangles that have the same label and no existing link, and then click the add link button. To remove a link, select two rectangles with an existing link and then click the remove link button.
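The link rules above can be sketched as two predicates. This is only an illustration, not the application's code: the data model (rectangles as `{ id, label }` objects, links as unordered `[idA, idB]` pairs) and the names `hasLink`, `canAddLink`, and `canRemoveLink` are assumptions made for the example, and "no existing link" is read here as "no link between the two selected rectangles".

```javascript
// Hypothetical data model (an assumption, not the app's real structures):
//   rectangle: { id, label }
//   link:      [idA, idB], treated as an unordered pair

// True if `links` already contains a link between rectangles a and b.
function hasLink(links, a, b) {
  return links.some(([x, y]) =>
    (x === a.id && y === b.id) || (x === b.id && y === a.id));
}

// Adding is allowed only for exactly two selected rectangles
// that share a label and are not already linked to each other.
function canAddLink(selection, links) {
  if (selection.length !== 2) return false;
  const [a, b] = selection;
  return a.label === b.label && !hasLink(links, a, b);
}

// Removing is allowed only for exactly two selected rectangles
// that are currently linked to each other.
function canRemoveLink(selection, links) {
  if (selection.length !== 2) return false;
  const [a, b] = selection;
  return hasLink(links, a, b);
}
```

Predicates like these could also drive the enabled/disabled state of the add link and remove link buttons, so that the buttons reflect whether the current selection qualifies.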
Shortcut keys:
- 1: set current type to Title
- shift + 1: filter all but Title
- 2: set current type to Abstract
- shift + 2: filter all but Abstract
- 3: set current type to Author
- shift + 3: filter all but Author
- Escape: reset filter
- +: create new annotation using current type
- arrow key: move selection 1px
- shift + arrow key: move selection 10px
- option + arrow key: resize selection 1px
- shift + option + arrow key: resize selection 10px
- backspace|delete: delete selection
- control + d: duplicate selection
- tab: select next
- shift + tab: select previous
- x: display text for selection
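As an illustration of how the shortcuts above could be dispatched, the sketch below maps a keyboard event to an action description. It is not the editor's actual handler: `resolveShortcut`, the action names, and the returned object shape are all hypothetical. It relies only on standard `KeyboardEvent` fields (`code`, `key`, `shiftKey`, `altKey`, `ctrlKey`), assuming the Mac option key is reported as `altKey`.

```javascript
// Digit keys are matched by physical key (event.code), because
// shift + 1 reports a different event.key (e.g. "!") on many layouts.
const TYPE_KEYS = new Map([
  ['Digit1', 'Title'],
  ['Digit2', 'Abstract'],
  ['Digit3', 'Author'],
]);

// Unit direction vectors for the arrow keys (screen coordinates, y grows downward).
const ARROW_KEYS = new Map([
  ['ArrowLeft', [-1, 0]],
  ['ArrowRight', [1, 0]],
  ['ArrowUp', [0, -1]],
  ['ArrowDown', [0, 1]],
]);

// Returns a plain object describing the action, or null if the key is not a shortcut.
function resolveShortcut(e) {
  if (TYPE_KEYS.has(e.code)) {
    const type = TYPE_KEYS.get(e.code);
    return e.shiftKey ? { action: 'filter', type } : { action: 'setType', type };
  }
  if (ARROW_KEYS.has(e.code)) {
    const [ux, uy] = ARROW_KEYS.get(e.code);
    const step = e.shiftKey ? 10 : 1;           // shift: 10px instead of 1px
    const action = e.altKey ? 'resize' : 'move'; // option: resize instead of move
    return { action, dx: ux * step, dy: uy * step };
  }
  if (e.code === 'Escape') return { action: 'resetFilter' };
  if (e.key === '+') return { action: 'create' };
  if (e.code === 'Backspace' || e.code === 'Delete') return { action: 'deleteSelection' };
  if (e.ctrlKey && e.code === 'KeyD') return { action: 'duplicateSelection' };
  if (e.code === 'Tab') return { action: e.shiftKey ? 'selectPrevious' : 'selectNext' };
  if (e.code === 'KeyX') return { action: 'displayText' };
  return null;
}
```

In a browser this could be wired up with `document.addEventListener('keydown', e => { const a = resolveShortcut(e); ... })`. Keeping the key-to-action mapping separate from the code that performs the action makes it easier to enable or disable shortcuts in step with the buttons.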