
Overview of Conversion and Encoding Process

Benjamin Grey edited this page Apr 23, 2024 · 5 revisions

The Big Picture: Scope and Approach for DHQ Encoding

DHQ uses a customization of the TEI Guidelines to encode all of our journal articles. The DHQ customized schema focuses largely on features that are natural to journal articles: metadata, headings, paragraphs, quotations, lists, figures, tables, examples, notes, citations. In addition, we include a few specialized features that are distinctive to DHQ, such as specialized encoding to embed visualizations or working program code. Our broad encoding philosophy is:

  • to focus the encoding on the function of specific textual features, and to handle the styling and presentation of those features through stylesheets. As a result, the encoding is generally very consistent from article to article, and there aren't many opportunities for fine-tuning the appearance of the article, or controlling things like precise positioning of images.
  • to focus on features that will either support the fundamental requirements of making the article legible, or support some potential analysis (such as a study of citation patterns).

Finding an Article to Encode

DHQ's encoding is tracked in the Table of Contents file (toc.xml), which is stored in the .../toc directory. Near the top of that file is a section titled ENCODING QUEUE, followed by a list of articles that are ready to be encoded (and, occasionally, a list of Special Issue articles). When an article is accepted for publication in OJS, whoever accepts it adds it to the encoding queue. When an article enters the encoding process, its entry in the encoding queue is moved to the ENCODING IN PROGRESS & INTERNAL PREVIEW section of the TOC, and the encoder completes the steps in the section below.

Workflow for Encoding

(Before beginning, make sure that you have enough time to complete Steps 1–3 in one sitting.)

1. Consult the TOC

Before encoding any article, you should open the TOC and locate the ENCODING IN PROGRESS & INTERNAL PREVIEW section.

  • This file is where articles are given their unique article ID numbers. These are always six digits long (including leading zeros) and should increase sequentially without skipping any numbers.
  • Look through the entire section, find the highest current value, add one, and make a note of this new ID number somewhere. (Do NOT edit the TOC yet.)
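The "find the highest value and add one" step can be double-checked with a short script. This is only a sketch: it assumes that entries in the section look like <item id="000789"/> (the form used for new entries in Step 3), and the sample entries are hypothetical. Paste in the text of the ENCODING IN PROGRESS & INTERNAL PREVIEW section rather than relying on the sample.

```python
import re

# Assumption: article entries look like <item id="000789"/>,
# the form shown for new entries in Step 3.
ITEM_ID = re.compile(r'<item\s+id="(\d{6})"')


def next_article_id(toc_text: str) -> str:
    """Return the six-digit ID that follows the highest ID in toc_text."""
    ids = [int(value) for value in ITEM_ID.findall(toc_text)]
    if not ids:
        raise ValueError("no six-digit item IDs found")
    # Zero-pad so the result is always six digits long.
    return f"{max(ids) + 1:06d}"


# Hypothetical excerpt; entries are not necessarily in numeric order.
sample_toc = """
<item id="000787"/>
<item id="000788"/>
<item id="000642"/>
"""

print(next_article_id(sample_toc))
```

The script only reads text; it does not edit the TOC, which matches the instruction to hold off on changes until Step 3.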

2. Create a New Article Directory and XML File

In the .../articles directory, create a folder and name it the new ID number you've just made note of (e.g. 000789).

There are two ways to create the initial XML file for a DHQ article. When first encoding, we recommend following the instructions to encode by hand because it gives you greater exposure to XML markup. For more regular encoding, follow the instructions to encode with the conversion utility.

Encoding by Hand

  • Open the dhq-tei_template.xml file (in the .../article/templates directory)
  • Save a copy into your new folder with the appropriate filename (e.g. 000789.xml)
  • Open the file with Oxygen and navigate to the <publicationStmt> element
  • Find this element <idno type="DHQarticle-id"> and add the new ID number after the ">" (you can delete the comment if desired)
  • Save the file again
  • Do not continue encoding yet; proceed to Step 3
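For reference, the result of the ID step should be an <idno type="DHQarticle-id"> element whose only content is the six-digit number. The sketch below shows that edit on a small, hypothetical fragment; the placeholder comment in the real template may be worded differently, and in practice you will usually make this change by hand in Oxygen.

```python
import re

# Matches the DHQarticle-id element and whatever it currently contains
# (for example, a placeholder comment).
IDNO = re.compile(r'(<idno\s+type="DHQarticle-id">)(.*?)(</idno>)', re.DOTALL)


def set_article_id(xml_text: str, article_id: str) -> str:
    """Replace the content of the DHQarticle-id <idno> with article_id."""
    if not re.fullmatch(r"\d{6}", article_id):
        raise ValueError("article ID must be exactly six digits")
    new_text, count = IDNO.subn(
        lambda m: m.group(1) + article_id + m.group(3), xml_text
    )
    if count != 1:
        raise ValueError(f"expected one DHQarticle-id idno, found {count}")
    return new_text


# Hypothetical fragment standing in for the template's <publicationStmt>.
template = """<publicationStmt>
  <idno type="DHQarticle-id"><!-- article ID goes here --></idno>
</publicationStmt>"""

print(set_article_id(template, "000789"))
```

Note that this sketch removes the placeholder comment, which the instructions above treat as optional.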

Encoding with Conversion Utility

  • Navigate to the TEIGarage automated conversion utility
  • Download the article manuscript from OJS, and upload it to the TEIGarage interface: click "Documents" and choose the appropriate input format (probably Microsoft Word .docx) and output format (TEI P5 XML Document), and then upload the article manuscript and click "Convert". TEIGarage will download the converted file to your computer.
  • Open the converted XML file in Oxygen, and then run the "convert_tei2dhq" transformation scenario. (This will remove a lot of unnecessary markup and add the DHQ TEI header and metadata boilerplate. If the author has used the DHQ MS Word authoring template, this transformation will create markup for common elements including div, head, emph, term, and title. If the article includes embedded Zotero references, this transformation will also create encoding for pointers and bibliographic citations.)

After running this transformation scenario, return to the "wrench and play button" icon and un-check the checkbox next to it, then click save and close. If you don't un-check this option, you may accidentally overwrite your own work later (ask me how I know!).

  • Save the converted file into your new folder with the appropriate filename (e.g. 000789.xml)
  • Navigate to the <publicationStmt> element
  • Find this element <idno type="DHQarticle-id"> and add the new ID number after the ">" (you can delete the comment if desired)
  • Save the file again
  • Do not begin encoding yet; proceed to Step 3

3. Edit the TOC

Return to the TOC, locate the ENCODING IN PROGRESS & INTERNAL PREVIEW section and complete the following steps.

  • Create a new empty <item/> element with an @id attribute whose value is the new article's ID number (e.g. <item id="000789"/>)
  • Add a comment next to the <item/> element which contains the author's last name, the article's OJS number, and your name with an encoding status
  • Save and close the TOC file
  • Commit your changes and push to GitHub
  • (Optionally) send a message on Slack letting other encoders know you've claimed a new article ID
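As an illustration of what the new TOC line might look like, here is a minimal sketch that builds the <item/> element and its comment. The exact wording and order of the comment are assumptions (the instructions above only say what it should contain), and the author, OJS number, and encoder name are hypothetical.

```python
import re


def toc_entry(article_id: str, author_last_name: str, ojs_number: int,
              encoder: str, status: str = "encoding in progress") -> str:
    """Build an <item/> element followed by its tracking comment."""
    if not re.fullmatch(r"\d{6}", article_id):
        raise ValueError("article ID must be exactly six digits")
    comment = f"{author_last_name}, OJS {ojs_number}, {encoder}: {status}"
    # XML comments may not contain a double hyphen.
    if "--" in comment:
        raise ValueError("comment text must not contain '--'")
    return f'<item id="{article_id}"/> <!-- {comment} -->'


def id_is_taken(toc_text: str, article_id: str) -> bool:
    """Return True if the ID already appears in the TOC text."""
    return f'id="{article_id}"' in toc_text


print(toc_entry("000789", "Smith", 1234, "A. Encoder"))
```

Running id_is_taken against a freshly pulled copy of the TOC just before committing is one way to catch the case where another encoder has claimed the same number in the meantime.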

4. Encode your Article

Stop, did you complete Step 3 and push your changes to GitHub yet?

If you are encoding by hand, you will need to download the article manuscript from OJS and open it. You will use this to copy and paste content into XML elements as you encode the article. If you are encoding with a converted file, you will still likely find it handy to open the original manuscript file.

The Markup Process

Whichever approach you used to create the initial file, the goals of the markup process are the same, but the actual workflow may be different:

  • If you are hand-encoding the article, it makes sense to first set up the overall structure of the document with <div> elements representing the major sections. During this process you can make sure that the structure of sections and subsections is accurately represented, with <div> nested inside <div> (or even further <div> layers!) as needed. Some articles use clear and consistent heading structures that make this easy. With others, you may need to read the article carefully to make sure you're getting the nesting correct. Once you have the overall structure in place, you can then work through from the start of the article, pasting in paragraphs and quotation blocks, and creating figures and tables as you come to them. Be careful as you work to encode any words in italics or in quotation marks with the appropriate element (see detailed encoding guidelines below and consult the encoding documentation for full information).
  • If you have converted the article using TEIGarage, the encoding of <p> will be in place, and some of the overall markup structure of <div> may also be in place, and other features such as figures and tables may be correctly encoded as well. However, you will need to check the <div> structure carefully to make sure that TEIGarage has interpreted the headings correctly. In some cases, there may be no <div>s at all, and the headings may be represented as paragraphs. In addition, phrase-level features (like titles, emphasis, and italicized terms) will need to be reviewed and encoded according to DHQ's guidelines.
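One way to check the <div> structure in either workflow is to print an indented outline of the headings and compare it with the manuscript. The sketch below is not part of the DHQ tooling; it assumes that the article uses the standard TEI namespace and that each <div> carries its heading in a direct <head> child. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace


def div_outline(xml_text: str) -> list[tuple[int, str]]:
    """Return (depth, heading) pairs for every <div>, in document order."""
    root = ET.fromstring(xml_text)
    outline = []

    def walk(element, depth):
        for child in element:
            if child.tag == f"{{{TEI_NS}}}div":
                head = child.find(f"{{{TEI_NS}}}head")
                title = ("".join(head.itertext()).strip()
                         if head is not None else "(no head)")
                outline.append((depth, title))
                walk(child, depth + 1)  # nested divs sit one level deeper
            else:
                walk(child, depth)

    walk(root, 0)
    return outline


# Hypothetical article skeleton.
sample = f"""<TEI xmlns="{TEI_NS}"><text><body>
  <div><head>Introduction</head><p>...</p></div>
  <div><head>Methods</head>
    <div><head>Corpus</head><p>...</p></div>
    <div><head>Tools</head><p>...</p></div>
  </div>
</body></text></TEI>"""

for depth, title in div_outline(sample):
    print("  " * depth + title)
```

If the outline comes back empty for a converted file, that is a sign the headings were left as paragraphs and the <div> structure still needs to be built.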

Validation

As you encode, you should validate your work at intervals. The article encoding template includes a schema reference at the top (line 2), so you can validate by typing Command+Shift+V (Mac) or Control+Shift+V (Windows/Linux). At first, you'll see a lot of error messages, and this is a good sign: it means that Oxygen is reading the schema and your file is saved in the right place.

If you don't see any error messages at all, navigate to Options > Preferences and type "idref" in the Filter Text search bar.

If the option for Check ID/IDREF is selected, deselect it and select Apply. Your file should now begin to be validated against the schema.

If you see an error message that begins "java.io.FileNotFoundException", that means that the link to the schema is broken: either because you've saved the file someplace other than the articles directory, or because the link in line 2 has been deleted or altered. Read the error messages and try to figure out what they're telling you; don't just ignore them. (An error message crib sheet will be available shortly.) When you've finished encoding the article, there should be no error message relating to the body of the article; there will still be some missing items in the article metadata, relating to the final publication of the article.
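When tracking down a broken schema link, it can help to confirm what line 2 points at and whether that path exists relative to the article file. The sketch below assumes the schema reference is an xml-model processing instruction with an href attribute; check your copy of the template, since the source does not spell out the form. The path in the sample is hypothetical.

```python
import re
from pathlib import Path

# Assumption: line 2 holds an <?xml-model href="..." ...?> instruction.
XML_MODEL = re.compile(r'<\?xml-model\s[^>]*?href="([^"]+)"')


def schema_href(xml_text: str):
    """Return the href on line 2, or None if no schema reference is there."""
    lines = xml_text.splitlines()
    if len(lines) < 2:
        return None
    match = XML_MODEL.search(lines[1])
    return match.group(1) if match else None


def schema_exists(article_path: Path) -> bool:
    """Check that the schema referenced on line 2 resolves to a real file."""
    href = schema_href(article_path.read_text(encoding="utf-8"))
    if href is None:
        return False
    # Relative links are resolved from the folder containing the article.
    return (article_path.parent / href).resolve().exists()


# Hypothetical first lines of an article file.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="../../schema/example.rng" type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"/>"""

print(schema_href(sample))
```

A None result suggests line 2 has been deleted or altered; a False result from schema_exists suggests the file has been saved somewhere other than its article folder.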

Proofreading

Responsibility for proofreading the article is shared between the author and the managing editor. As you encode the article, if you see typographical errors, missing words, or other obvious problems, you should fix them as you go. If the article seems to have problems that require input from the author, keep notes on these and include them when you email the author at the end of the process. If there are problems that prevent you from finalizing the encoding (for instance, ambiguous division structures), you can email the author for clarification.

Finishing Up

Once the article has been fully encoded, there are a few final steps to move it into publication:

  1. Check your encoding against the encoding checklist to make sure you've covered everything.
  2. Use the DHQproof transformation scenario to convert the article to HTML on your local computer, and scan through it to make sure that it matches the original manuscript. Check that the heading structure is correct, that all figures and tables are present and look OK, that images are displaying properly, and that all bibliographic references have been encoded. Fix any problems you see. This is also another opportunity to notice and fix typographical errors.
  3. Add a link to the article in the internal preview area of the toc.xml file if one doesn't already exist.
  4. Request a server update, which will make the article visible in our internal preview space.
  5. Email the author to let them know the article is ready for their review; include any questions you've accumulated.
  6. Make any changes requested by the author; if they're substantial, repeat steps 3 and 4 so that the author can confirm their changes have been made successfully.
  7. Let the Publication Kwelbo know that the article is ready to be put into public preview.

Next Steps...

Once you've read this initial orientation, you're probably ready to take a look at an Anatomy of a DHQ Article, which will give you a big-picture view of how we encode our articles, with links to the DHQ encoding documentation.