Creating JUXTA Collages with Web Archive Images

Example JUXTA Collage Using Web Archive Image Data

Welcome! The following tutorial provides instructions for building an image collage using Juxta and an image information dataset generated through the Archives Research Compute Hub (ARCH) platform.

Table of Contents

Acknowledgements

We recognize the following work and contributions which have made this tutorial possible.

Toke Eskildsen is the creator of the Juxta shell script.

The following tutorial was collaboratively designed by Nick Ruest and Samantha Fritz. Many thanks to Ian Milligan for his testing and editorial support.

The Google Colab notebook was built by Nick Ruest, with dataset examples generated using the Archives Research Compute Hub (ARCH).


Overview

Web archive data is a rich source for studying the recent past. Web archives preserve a variety of information formats, from the full text of websites to image and video information to network links among websites in a collection, and as such offer many opportunities for exploration.

The following tutorial will outline how to create a Juxta collage using an image dataset generated through the Archives Research Compute Hub (ARCH). By transforming image data, researchers have an opportunity to explore a web archive collection interactively.

What is Juxta?

Juxta is a shell script which generates a collage of images for display on a webpage. You can learn more about the Juxta script through its GitHub page: https://github.com/tokee/juxta.

Considerations

Web archive data tends to be quite large and often outpaces the capacity of local storage on your computer. You may want to consider working with dedicated storage (HDD, SSD) or servers.

This tutorial was created using macOS.

Pre-requisites

To complete this tutorial, you will need three things:

  • The ARCH “image information” derivative
  • The “ImageMagick” package. To check if it is installed, run convert -version in your console.
  • The “jq” package. To check if it is installed, run jq --version in your console.

On macOS, both ImageMagick and jq can be installed using Homebrew (brew):

  • brew install imagemagick and brew install ghostscript for ImageMagick
  • brew install jq for jq
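The checks above can be combined into one pass. The following is a minimal sketch, assuming a POSIX shell; the tool list (convert, jq, wget, git, python3) is an assumption drawn from the commands used later in this tutorial, and check_tool is a hypothetical helper name. Note that newer ImageMagick releases may provide a magick command instead of convert.

```shell
#!/bin/sh
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper, not part of Juxta or ARCH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tool list is an assumption based on the commands used in this tutorial.
missing=0
for tool in convert jq wget git python3; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

Any tool reported as missing can then be installed with brew before continuing.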

Back to ToC


Creating a Juxta Image Collage

1. ARCH - Run Image Information job

Within ARCH, select a collection and generate a new dataset from the "File Formats" category called "Extract Image Information".

Run ARCH dataset

Back to ToC

2. Copy the derivative URL

Once the dataset has been generated, click on "View Dataset" to navigate to the summary page. Scroll down to the download icon, right-click and select “copy link.” This will be the dataset URL needed for working with images in the notebook.

Right click on download to copy dataset URL within ARCH

Back to ToC

3. Working with Google Colab

Create a copy of the Image Information Download Urls notebook via Google Colab.

Copy Colab Notebook

Working from the copied notebook now, you will start off by changing the title. You may find it easiest to note the collection number in the title if you plan to work with multiple copies of the template.

There are a few cells that will need a change in information.

  1. In the first cell, change the URL listed to the URL of the image information dataset we copied in the previous step. The curl command is used to transfer data to and from a server. In this case, the notebook calls out to the extracted image information dataset from ARCH.

  2. In cell six, which identifies the Wayback URL, change the collection id to match the collection we are currently working with.

  3. Finally, in the last cell, change the collection id in the CSV title. This title could be anything meaningful to you as a researcher, but we suggest maintaining consistency using the collection id.

Change collection information in notebook

Click on the Runtime menu at the top and select Run All. Alternatively, you can manually click on each play button. The pre-scripted actions in this notebook will ultimately generate a .txt file with formatted image URLs, which can then be used to fetch and download the images from this web archive collection using a single command.

Runtime in notebook

In the right-hand pane, which is collapsed by default, click on the file folder icon and download the .txt file to either your desktop or a server.

Download csv from notebook

Create a directory (folder) to house the recently downloaded .txt file. For this example, a directory called 13709Juxta was created on a local desktop.

Next, create a subfolder in the new directory and call it images.

Our example path to our main working directory looks like this: /Users/fritz/Desktop/13709Juxta
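The folder layout described above can also be created from the terminal. This is a sketch under stated assumptions: WORKDIR defaults to a 13709Juxta folder in the current directory (run it from your Desktop to match the example path), and the URL list is assumed to have landed in your Downloads folder under the filename used later in this tutorial.

```shell
#!/bin/sh
# WORKDIR is an assumption: a project folder relative to the current directory.
WORKDIR="${WORKDIR:-13709Juxta}"

# -p creates missing parent folders and does not fail if they already exist.
mkdir -p "$WORKDIR/images"

# Assumed download location and filename for the notebook's .txt file;
# adjust both if yours differ.
URL_LIST="${URL_LIST:-$HOME/Downloads/13709_image_urls.txt}"
if [ -f "$URL_LIST" ]; then
    mv "$URL_LIST" "$WORKDIR/"
fi

# Show the resulting layout.
ls -l "$WORKDIR"
```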

Using your terminal window, navigate to the images directory. Then use the following command to download all the images from the text file to the images folder. This image directory will be used to create the Juxta collage.

wget --random-wait -i ../13709_image_urls.txt

NOTE: Downloading the images will take time! Do not close your terminal window.
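Once the download finishes, it can help to confirm how many files arrived and whether any are empty, since interrupted transfers can leave zero-byte files behind. The sketch below uses a hypothetical helper, summarize_downloads, and demonstrates it on a throwaway folder so that it runs anywhere.

```shell
#!/bin/sh
# Print the number of files in a directory and how many of them are empty.
# summarize_downloads is a hypothetical helper name.
summarize_downloads() {
    dir="$1"
    # tr strips the padding that some wc implementations add.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
    echo "files: $total, empty: $empty"
}

# Demonstration on a temporary folder.
demo=$(mktemp -d)
printf 'data' > "$demo/a.jpg"
: > "$demo/b.jpg"              # zero-byte file, as a failed download might leave
summarize_downloads "$demo"    # prints "files: 2, empty: 1"
rm -rf "$demo"
```

From the main working directory, the same check on the real download would be summarize_downloads images.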

Back to ToC

4. Clone Juxta

Navigate to https://github.com/tokee/juxta. Use the URL provided under the green "Code" button to clone.

Clone Juxta within your main directory by using the following command

git clone https://github.com/tokee/juxta.git

Note your path to Juxta; for simplicity's sake in this example, we’ve cloned Juxta to our main working directory /Users/fritz/Desktop/13709Juxta

Back to ToC

5. Create .dat file

A .dat file is a “generic data file that contains important information about the program used to create the particular file.”[1] For the purposes of generating a Juxta image collage, we will list the paths of the image files downloaded from the replay URLs and redirect that output[2] to a .dat file.

From your terminal, navigate to the directory one level above where the images are saved.

For instance: /Users/fritz/Desktop/13709Juxta

Run the following command to find the images and redirect the output to a .dat file

find images > images.dat
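Note that find images also prints the images directory itself as the first line of its output. Whether Juxta tolerates that extra entry is not covered here, so the following variant, offered as an assumption rather than a requirement, restricts the listing to regular files and then checks the result. The demo folder and file names are made up so the sketch runs anywhere.

```shell
#!/bin/sh
# Build a demo folder; in practice, run the find and wc commands from the
# directory that contains your real images folder.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir images
: > images/one.jpg
: > images/two.png

# -type f keeps regular files only, so the "images" directory entry is omitted.
find images -type f > images.dat

# Sanity checks: number of entries and a preview of the first few paths.
entries=$(wc -l < images.dat | tr -d ' ')
echo "entries: $entries"       # prints "entries: 2"
head -n 5 images.dat
```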

Back to ToC

6. Create collage

Next, we will create all of the files and tiles needed to view the collage in a web browser.

We are creating a new directory for all of the files. You will need to make a few modifications to the command below:

THREADS=4 /Users/fritz/Desktop/13709Juxta/juxta/juxta.sh images.dat example

Here’s a quick breakdown of what this line of code does:

| Code Snippet | Code Functionality |
| --- | --- |
| THREADS=4 | You may need to change the number of threads. This example uses four threads on a machine with 8 cores available. Because this tutorial runs on a local computer, limiting the thread count leaves capacity free, so the laptop can process the Juxta files and still be used for other work. |
| /Users/fritz/Desktop/13709Juxta | Change to the path where you’ve cloned Juxta. |
| example | This will be the name of the directory in which Juxta formatted files are created. You can change this to whatever makes sense for you, but be sure to avoid spaces. |
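If you are unsure how many threads to use, one option is to derive the number from the core count. The sketch below mirrors the four-of-eight choice in this example by using half the available cores; that ratio is only an assumption, and JUXTA_HOME is a hypothetical variable standing in for the path to your Juxta clone.

```shell
#!/bin/sh
# Count logical cores. getconf _NPROCESSORS_ONLN works on macOS and most
# Linux systems; fall back to 2 if it is unavailable or returns non-numeric output.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
case "$cores" in
    ''|*[!0-9]*) cores=2 ;;
esac

# Use half the cores, but never fewer than one thread.
threads=$((cores / 2))
if [ "$threads" -lt 1 ]; then
    threads=1
fi
echo "cores=$cores threads=$threads"

# JUXTA_HOME is a hypothetical variable pointing at your Juxta clone.
# Uncomment to run once images.dat exists:
# THREADS=$threads "$JUXTA_HOME/juxta.sh" images.dat example
```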

Before launching a local web server, navigate to the example directory created by the juxta.sh command.

cd example

Back to ToC

7. Launch web server

Now that all the files have been created, we will use Python to serve files from a local directory via HTTP. This will allow you to display and explore the image collage through the web browser. Type the following into terminal:

python3 -m http.server

To view the collage, enter the localhost address as a URL in a browser of your choosing.

http://localhost:8000
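If port 8000 is already in use, or you would rather not cd into the collage directory first, the server can be started with explicit options. This is a sketch, assuming Python 3.7 or later (needed for --directory); serve_collage is a hypothetical helper name.

```shell
#!/bin/sh
# Serve a collage directory to this machine only.
# Usage: serve_collage [DIRECTORY] [PORT]
serve_collage() {
    dir="${1:-example}"
    port="${2:-8000}"
    # --bind 127.0.0.1 keeps the server private to this computer;
    # --directory serves the given folder without changing directories.
    python3 -m http.server "$port" --bind 127.0.0.1 --directory "$dir"
}

# Example (blocks until you press Ctrl+C):
# serve_collage example 8080
```

With the example call above, the collage would be available at http://127.0.0.1:8080.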

Back to ToC

Feedback

As this is intended to be a stand-alone resource, please let us know how we can improve the experience of using this tutorial, through our feedback survey.

License

CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

References

Footnotes

  1. Otachi, Elsie. (2020). "How to Read and Open .DAT Files in Windows." https://www.online-tech-tips.com/computer-tips/how-to-open-dat-files/

  2. "Using Command Symbols." https://sourcedaddy.com/windows-7/using-command-symbols.html