Merge pull request #203 from alliomeria/1.4.0
More 1.4.0 Documentation New & Updates
alliomeria authored Jul 3, 2024
2 parents 9afc44c + 86b0d9d commit 87ffaa5
Showing 14 changed files with 319 additions and 19 deletions.
6 changes: 6 additions & 0 deletions docs/ami_spreadsheet_overview.md
@@ -23,6 +23,12 @@ There are multiple ways a spreadsheet/CSV file can be structured to work with AM
- **Multiple files (of the same type) can be placed in a single cell, separated by a semicolon ( ; ).**
- For Digital Objects comprised of multiple types of files, such as an Oral History Interview with an audio file and a PDF transcript file, you can place different file types within different corresponding columns for the same Row.
- It is recommended that filepaths are copied/stored as plain (non-hyperlinked) formatted text.

!!! warning "Caution with using Quotation Marks"

Archipelago is very JSON-forward (friendly!) and makes assumptions about JSON encoding coming from CSV documents. If you plan to start and end a value with quotation marks in your CSV file, such as for a `label` value, you need to wrap the value in the Unicode escape sequence for the straight quotation mark (`\u0022`) within the CSV cell (corresponding row + column).
For example, for the ingested Archipelago Digital Object to have its label (title) value processed as "Strawberry-Bed", you need to enter the data as "\u0022Strawberry-Bed\u0022" in the corresponding `label` cell in your AMI Set CSV.
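
The escaping described in this warning can be prepared with a short script before the CSV is uploaded. The following is only a minimal sketch: the helper name `wrap_in_escaped_quotes` and the `Photograph` type value are hypothetical, it assumes the outer quotation marks in the example above are prose quoting rather than part of the cell value, and the final `json.loads` call merely illustrates how a JSON decoder reads the escape sequence (it is not a claim about AMI's internal processing).

```python
import csv
import io
import json


def wrap_in_escaped_quotes(value: str) -> str:
    """Surround a value with the literal six-character sequence \\u0022."""
    # "\\u0022" is a backslash followed by u0022, not an actual quote character.
    return "\\u0022" + value + "\\u0022"


# Build the cell content for the `label` column.
cell = wrap_in_escaped_quotes("Strawberry-Bed")

# Write a tiny two-column CSV in memory (the `type` value is a placeholder).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["type", "label"])
writer.writerow(["Photograph", cell])
csv_text = buffer.getvalue()

# Illustration only: a JSON decoder turns the escape into a straight quote.
decoded = json.loads('"' + cell + '"')
print(csv_text)
print(decoded)
```

Because the cell contains no commas, line breaks, or real quotation marks, the CSV writer leaves it unquoted, which keeps the escape sequence intact in the file.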


- **Every spreadsheet/CSV file should contain the following Columns:**
- `type`
26 changes: 13 additions & 13 deletions docs/ami_update.md
@@ -80,20 +80,13 @@ As with regular/Create New AMI Sets, you will have to select your preferred Data

Beginning from [Step 7, Processing](AMIviaSpreadsheets.md#step-7-ami-set-processing) of your AMI Set Configuration, select the Update operation that best corresponds to your targeted Update scenario.

![AMI Update Processing Step](images/ami_update_processing_step.png)
![AMI Update Processing Step](images/ami_update_processing_step_updated.png)

![AMI Update Type Options](images/ami_update_type_options.png)
![AMI Update Type Options](images/ami_update_type_options_updated.png)

### 1. Normal Update Operation
### 1. Replace Update Operation

The **Normal** Update Operation 'will update a complete existing ADO's configured target field with new JSON Content.' This will replace everything in an ADO with new processed data.

- The Normal update operation is powerful and can overwrite your whole JSON object record if not paired with a template that has all the extra checks/logic needed to preserve existing data if desired (see note of 'Caution with Templates for Data Transformation' above).
- It is also recommended to only use the Normal Update approach if you need to re-process most of the metadata fields for ADOs.

### 2. Replace Update Operation

The **Replace** Update Operation 'will replace JSON keys found in an ADO's configured target field with new JSON content. Not provided ones (fields/JSON keys) will be kept.'
The **Replace** Update Operation 'will replace JSON keys found in an ADO's configured target field(s) with new JSON values. Not provided JSON keys will be kept.'

- If the processed data contains a JSON key that is already in the ADO's metadata to be updated, the values in the AMI Update Set CSV will be used, replacing completely the values found in that key in the existing ADO.
- The Replace update operation paired with the 'Direct' data transformation is likely the update operation you will use.
@@ -108,9 +101,16 @@ The **Replace** Update Operation Replace 'will replace JSON keys found in an ADO
- You select the **Replace** update operation and keep 'Do not touch existing files' checked.
- With this setup, the new field and values are added to the existing JSON for the impacted ADOs.
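
The behavior described for the Replace operation can be pictured as a shallow, key-by-key overwrite. This is only an illustrative sketch under that assumption, not AMI's actual implementation; the function name `replace_update` and the sample keys are hypothetical.

```python
def replace_update(existing: dict, incoming: dict) -> dict:
    """Return a copy of `existing` where every key present in `incoming`
    has its value completely replaced; keys not provided are kept."""
    updated = dict(existing)   # leave the original record untouched
    updated.update(incoming)   # replace provided keys, add new ones
    return updated


existing_ado = {"label": "Old title", "subject": ["Dogs", "Cats"]}
csv_row = {"label": "New title", "rights": "In Copyright"}

result = replace_update(existing_ado, csv_row)
print(result)
```

In this sketch `label` is replaced, `subject` is kept because the row did not provide it, and `rights` is added as a new key.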

### 2. Complete (All JSON keys) Update Operation

The **Complete (All JSON keys)** Update Operation (formerly labeled 'Normal' in previous Archipelago releases) 'will update a complete existing ADO's JSON data with all new JSON data.' This will replace **all the existing JSON, everything** in an ADO with new processed data.

- The Complete (All JSON keys) update operation is powerful and can overwrite your whole JSON object record if not paired with a template that has all the extra checks/logic needed to preserve existing data if desired (see note of 'Caution with Templates for Data Transformation' above).
- It is also recommended to only use the Complete (All JSON keys) approach if you need to re-process the majority of the metadata fields for ADOs.
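
Because this operation replaces the whole JSON record, it can help to check beforehand which existing keys the new data would drop. The following is a hedged sketch of such a check, assuming the operation behaves as a full replacement of the record; `complete_update` and `keys_lost_by_complete_update` are hypothetical names, not part of AMI.

```python
def complete_update(existing: dict, incoming: dict) -> dict:
    """Return the new record: everything in `existing` is discarded."""
    return dict(incoming)


def keys_lost_by_complete_update(existing: dict, incoming: dict) -> list:
    """List the existing keys that would disappear after the update."""
    return sorted(set(existing) - set(incoming))


existing_ado = {"label": "Old title", "subject": ["Dogs"], "rights": "In Copyright"}
processed = {"label": "New title", "subject": ["Dogs", "Cats"]}

lost = keys_lost_by_complete_update(existing_ado, processed)
result = complete_update(existing_ado, processed)
print(lost)
print(result)
```

Here `rights` would be lost, which is the kind of outcome a template with extra checks/logic is meant to prevent.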

### 3. Append Update Operation

The **Append** update operation 'will append values to existing JSON keys in an ADO's configured target field. New ones (fields/JSON keys) will be added too.'
The **Append** update operation 'will append values to existing JSON key(s) in an ADO's configured target field(s). New JSON keys will be added too.'

- If the processed data contains a key that is already in the ADO’s metadata to be updated, attempts will be made to match the "source" type (array, complex object) to add to it. If you have 2 values in a key, and your original/existing data contains a single value, the result will have 3 values (and then it will try to deduplicate too). If the Source data did not contain a key present in the processed data, then it will be added.
- The Append operation can be very useful, but it should be used with caution if targeting single values versus arrays. AMI will not permit malformed JSON data to be generated. **But** if your Append update transforms a previously single-value key into a multiple-value array, you need to consider how this change may impact any references made in your display or other templates and Views throughout your Archipelago.
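
The Append behavior described above (values added to existing keys, deduplication, new keys added) can be approximated as follows. This is a simplified sketch that always treats the combined values as a list; AMI's real type matching for arrays and complex objects is more involved, and the names `append_update` and `_as_list` are hypothetical.

```python
def _as_list(value):
    """Treat a single value as a one-item list; copy existing lists."""
    return list(value) if isinstance(value, list) else [value]


def append_update(existing: dict, incoming: dict) -> dict:
    """Append incoming values to existing keys, dropping duplicates.
    Keys that only exist in `incoming` are added as provided."""
    updated = dict(existing)
    for key, new_value in incoming.items():
        if key not in updated:
            updated[key] = new_value
            continue
        merged = []
        for item in _as_list(updated[key]) + _as_list(new_value):
            if item not in merged:  # simple order-preserving deduplication
                merged.append(item)
        updated[key] = merged
    return updated


existing_ado = {"subject": "Dogs"}
csv_row = {"subject": ["Cats", "Birds"], "note": "Added in update"}

result = append_update(existing_ado, csv_row)
print(result)
```

Note how `subject` changes from a single value to a three-item array in this sketch, which is the situation the caution about display templates and Views refers to.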
@@ -124,4 +124,4 @@ ___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.

Return to the [Archipelago Documentation main page](index.md).
Return to the [Archipelago Documentation main page](index.md).
61 changes: 61 additions & 0 deletions docs/experimental_tools.md
@@ -0,0 +1,61 @@
---
title: Experimental Machine Learning (ML) Tools
tags:
- Experimental ML Tools
- Machine Learning
- ML
---

# Experimental Machine Learning (ML) Tools

Archipelago 1.4.0 Local Deployment Release was shipped with a set of experimental Machine Learning (ML) Tools for testing and assessing the applicable use of such tools/workflows within a general repository environment. These tools should not be considered 'final product' integrations, and should not be exposed to end users--there are intentional configurations in place that limit exposure of these endpoints to authenticated users only.

Please review [Allison & Diego's recent 2024 Conference Presentations](presentations_events/#2024) related to this important topic in our shared field for a fuller understanding of our team's ethical perspectives and the considerations we keep related to ML tools and applications.

## What's under the hood enabling these tools?

- New Archipelago ML supporting code (lots of maths!)
- New Strawberry Runners Post-Processing pipelines for Image and Text Vectorization
- New Solr Fields for storing the vectorized image or text fragments
- New Views and contextual and exposed filters for enabling search and results from Solr
- New UI Interfaces for interacting with these tools

### 1. Experimental Image ML comparison (integrated)

As a logged-in user, you can find the 'Image KNN Similarity Search' tool:

- Through the `Tools` menu > `ML Image Similarity Search`
- Directly at `/search_ml_images`

Uses these pre-trained models:

- YOLO v8 (Structural/Layout)
- MobileNet V3 (Pattern, scale agnostic)
- InsightFace (Face feature encoding)

Also uses IIIF Content Search API Object detection of Integrated Annotations

#### How could I test this?

Conduct a search for an image in your Archipelago repository, then click on one of the images found in the results set. In the section below the top search results, you can review the output of the Image Similarity searches. You could also select a particular image Annotation to use for the Image Similarity search and comparison.

### 2. Experimental Text Similarity Search

As a logged-in user, you can find the 'Sbert KNN search on OCR-ed Pages Content' tool:

- Through the `Tools` menu > `ML Text Similarity Search`
- Directly at `/search_sbert/`

Uses SBert/384 Vector Size embedding with assigned Task on short sentences over uncorrected OCR

#### How could I test this?

Conduct a search for a particular text fragment, written in plain language style, such as "Show me resources related to dogs". In the section below the text-based search, you can review the output of the Text Similarity searches. You will only see matches against documents that have OCR values.
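
For readers unfamiliar with KNN (k-nearest neighbors) search over embeddings, the general idea can be sketched with cosine similarity. This is a conceptual illustration only: in Archipelago the vectors are stored and searched in Solr, real SBert vectors have 384 dimensions rather than the 3 used here, and the function names and sample vectors below are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def knn(query, documents, k=2):
    """Return the k (id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" standing in for OCR page vectors.
pages = {
    "page-1": [0.9, 0.1, 0.0],
    "page-2": [0.0, 1.0, 0.0],
    "page-3": [0.8, 0.2, 0.1],
}
nearest = knn([1.0, 0.0, 0.0], pages, k=2)
print(nearest)
```

The query text is embedded into the same vector space as the OCR fragments, and the closest vectors are returned as matches, which is why only documents with OCR values can appear in the results.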

## Into the future

We would like to reiterate that these are early explorations of ML tools within the Archipelago repository architecture. Expect changes, enhancements, and even removal of certain aspects of these experimental tools in future releases.

___

Return to the [Archipelago Documentation main page](index.md).
Binary file added docs/images/ami_update_type_options_updated.png
Binary file added docs/images/metadata-api-oai-pmh-form.png
Binary file added docs/images/metadata-api-page.png
Binary file added docs/images/oai_pmh_example_snapshot.png
Binary file added docs/images/oai_pmh_metadata_display.png
Binary file added docs/images/oai_pmh_view.png
6 changes: 5 additions & 1 deletion docs/inthewild.md
@@ -51,7 +51,7 @@ From all around our beautiful shared world. 🏡 🏫 🏛️
- [Amherst College](https://acdc.amherst.edu)

- [Association Montessori Internationale](https://montessori-ami.org/)
- Development of Archipelago environment began Summer 2022; Launch of new site Spring 2024
- Development of Archipelago environment began Summer 2022; Launch of new site late 2024

- [California Revealed](https://repository.californiarevealed.org/)

@@ -64,8 +64,12 @@ From all around our beautiful shared world. 🏡 🏫 🏛️
- [http://archipelago.byterfly.eu/](http://archipelago.byterfly.eu/) 🦋
- [Virtual Tour Santuario Paola](http://archipelago.byterfly.eu/do/5aea0a3f-cf03-40cc-9611-924dea1fd806)

- [Universidad Nacional Autónoma de México](https://archivodigital.iibi.unam.mx/es)

- [University of Edinburgh Libraries](https://www.ed.ac.uk/information-services/library-museum-gallery)
- _*Development of Archipelago environment began late 2022/3_

- [Vipassana Treasures](https://tod.vridhamma.org/)

## We should be here
