Merge pull request 'Release v24.07' (#5) from release_24.07 into master
janvonde committed Aug 20, 2024
2 parents 5fcdb72 + f77404c commit 2090d20
Showing 20 changed files with 331 additions and 91 deletions.
84 changes: 39 additions & 45 deletions README.md
@@ -1,45 +1,39 @@
Goobi workflow Plugin: goobi-plugin-workflow-liechtenstein-volksblatt-importer
===========================================================================

<img src="https://goobi.io/wp-content/uploads/logo_goobi_plugin.png" align="right" style="margin:0 0 20px 20px;" alt="Plugin for Goobi workflow" width="175" height="109">

This workflow plugin enables the mass import of individual newspaper editions for the Liechtensteiner Volksblatt

This is a plugin for Goobi workflow, the open source workflow tracking software for digitisation projects. More information about Goobi workflow is available under https://goobi.io. If you want to get in touch with the user community simply go to https://community.goobi.io.


Plugin details
---------------------------------------------------------------------------

More information about the functionality of this plugin and the complete documentation can be found in the central documentation area at https://docs.goobi.io

Detail | Description
--------------------------- | -------------------------------
**Plugin identifier** | intranda_workflow_liechtenstein_volksblatt_importer
**Plugin type** | Workflow Plugin
**Licence** | GPL 2.0 or newer
**Documentation (German)** | - no documentation available -
**Documentation (English)** | - no documentation available -


Goobi details
---------------------------------------------------------------------------
Goobi workflow is an open source web application to manage small and large digitisation projects mostly in cultural heritage institutions all around the world. More information about Goobi can be found here:

Detail | Description
------------------- | --------------------------
**Goobi web site** | https://www.goobi.io
**Twitter** | https://twitter.com/goobi
**Goobi community** | https://community.goobi.io


Development
---------------------------------------------------------------------------
This plugin was developed by intranda. If you have any issues, feedback, question or if you are looking for more information about Goobi workflow, Goobi viewer and all our other developments that are used in digitisation projects please get in touch with us.

Contact | Details
----------------- | ----------------------------------------------------
**Company name** | intranda GmbH
**Address** | Bertha-von-Suttner-Str. 9, 37085 Göttingen, Germany
**Web site** | https://www.intranda.com
**Twitter** | https://twitter.com/intranda
# Goobi workflow Plugin: goobi-plugin-workflow-newspaper-pages-importer

<img src="https://goobi.io/wp-content/uploads/logo_goobi_plugin.png" align="right" style="margin:0 0 20px 20px;" alt="Plugin for Goobi workflow" width="175" height="109">

This Workflow plugin for Goobi workflow allows a mass import of newspaper issues in the form of individual pages, in which the date, issue numbers and page numbers are transferred.

This is a plugin for Goobi workflow, the open source workflow tracking software for digitisation projects. More information about Goobi workflow is available under https://goobi.io. If you want to get in touch with the user community simply go to https://community.goobi.io.

## Plugin details

More information about the functionality of this plugin and the complete documentation can be found in the central documentation area at https://docs.goobi.io

Detail | Description
--------------------------- | ----------------------
**Plugin identifier** | intranda_workflow_newspaper_pages_importer
**Plugin type** | workflow
**Licence** | GPL 2.0 or newer
**Documentation (German)** | https://docs.goobi.io/workflow-plugins/v/eng/workflow/goobi-plugin-workflow-newspaper-pages-importer
**Documentation (English)** | https://docs.goobi.io/workflow-plugins/v/ger/workflow/goobi-plugin-workflow-newspaper-pages-importer

## Goobi details

Goobi workflow is an open source web application to manage small and large digitisation projects mostly in cultural heritage institutions all around the world. More information about Goobi can be found here:

Detail | Description
--------------------------- | ---------------------------
**Goobi web site** | https://www.goobi.io
**Goobi community** | https://community.goobi.io
**Goobi documentation** | https://docs.goobi.io

## Development

This plugin was developed by intranda. If you have any issues, feedback, question or if you are looking for more information about Goobi workflow, Goobi viewer and all our other developments that are used in digitisation projects please get in touch with us.

Contact | Details
--------------------------- | ----------------------------------------------------
**Company name** | intranda GmbH
**Address** | Bertha-von-Suttner-Str. 9, 37085 Göttingen, Germany
**Web site** | https://www.intranda.com
107 changes: 107 additions & 0 deletions docs/index_de.md
@@ -0,0 +1,107 @@
---
title: Import von Zeitungsausgaben als Einzelseiten
identifier: intranda_workflow_newspaper_pages_importer
description: Dieses Workflow Plugin erlaubt einen Massenimport von Zeitungsausgaben in Form von Einzelseiten, bei dem Datum, Ausgabennummern und Seitenzählungen übernommen werden.
published: true
---

## Einführung
Dieses Workflow-Plugin erlaubt einen Massenimport von Zeitungsausgaben, die als Einzelseiten vorliegen. Für jede in einem Ordner vorliegende Datei werden dabei anhand des Dateinamens das Ausgabendatum sowie die Ausgabennummer ermittelt. Anschließend werden Goobi-Vorgänge auf Jahresebene erzeugt und die Ausgaben samt Metadaten und Seitenzugehörigkeiten angelegt.

## Installation
Zur Installation des Plugins müssen folgende beiden Dateien installiert werden:

```bash
/opt/digiverso/goobi/plugins/workflow/plugin-workflow-newspaper-pages-importer-base.jar
/opt/digiverso/goobi/plugins/GUI/plugin-workflow-newspaper-pages-importer-gui.jar
```

Um zu konfigurieren, wie sich das Plugin verhalten soll, können verschiedene Werte in der Konfigurationsdatei angepasst werden. Die Konfigurationsdatei befindet sich üblicherweise hier:

```bash
/opt/digiverso/goobi/config/plugin_intranda_workflow_newspaper_pages_importer.xml
```

Für eine Nutzung dieses Plugins muss der Nutzer über die korrekte Rollenberechtigung verfügen.

![Ohne korrekte Berechtigung ist das Plugin nicht nutzbar](screen1_de.png)

Bitte weisen Sie daher der Gruppe die Rolle `Plugin_workflow_newspaper_pages_importer` zu.

![Korrekt zugewiesene Rolle für die Nutzer](screen2_de.png)


## Überblick und Funktionsweise
Wenn das Plugin korrekt installiert und konfiguriert wurde, ist es innerhalb des Menüpunkts `Workflow` zu finden.

![Geöffnetes Plugin für den Import der Zeitungsausgaben](screen3_de.png)

Nach dem Betreten des Plugins kann der eigentliche Importvorgang gestartet werden. Hierbei wird innerhalb des konfigurierten Quellverzeichnisses nach vorhandenen Dateien gesucht und deren Namen überprüft. Das Benennungsschema innerhalb des Dateinamens muss dafür folgendermaßen aussehen:

```bash
yyyy-MM-dd_AAA.bbb
```

Die Zeichen stehen dabei für das folgende:

Zeichen | Erläuterung
---------|----------------------------------------
`yyyy` | Angabe des vierstelligen Jahres
`MM` | Angabe des zweistelligen Monats, ggf. mit führender Null
`dd` | Angabe des zweistelligen Tages, ggf. mit führender Null
`AAA` | Numerische Ausgabennummer in drei Stellen, ggf. mit führenden Nullen
`bbb` | Dateiendung, wie z.B. `pdf`, `jpeg` oder `tif`

Beispielhaft ein Verzeichnislisting für einen solchen Ordnerinhalt:

```bash
tree /opt/digiverso/import
/opt/digiverso/import
├── 1867-04-06_001.pdf
├── 1867-04-06_002.pdf
├── 1867-04-06_003.pdf
├── 1867-04-06_004.pdf
├── 1867-04-20_001.pdf
├── 1867-04-20_002.pdf
├── 1867-04-20_003.pdf
├── 1867-04-20_004.pdf
├── 1867-05-04_001.pdf
├── 1867-05-04_002.pdf
├── 1867-05-04_003.pdf
├── 1867-05-04_004.pdf
├── 1867-05-11_001.pdf
├── 1867-05-11_002.pdf
├── 1867-05-11_003.pdf
├── 1867-05-11_004.pdf
├── 1867-05-18_001.pdf
├── 1867-05-18_002.pdf
├── 1867-05-18_003.pdf
├── 1867-05-18_004.pdf
├── 1867-05-25_001.pdf
├── 1867-05-25_002.pdf
├── 1867-05-25_003.pdf
├── 1867-05-25_004.pdf
```

![Nutzeroberfläche nach Durchführung des Imports](screen4_de.png)

Während des Imports wird in Goobi für jedes Jahr ein Vorgang erzeugt. Darin wird für jede Zeitungsausgabe jeweils ein Strukturelement angelegt, dessen Daten aus den Dateinamen sowie aus den Werten der Konfiguration stammen.

![Erzeugte Zeitungsausgaben mit den zugehörigen Metadaten](screen5_de.png)


## Konfiguration
Die Konfiguration des Plugins erfolgt in der Datei `plugin_intranda_workflow_newspaper_pages_importer.xml` wie hier aufgezeigt:

{{CONFIG_CONTENT}}

Parameter | Erläuterung
-------------------------|----------------------------------------
`importFolder` | Mit diesem Parameter wird das Verzeichnis festgelegt, aus dem die Daten importiert werden sollen.
`workflow` | Dieser Parameter definiert den Namen der Produktionsvorlage von Goobi, auf dessen Basis die Vorgänge erzeugt werden sollen.
`processtitle` | Legen Sie hier fest, wie der Titel der anzulegenden Vorgänge lauten soll. Ihm wird beim Erzeugen der Vorgänge die Jahreszahl angefügt (z.B. `New_York_Times_123456789`).
`pageNumberPrefix` | Soll den Seiten in der arabischen Paginierung ein Präfix vorangestellt werden, kann dieses hier definiert werden (z.B. `Seite`).
`languageForDateFormat` | Legen Sie hier die Sprache fest, die für die Generierung der Ausgabentitel verwendet werden soll (z.B. `de` oder `en`).
`issueTitlePrefix` | Soll dem ausführlichen Datum im Titel der Zeitungsausgaben ein Präfix vorangestellt werden, kann dieses hier angegeben werden (z.B. `Ausgabe vom`).
`deleteFromSource` | Im Fall, dass die zu importierenden Dateien nach dem Import aus dem Importverzeichnis gelöscht werden sollen, kann dies hier festgelegt werden.
`metadata` | Mit diesen Elementen kann festgelegt werden, welche Metadaten auf Zeitungs- und auf Bandebene für die anzulegenden Vorgänge eingesetzt werden sollen. Aus jedem hier angegebenen Element wird dabei ein eigenständiges Metadatum erstellt. Es akzeptiert sechs Attribute, wobei `value` und `type` obligatorisch sind, während `var`, `anchor`, `volume` und `person` optional sind. Weitere Einzelheiten finden sich in den Kommentaren innerhalb der Beispielkonfiguration.
107 changes: 107 additions & 0 deletions docs/index_en.md
@@ -0,0 +1,107 @@
---
title: Import of newspaper issues as single pages
identifier: intranda_workflow_newspaper_pages_importer
description: This workflow plugin allows a mass import of newspaper issues in the form of individual pages, in which the date, issue numbers and page numbers are transferred.
published: true
---

## Introduction
This workflow plugin allows the mass import of newspaper issues that are available as individual pages. For each file in a folder, the issue date and issue number are determined based on the file name. Goobi processes are then created at year level and the issues are generated together with metadata and page associations.

## Installation
To install the plugin, the following two files must be installed:

```bash
/opt/digiverso/goobi/plugins/workflow/plugin-workflow-newspaper-pages-importer-base.jar
/opt/digiverso/goobi/plugins/GUI/plugin-workflow-newspaper-pages-importer-gui.jar
```

To configure how the plugin should behave, various values can be adjusted in the configuration file. The configuration file is usually located here:

```bash
/opt/digiverso/goobi/config/plugin_intranda_workflow_newspaper_pages_importer.xml
```

To use this plugin, the user must have the correct role authorisation.

![The plugin cannot be used without correct authorisation](screen1_en.png)

Therefore, please assign the role `Plugin_workflow_newspaper_pages_importer` to the group.

![Correctly assigned role for users](screen2_en.png)


## Overview and functionality
If the plugin has been installed and configured correctly, it can be found under the `Workflow` menu item.

![Open plugin for importing newspaper editions](screen3_en.png)

After entering the plugin, the actual import process can be started. This involves searching for existing files within the configured source directory and checking their names. The naming scheme within the file name must look like this:

```bash
yyyy-MM-dd_AAA.bbb
```

The characters stand for the following:

Character | Explanation
---------|----------------------------------------
`yyyy` | Specification of the four-digit year
`MM` | Specification of the two-digit month, with leading zero if necessary
`dd` | Specification of the two-digit day, with leading zero if necessary
`AAA` | Numerical issue number in three digits, with leading zeros if necessary
`bbb` | File extension, e.g. `pdf`, `jpeg` or `tif`
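
The naming scheme above can be checked mechanically. The following is a minimal Python sketch, not the plugin's actual implementation: the names `FILENAME_PATTERN`, `ParsedName` and `parse_filename` are hypothetical, and the sketch assumes that names deviating from the scheme should simply be rejected.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Regular expression for the documented scheme yyyy-MM-dd_AAA.bbb
# (hypothetical helper; the plugin itself may validate names differently).
FILENAME_PATTERN = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"_(?P<number>\d{3})\.(?P<ext>[A-Za-z0-9]+)"
)


class ParsedName(NamedTuple):
    issue_date: date   # from yyyy-MM-dd
    number: int        # from AAA, leading zeros dropped
    extension: str     # from bbb, lower-cased


def parse_filename(name: str) -> Optional[ParsedName]:
    """Return the parsed parts, or None if the name does not follow the scheme."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    try:
        issue_date = date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        # The digits matched, but they do not form a real calendar date.
        return None
    return ParsedName(issue_date, int(match["number"]), match["ext"].lower())
```

For example, `parse_filename("1867-04-06_001.pdf")` yields the date 6 April 1867, the number `1` and the extension `pdf`, while a name such as `1867-13-06_001.pdf` is rejected because month 13 does not exist.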

An example directory listing for such a folder:

```bash
tree /opt/digiverso/import
/opt/digiverso/import
├── 1867-04-06_001.pdf
├── 1867-04-06_002.pdf
├── 1867-04-06_003.pdf
├── 1867-04-06_004.pdf
├── 1867-04-20_001.pdf
├── 1867-04-20_002.pdf
├── 1867-04-20_003.pdf
├── 1867-04-20_004.pdf
├── 1867-05-04_001.pdf
├── 1867-05-04_002.pdf
├── 1867-05-04_003.pdf
├── 1867-05-04_004.pdf
├── 1867-05-11_001.pdf
├── 1867-05-11_002.pdf
├── 1867-05-11_003.pdf
├── 1867-05-11_004.pdf
├── 1867-05-18_001.pdf
├── 1867-05-18_002.pdf
├── 1867-05-18_003.pdf
├── 1867-05-18_004.pdf
├── 1867-05-25_001.pdf
├── 1867-05-25_002.pdf
├── 1867-05-25_003.pdf
├── 1867-05-25_004.pdf
```

![User interface after performing the import](screen4_en.png)

During the import, one Goobi process is created for each year. Within it, a structural element is created for each newspaper issue, with its data generated from the file names and the configuration values.
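
The grouping described here can be sketched roughly as follows. This Python example is only an illustration under stated assumptions: the function `group_by_year_and_issue` is hypothetical, it assumes that all files sharing one date belong to the same issue (as the sample listing suggests), and it expects names that already follow the `yyyy-MM-dd_AAA.bbb` scheme.

```python
from collections import defaultdict


def group_by_year_and_issue(filenames):
    """Map year -> issue date (yyyy-MM-dd) -> sorted list of file names.

    Assumption: every distinct date is one issue, and one process is
    created per year. Names are expected to follow yyyy-MM-dd_AAA.bbb.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name in sorted(filenames):
        issue_date = name.split("_", 1)[0]  # e.g. "1867-04-06"
        year = issue_date[:4]               # e.g. "1867"
        grouped[year][issue_date].append(name)
    # Convert to plain dicts so the result is easy to inspect.
    return {year: dict(issues) for year, issues in grouped.items()}


files = [
    "1867-04-06_002.pdf",
    "1867-04-06_001.pdf",
    "1867-04-20_001.pdf",
    "1868-01-04_001.pdf",
]
for year, issues in group_by_year_and_issue(files).items():
    print(year, {day: len(pages) for day, pages in issues.items()})
```

Each outer key would correspond to one process and each inner key to one structural element for an issue. How the plugin actually assigns pages to issues is not shown in this documentation.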

![Generated newspaper editions with the associated metadata](screen5_en.png)


## Configuration
The plugin is configured in the file `plugin_intranda_workflow_newspaper_pages_importer.xml` as shown here:

{{CONFIG_CONTENT}}

Parameter | Explanation
-------------------------|----------------------------------------
`importFolder` | This parameter is used to specify the directory from which the data is to be imported.
`workflow` | This parameter defines the name of the Goobi process template on the basis of which the processes are to be generated.
`processtitle` | Specify here what the title of the processes to be created should be. The year is added when the processes are created (e.g. `New_York_Times_123456789`).
`pageNumberPrefix` | If the pages in the Arabic pagination are to be preceded by a prefix, it can be defined here (e.g. `Page`).
`languageForDateFormat` | Specify the language to be used for generating the issue titles (e.g. `en` or `de`).
`issueTitlePrefix` | If a prefix is to be placed before the long-form date in the title of each newspaper issue, it can be entered here (e.g. `Issue from`).
`deleteFromSource` | If the files to be imported are to be deleted from the import directory after the import, this can be specified here.
`metadata` | These elements specify which metadata is set at newspaper and volume level for the processes to be created. Each element listed here produces a separate metadata entry. An element accepts six attributes: `value` and `type` are mandatory, while `var`, `anchor`, `volume` and `person` are optional. Further details can be found in the comments within the sample configuration.
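
To illustrate how the simple parameters interact, here is a small Python sketch that reads them from an XML string and derives a process title and a page label. It is not the plugin's code: the root element name `config`, the `importFolder` value and the helper names are assumptions made for this example, and the `metadata` elements are left out. The rule that the title is extended by `_` and the year follows the comment in the sample configuration.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration; the root element name and the importFolder
# value are assumptions, the other values mirror the sample configuration.
SAMPLE = """<config>
    <importFolder>/opt/digiverso/import</importFolder>
    <workflow>Newspaper_workflow</workflow>
    <processtitle>mytitle_1234567</processtitle>
    <pageNumberPrefix>Seite </pageNumberPrefix>
    <languageForDateFormat>de</languageForDateFormat>
    <issueTitlePrefix>Ausgabe vom</issueTitlePrefix>
    <deleteFromSource>true</deleteFromSource>
</config>"""


def load_settings(xml_text):
    """Read the simple parameters into a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    delete_flag = root.findtext("deleteFromSource", default="false")
    return {
        "import_folder": root.findtext("importFolder"),
        "workflow": root.findtext("workflow"),
        "process_title": root.findtext("processtitle", default=""),
        "page_number_prefix": root.findtext("pageNumberPrefix", default=""),
        "language": root.findtext("languageForDateFormat", default="en"),
        "issue_title_prefix": root.findtext("issueTitlePrefix", default=""),
        # Optional parameter, treated as false when absent.
        "delete_from_source": delete_flag.strip().lower() == "true",
    }


def process_title(settings, year):
    """Configured title prefix, extended by '_' and the year."""
    return f"{settings['process_title']}_{year}"


def page_label(settings, page_number):
    """Page label: configured prefix followed by the Arabic page number."""
    return f"{settings['page_number_prefix']}{page_number}"


settings = load_settings(SAMPLE)
print(process_title(settings, 1867))
print(page_label(settings, 3))
```

Note that the trailing space in `Seite ` is part of the configured value in this sketch, so the prefix and the page number are separated without any extra logic.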
Binary file added docs/screen1_de.png
Binary file added docs/screen1_en.png
Binary file added docs/screen2_de.png
Binary file added docs/screen2_en.png
Binary file added docs/screen3_de.png
Binary file added docs/screen3_en.png
Binary file added docs/screen4_de.png
Binary file added docs/screen4_en.png
Binary file added docs/screen5_de.png
Binary file added docs/screen5_en.png
@@ -5,6 +5,18 @@
<!-- which workflow to use -->
<workflow>Newspaper_workflow</workflow>

<!-- prefix for the process title; will be extended by '_' and the year information -->
<processtitle>mytitle_1234567</processtitle>

<!-- prefix for the page labels -->
<pageNumberPrefix>Seite </pageNumberPrefix>

<!-- language to use for long date formatter in issue title -->
<languageForDateFormat>de</languageForDateFormat>

<!-- prefix to use for the issue titles -->
<issueTitlePrefix>Ausgabe vom</issueTitlePrefix>

<!-- Whether or not to delete the images from the import folder once they are imported. OPTIONAL. DEFAULT false. -->
<deleteFromSource>true</deleteFromSource>

8 changes: 4 additions & 4 deletions module-base/pom.xml
@@ -2,9 +2,9 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>io.goobi.workflow.plugin</groupId>
<artifactId>plugin-workflow-liechtenstein-volksblatt-importer</artifactId>
<version>24.06</version>
<artifactId>plugin-workflow-newspaper-pages-importer</artifactId>
<version>24.07</version>
</parent>
<artifactId>plugin-workflow-liechtenstein-volksblatt-importer-base</artifactId>
<artifactId>plugin-workflow-newspaper-pages-importer-base</artifactId>
<packaging>jar</packaging>
</project>
</project>