Support for archive formats other than ZIP, and for archives within archives #2

Open
aborel opened this issue Aug 3, 2023 · 4 comments

@aborel
Contributor

aborel commented Aug 3, 2023

According to the information received:

  • Each Libsafe Archive Extractor instance can act on only one file type among: .zip .7z .rar .tar .tar.gz .gz .tgz
  • Sub-archives can be handled through successive instances, each likewise operating on a single file type.

A full general solution or an agreement on more specific requirements would be useful.
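For comparison with the one-format-per-instance model, a general solution could detect each file's format from its content and recurse into whatever it extracts. The following is a minimal Python sketch using only the standard library (zipfile, tarfile, gzip); it is not how Libsafe works. 7z and RAR are left out because they need third-party packages, and `MAX_DEPTH` and the `.extracted` directory naming are hypothetical choices.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path
from typing import List, Optional

# Hypothetical nesting limit (the "M" of this discussion); not a Libsafe setting.
MAX_DEPTH = 3


def archive_kind(path: Path) -> Optional[str]:
    """Guess the archive type from the file contents rather than the extension."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):  # also true for compressed tars (.tar.gz, .tgz)
        return "tar"
    with path.open("rb") as fh:
        if fh.read(2) == b"\x1f\x8b":  # gzip magic number: a single compressed file
            return "gz"
    return None


def extract_one(path: Path, kind: str, dest: Path) -> None:
    """Extract a single archive level into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    if kind == "zip":
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
    elif kind == "tar":
        with tarfile.open(path, "r:*") as tf:
            if hasattr(tarfile, "data_filter"):  # Python versions with extraction filters
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    elif kind == "gz":
        # Drop the final suffix for the output name: note.txt.gz -> note.txt
        with gzip.open(path, "rb") as src, (dest / path.stem).open("wb") as out:
            shutil.copyfileobj(src, out)


def extract_recursively(archive: Path, dest: Path, depth: int = 0) -> List[Path]:
    """Extract archive under dest, then any archives found inside, up to MAX_DEPTH levels.

    Returns the directories created, each parent before its children.
    """
    kind = archive_kind(archive)
    if kind is None or depth >= MAX_DEPTH:
        return []
    out_dir = dest / (archive.name + ".extracted")  # hypothetical naming scheme
    extract_one(archive, kind, out_dir)
    created = [out_dir]
    # Snapshot the file list first so directories created by the recursion are not rescanned.
    for child in sorted(p for p in out_dir.rglob("*") if p.is_file()):
        created += extract_recursively(child, child.parent, depth + 1)
    return created
```

For example, `extract_recursively(Path("outer.zip"), Path("work"))` on a ZIP holding a .tar.gz that in turn holds a .gz creates three nested `.extracted` directories, in a single pass rather than one instance per format and level.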

@aborel aborel self-assigned this Aug 3, 2023
@aborel
Contributor Author

aborel commented Aug 3, 2023

General solution supporting any N formats with up to M nesting levels: M * N! instances would have to be run.
With .tar.gz handled as a sequence of .gz + .tar, N = 6 (.zip, .7z, .rar, .tar, .gz, .tgz) => 720 * M successive instances; 120 * M if we ignore RAR; 24 * M if .tgz can also be handled transparently by the .gz instance?
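The figures above follow from M * N! with N = 6, 5 and 4. Reading the three cases as cumulative reductions (first dropping RAR, then .tgz) is an assumption, but it is what 6!, 5! and 4! imply. A quick Python check of the arithmetic, with the format lists as illustrative labels only:

```python
from math import factorial


def instances_needed(n_formats: int, nesting_levels: int) -> int:
    """Number of extractor instances under the M * N! estimate."""
    return nesting_levels * factorial(n_formats)


# .tar.gz is counted as .gz followed by .tar, so it is not a separate format here.
ALL_FORMATS = [".zip", ".7z", ".rar", ".tar", ".gz", ".tgz"]
WITHOUT_RAR = [f for f in ALL_FORMATS if f != ".rar"]
# Assumes .tgz can be handled transparently by the .gz instance.
WITHOUT_RAR_TGZ = [f for f in WITHOUT_RAR if f != ".tgz"]

for label, formats in [
    ("all formats", ALL_FORMATS),
    ("ignoring RAR", WITHOUT_RAR),
    ("ignoring RAR, .tgz via .gz", WITHOUT_RAR_TGZ),
]:
    n = len(formats)
    print(f"{label}: N = {n}, {instances_needed(n, 1)} * M instances")

# all formats: N = 6, 720 * M instances
# ignoring RAR: N = 5, 120 * M instances
# ignoring RAR, .tgz via .gz: N = 4, 24 * M instances
```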

@aborel
Contributor Author

aborel commented Aug 4, 2023

RAR: https://rarfile.readthedocs.io/ (requires an external unrar executable)
tar, tar.gz, tgz: https://docs.python.org/3/library/tarfile.html
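A minimal sketch of the tarfile route: opening with mode "r:*" lets tarfile detect the compression itself, so one code path covers .tar, .tar.gz and .tgz. The function name is hypothetical; RAR is not covered since rarfile needs the external unrar tool, and a bare .gz that is not a tar would still need the gzip module. The `filter="data"` argument is only passed on Python versions that provide it.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tar_family(archive: Path, dest: Path) -> List[str]:
    """Extract a .tar, .tar.gz or .tgz archive into dest and return its member names.

    Mode "r:*" makes tarfile detect the compression (none, gzip, bz2, xz) itself,
    so the file extension does not need to be inspected.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tf:
        names = tf.getnames()
        if hasattr(tarfile, "data_filter"):
            # The "data" extraction filter refuses members that would end up outside dest.
            tf.extractall(dest, filter="data")
        else:
            tf.extractall(dest)
    return names
```

This supports the idea that .tgz (and .tar.gz) need not be treated as separate formats when the extractor is built on tarfile.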

@aborel
Contributor Author

aborel commented Aug 4, 2023

The app must stay aligned with the preservation plans (PPs). Is it possible to define a PP's preprocessors programmatically? This does not appear to be exposed in the API (v2 or v3), for example:

curl -X GET "https://acoua.epfl.ch/api/v2/preservationplans/19" -H  "accept: application/json" -H  "Authorization: Bearer XXXXXXXXXXXXXXXXX"

{
  "id": 19,
  "name": "PP_Zenodo_Manual_Import_005",
  "creation": "2023-07-05 09:10:12",
  "associatedArea": 4,
  "metadataSchema": 16,
  "metadataFilter": 11,
  "metadataParams": {
    "filterParam1": "C:\\libsafe\\conf\\Filters\\filter1001.datacite.configfile_as_param1.xslt",
    "filterParam2": "tentativeParent",
    "filterParam3": null,
    "filterParam4": null,
    "filterParam5": null
  },
  "metadataPattern": "Metadata.xml",
  "storageGroups": [
    {
      "id": 1,
      "writePriority": 0,
      "readPriority": 0
    },
    {
      "id": 2,
      "writePriority": 1,
      "readPriority": 1
    }
  ],
  "dipProfiles": [],
  "algorithms": [
    {
      "shortName": "md5",
      "isDefault": 1
    }
  ]
}
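One way to check a plan for preprocessor settings is to fetch it the same way as the curl call above and search the returned keys. This is a hypothetical Python sketch: `fetch_plan`, `iter_keys` and `preprocessor_fields` are illustrative names, and matching on the substring "preprocess" is an assumption about how such a field would be named. On the response above it finds nothing, consistent with the preprocessors not being exposed.

```python
import json
import urllib.request
from typing import Any, Iterator, List


def fetch_plan(base_url: str, plan_id: int, token: str) -> dict:
    """GET /api/v2/preservationplans/{id}, mirroring the curl call above."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/preservationplans/{plan_id}",
        headers={"accept": "application/json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_keys(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield every key path in a parsed JSON document, e.g. 'storageGroups[0].id'."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from iter_keys(value, path)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from iter_keys(item, f"{prefix}[{i}]")


def preprocessor_fields(plan: dict) -> List[str]:
    """Key paths mentioning 'preprocess' (case-insensitive); the naming is an assumption."""
    return [k for k in iter_keys(plan) if "preprocess" in k.lower()]
```

Usage, with a real token in place of the placeholder: `preprocessor_fields(fetch_plan("https://acoua.epfl.ch", 19, token))`.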

@aborel
Contributor Author

aborel commented Aug 4, 2023

7z can indeed be extracted by Libsafe: https://test-acoua.epfl.ch/catalog/object_details/577.0
Does the system allow several Archive Extractor instances? It does not seem to: Archive Extractor is greyed out in the second preprocessor selector.
