-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dev/issue 8895 search api #89
Conversation
storage_service/metsrw/__init__.py
Outdated
@@ -0,0 +1,13 @@ | |||
from __future__ import absolute_import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not specify this as a pip dependency instead of vendoring it? Pip can install from git, and we can start working on getting it into PyPI. See https://github.com/artefactual/automation-tools/blob/dev/dataverse/requirements/base.txt#L4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks... that works slick.
One thing I'm wondering about - I'm not sure what the requirements were here. Is there any requirement to support clauses other than |
Ping - any thoughts re: my last comment? |
9dc7738
to
548ab10
Compare
@mistydemeo Sorry, didn't see this... The requirements are pretty rudimentary... here are the searches that were anticipated: https://wiki.archivematica.org/Research_data_management#METS_parsing |
6676e60
to
3c5eda6
Compare
a3ccc17
to
4eb3a27
Compare
4eb3a27
to
d59e806
Compare
d59e806
to
6a07894
Compare
|
||
* uuid (file UUID) | ||
* package (package UUID) | ||
* name (enter or partial filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo? Should "enter" be "entire"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think there's a typo in the line below: "PRONUM"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed these typos in #262
* min_size (minimum filesize) | ||
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* valid (boolean: whether or not file data is valid or malformed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wording sounds contradictory: "whether or not file data is valid or malformed"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think valid
should be validated
here. No? If so, I think the validated attribute of File only records whether the file has been validated, not whether it is valid. (Unless the help text is wrong in models/event.py::File
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should source_package
be listed as a search filter here? FileFilter
seems to expose it. It's a bit confusing since it only references Transfers I believe. See the pre-existing Package.index_file_data_from_transfer_mets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed validated
to valid
a nullable boolean field where None
indicates not validated, True
valid, and False
invalid. Migrations and help text updated accordingly in #262.
docs/search.rst
Outdated
|
||
http://<storage service URL>/api/v2/search/file/?min_size=29965171 | ||
|
||
Here is an example JSON response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text above the URL doesn't appear to match what that URL actually searches for.
9ae818f
to
4f981e4
Compare
Call to construct non-existent class `METSWriter` should use `METSDocument`.
ed899d4
to
970fd10
Compare
8708a3a
to
e4bd212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments have led to changes in the PR that supersedes this one: see #262
|
||
class Meta: | ||
model = models.File | ||
fields = ['uuid', 'name', 'file_type', 'min_size', 'max_size', 'format_name', 'pronom_id', 'pipeline', 'source_package', 'normalized', 'validated', 'ingestion_time'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should source_package
in this list be package
?
* min_size (minimum filesize) | ||
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* valid (boolean: whether or not file data is valid or malformed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think valid
should be validated
here. No? If so, I think the validated attribute of File only records whether the file has been validated, not whether it is valid. (Unless the help text is wrong in models/event.py::File
filter_backends = (filters.DjangoFilterBackend,) | ||
filter_class = FileFilter | ||
|
||
@list_route(methods=['get']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this used? TODO: find out.
class Migration(migrations.Migration): | ||
|
||
dependencies = [ | ||
('locations', '0015_gpg_encrypted_space'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs another rebase: 0016_mirror_location_aip_replication.py migration has since been introduced.
metadata['filename'] = premis_object.original_name | ||
|
||
if len(premis_object.object_identifiers[0]): | ||
if premis_object.object_identifiers[0]['type'] == 'UUID': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that files in a METS file can have other types of ids (e.g., handle, PURL), we need a more robust way of getting the UUID. I think this needs to loop through all identifiers and find the first UUID one, if it exists.
|
||
# Indicate whether or not a file has been validated in metadata and if it passed | ||
if premis_event.event_type == 'validation': | ||
metadata['validated'] = premis_event.outcomes[0]['outcome'] == "pass" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems that the validated
attribute of a file does actually mean "is valid". I think maybe the name should be change to is_valid
. But if that's the case, then I think we want 3 possible values: True
, False
, and None
...
For example, if you wanted to get details about the transfer source location | ||
contained in the space 6d0b6cce-4372-4ef8-bf48-ce642761fd41 you could HTTP get:: | ||
|
||
http://<storage service URL>/api/v2/search/location/?space=7ec3d5d9-23ec-4fd5-b9fb-df82da8de630&purpose=TS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that if you pass a nonsense param to one of these search endpoints, you get all of the resources. Is that expected/desired? For example http://192.168.168.192:8000/api/v2/search/location/?moonbeam=figaro
returns all locations. As does http://192.168.168.192:8000/api/v2/search/location/
* min_size (minimum filesize) | ||
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* valid (boolean: whether or not file data is valid or malformed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should source_package
be listed as a search filter here? FileFilter
seems to expose it. It's a bit confusing since it only references Transfers I believe. See the pre-existing Package.index_file_data_from_transfer_mets
|
||
* uuid (file UUID) | ||
* package (package UUID) | ||
* name (enter or partial filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed these typos in #262
* min_size (minimum filesize) | ||
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* valid (boolean: whether or not file data is valid or malformed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed validated
to valid
a nullable boolean field where None
indicates not validated, True
valid, and False
invalid. Migrations and help text updated accordingly in #262.
I am closing this in favour of #262 |
Added new fields to File model and add logic to index files in stored AIPs using metsrw to parse file metadata from METS files. It's related to issue #8895.