-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API endpoints for search #262
Conversation
e40c6b2
to
0cbae04
Compare
source_id = models.TextField(max_length=128) | ||
source_package = models.TextField(blank=True, | ||
help_text=_l("Unique identifier of originating unit")) | ||
size = models.IntegerField(default=0, help_text='Size in bytes of the file') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i18n!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrwdunham, I understand why we wanted to start using django-rest-framework but do you know if the work in this PR had a particular requirement on the new framework? Some context here would be good. If we're going to adopt a new framework and they are going to coexist for a while we should probably explain the motivations.
Is django-rest-framework different enough that should be conceived in a new /api/v3
namespace maybe? tastypie exposes details about the resources to allow discovery (e.g. http://django-tastypie.readthedocs.io/en/latest/interacting.html#api-wide or http://django-tastypie.readthedocs.io/en/latest/interacting.html#inspecting-the-resource-s-schema) - is this going to work well when sharing the same namespaces with another framework? Also other things like pagination, etc... is it close enough? If they're not I think we're going to make very difficult to implement clients and that's not good.
@sevein, I don't know the exact motivations for using DRF for this work. At a high level, I suspect it is because DRF won out over TastyPie historically. One of the goals of this work (in this later stage) was to "Evaluate if this allows us to remove tastypie in favour of Django REST framework (yes/no/maybe?/both)". See https://wiki.archivematica.org/Improvements/Reporting. After that question it currently states "done, answer is both". Ultimately this work needs to allow the client to be able to answer the following (types of) questions using the SS API:
I think this PR does that. Given the small scope of the project behind this PR, I think we will have to deliver to the client a dev branch that they can test. That means leaving this PR unmerged until we can find the time to replace TastyPie with DRF or at least write tests to ensure that these changes provide a consistent API and do not break the current TastyPie API. Thoughts? @jhs @mcantelon ? |
* enabled (whether the location is enabled) | ||
|
||
For example, if you wanted to get details about the transfer source location | ||
contained in the space 6d0b6cce-4372-4ef8-bf48-ce642761fd41 you could HTTP get:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that if you pass a nonsense param to one of these search endpoints, you get all of the resources. Is that expected/desired? For example http://192.168.168.192:8000/api/v2/search/location/?moonbeam=figaro returns all locations. As does http://192.168.168.192:8000/api/v2/search/location/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have copied comments from the original #89 PR to the relevant places in this one. I have cited the reviewers if they are not me.
* max_size (maximum filesize) | ||
* normalized (boolean: whether or not file was normalized) | ||
* valid (nullable boolean: whether or not file was validated and, if so, its | ||
validity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the original validated
field to valid
, a nullable boolean field where None
indicates not validated, True
valid, and False
invalid. To see how this used to work, see https://github.com/artefactual/archivematica-storage-service/pull/89/files for.
filter_backends = (filters.DjangoFilterBackend,) | ||
filter_class = FileFilter | ||
|
||
@list_route(methods=['get']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this used? TODO: find out.
""" | ||
aip_dir_name = os.path.basename(os.path.splitext(self.full_path)[0]) | ||
relative_path = os.path.join(aip_dir_name, "data", "METS." + self.uuid + ".xml") | ||
path_to_mets, temp_dir = self.extract_file(relative_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from @Hwesta: Has this been tested with compressed & uncompressed AIPs? Does it work with AIPs not stored in the local filesystem? Since this is happening asynchronously, what happens when the AIP storage operation completes during this thread? eg self.full_path may not be accurate, or may change partway through
For the original requirements that motivated this search API work, see https://wiki.archivematica.org/Research_data_management#METS_parsing. |
@jrwdunham can you create a sample CURL command and JSON response that we can use for testing? Suggested questions:
|
@sromkey Here are some example searches. Something like this should probably be added to search.rst. (These assume you have curl and jq installed).
|
23d1e07
to
a1bccbf
Compare
a1bccbf
to
903d568
Compare
I rebased this against qa/0.x and added a commit that makes AIP file indexing work with uncompressed AIPs. |
It looks like the UUID for file queries returns one minted newly by the Storage Service when it creates the file in the database. It would be more meaningful to return the source_id, e.g. the UUID created by Archivematica and recorded in the METS file. |
- Locations, packages and files can all now be searched via GET requests to: - http://<storage service URL>/api/v2/search/location/ - http://<storage service URL>/api/v2/search/package/ - http://<storage service URL>/api/v2/search/file/ - See documentation at docs/search.rst. - Includes migration to File model. - Uses Django REST Framework (new requirement).
Prior to this the `index_file_data_from_aip_mets` method of `Package` was assuming that the package was compressed. Now it correctly parses the METS file of uncompressed AIPs as well.
903d568
to
50d6358
Compare
26 May 2020: Review of current status against stable/1.11.x
metsrw
PREMIS capabilities swapped in..py
files updated in this PR, the conflicting files listed below by Github are:9 November 2017: Original PR
For the requirements that motivated this work, see:
Note: much of this work is due to PR #89 of @mcantelon. Since that PR is quite old and required extensive rebasing, its commits were squashed and made into this PR, and its CR comments, where applicable, were transferred to this PR.
Note 2: when metsrw v. 0.2.1 is released, requirements/base.txt will need to be updated to reference it. See artefactual-labs/mets-reader-writer#34.
Note 3: the asynchronous processing of the original PR 89 has been removed. See
archivematica-storage-service/storage_service/locations/api/resources.py
Lines 575 to 576 in e4bd212
Fixes #261
Connected to #261