Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Add metadataTypes endpoint - readonly; available for everyone #1428

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Ingvord
Copy link
Contributor

@Ingvord Ingvord commented Sep 19, 2024

Add required endpoint for #613

@sbliven
Copy link
Contributor

sbliven commented Sep 30, 2024

What is a metadata type? Can you give an example? I feel like this change should be discussed.

@Ingvord
Copy link
Contributor Author

Ingvord commented Sep 30, 2024

@sbliven Thanks for asking!

Here is an example:

Screenshot_20240930_235418

This is required for SciCatProject/frontend#1594

BTW you can try it yourself. As easy as switching a git branch and rebuilding the project. Once rebuild, open your browser and navigate to localhost:3000/explorer you will see all the endpoints provided by this service, including the new /datasets/metadataTypes.

I recorded a short tutorial:

Peek.2024-10-01.00-08.mp4

Of course that implies you have a dev or testing environment i.e. db etc

If you need any guides on how to do this in your env feel free to reach me out - I will be more than happy to help.

@Ingvord Ingvord force-pushed the add-metadata-types-endpoint branch from 734ac73 to cd6c358 Compare October 2, 2024 11:00
Copy link
Contributor

@sbliven sbliven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, every call to this endpoint iterates over every dataset, groups all top-level scientificMetadata fields by key, and then returns the javascript type of the values (or 'mixed' if multiple types are present).

I have multiple objections to this.

  1. It doesn't work for hierarchical metadata
  2. The performance is going to be terrible on big instances. Facilities that don't enforce a single scientificMetadata schema will likely return tens of thousands of keys, most of which will be the useless 'mixed'.
  3. I think this could better be achieved using schemas for facilities that do have standardized metadata.
  4. It doesn't support the unitSI/valueSI convention already in use.

@sbliven
Copy link
Contributor

sbliven commented Oct 2, 2024

I do see that the frontend needs this info for search auto completion. Would one of these work?

  1. Proscriptive. Operators provide this table manually, eg as a config file. This avoids rebuilding it for every request. They could even implement a script which updates the configuration from the database occasionally if that is required and the number of keys stays reasonable.
  2. User input. Don't auto-complete keys or types, but allow users to specify them manually.
  3. Index. Maintain an index specifically for this query. Dataset creation presumably becomes a bit slower, but this query would be efficient. Care would be needed to keep the index from growing too large.

…I-Logic-from-Frontend"

This reverts commit 09d2c75, reversing
changes made to 38780fd.
@nitrosx
Copy link
Contributor

nitrosx commented Oct 4, 2024

I have not review the PR, but I agree with @sbliven that this feature needs to be discussed with the collaborators.
If the list is compiled when the request is submitted, performance will be poor for sure. Also we need to consider all the different metadata structures that are used across facilities and be clear which one we address.

alternative solution might be to build this list offline at set intervals. This solution will increase performances, but we would need to accept that the list might not need up-to-date

@nitrosx
Copy link
Contributor

nitrosx commented Oct 4, 2024

@Ingvord can you be a little bit more specific why you need this endpoint?
More important do you see any other solution to the issue that you are aiming to solve?

@Ingvord
Copy link
Contributor Author

Ingvord commented Oct 4, 2024

@sbliven Thanks for your comprehensive review!

In general I agree with your objections. Though a few quantative metrics are required here.

Your suggested solutions seem prominent. I would strongly support the Prospective one. But I agree with @nitrosx this requires an approval from the collaborators.

I suggest we discuss this in more details via a zoom meeting (maybe on a smaller scale at first, say just three of us). And then propose whatever consensus we have at the regular collaborators meeting.

Thanks again for your input!

@nitrosx
Copy link
Contributor

nitrosx commented Oct 8, 2024

After talking to @Junjiequan, we should leverage the datasets/metadataKeys endpoint. It already provide the list of metadata keys. It needs some refactoring but it can extend to also cover the needs that this PR is trying to address

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants