Better support for paginated collections #281

Closed
1 of 2 tasks
m-mohr opened this issue Feb 23, 2023 · 10 comments

@m-mohr
Collaborator

m-mohr commented Feb 23, 2023

Paginated collections are not supported very well. The primary reason is:

Let's say an API has 1000 collections and chooses a page size of 10; then I'd have to do 100 HTTP requests sequentially to request the full list of collections. To avoid this, I've taken the simple route for now: the collections are not shown, and users can enter them freely.
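A minimal sketch of why the requests have to be sequential: the URL of each page is only known from the `next` link of the previous one. This is not STAC Browser's actual code; the function name `fetch_all_collections`, the injected `fetch_json` callable, and the `max_pages` guard are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_collections(
    first_url: str,
    fetch_json: Callable[[str], Dict],
    max_pages: int = 1000,
) -> List[Dict]:
    """Follow rel="next" links page by page and gather every collection.

    `fetch_json` is a hypothetical callable that takes a URL and returns
    the parsed JSON body, so the HTTP client stays swappable.
    """
    collections: List[Dict] = []
    url: Optional[str] = first_url
    pages = 0
    while url is not None and pages < max_pages:
        page = fetch_json(url)  # one HTTP request per page
        collections.extend(page.get("collections", []))
        # The next URL is only known once the current page has arrived,
        # which is what rules out requesting the pages in parallel.
        url = next(
            (
                link["href"]
                for link in page.get("links", [])
                if link.get("rel") == "next"
            ),
            None,
        )
        pages += 1
    return collections
```

With 1000 collections and a page size of 10, the loop body runs 100 times before the last page arrives without a `next` link.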

Things that are not working properly:

General guideline for STAC Browser right now is to enable pagination for collections only if you have 1000+ collections.

@m-mohr m-mohr added this to the 3.2.0 milestone Feb 23, 2023
@ycespb

ycespb commented Mar 24, 2023

Related to this issue: our catalog has a number of subcatalogs organizing the collections per platform, organisation, etc. As we have many collections at /collections, the STAC Browser GUI keeps loading additional collections and shows some of the subcatalogs only at the bottom. It becomes impossible to select them because additional collections keep being inserted above them while scrolling down. See for instance the "organisations" catalogue at the bottom of https://radiantearth.github.io/stac-browser/#/external/emc.spacebel.be/?.language=en. Maybe displaying the "child" catalogues at the top, before any collections retrieved from /collections, would help to keep them accessible?

@m-mohr
Collaborator Author

m-mohr commented Mar 28, 2023

I guess the catalogs should be shown before the collections.
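One way such an ordering could look, as a sketch only: a stable sort that moves catalogs ahead of collections while keeping the original order within each group. The helper name `catalogs_first` is hypothetical, and it assumes each child carries the STAC 1.0 `type` field ("Catalog" or "Collection").

```python
from typing import Dict, List


def catalogs_first(children: List[Dict]) -> List[Dict]:
    """Return the children with catalogs listed before everything else."""
    # False sorts before True, so entries whose type is "Catalog" come first.
    # sorted() is stable, so the order inside each group is left unchanged.
    return sorted(children, key=lambda child: child.get("type") != "Catalog")
```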

On the other hand: Why do you have this separation anyway? Or do you want to hide either the catalogs or collections by default? Then you should set the addMissingChildren option (see README).

@m-mohr m-mohr modified the milestones: 3.2.0, future, 3.1.0 May 12, 2023
m-mohr added a commit that referenced this issue Aug 16, 2023
@m-mohr m-mohr modified the milestones: 3.1.0, 3.2.0 Aug 16, 2023
@cboettig

@m-mohr Not entirely sure if this is related, but was just looking at the NASA EarthData STAC catalogs, where their collection.json files always include only 10 items per page, e.g.: https://radiantearth.github.io/stac-browser/#/external/cmr.earthdata.nasa.gov/stac/POCLOUD?page=3&.language=en

note that they include

{"rel":"prev","href":"https://cmr.earthdata.nasa.gov/stac/POCLOUD?page=2"},{"rel":"next","href":"https://cmr.earthdata.nasa.gov/stac/POCLOUD?page=4"}]

in the links section, though I'm not sure that's part of the STAC standard convention (i.e. I think it's valid stac but not sure that "prev" and "next" are privileged key terms?). Or maybe this needs to be a "collection" instead of a "catalog"? In any event, this results in stac-browser only ever showing the first 10 items in each collection. OTOH, API-based queries are fine, since NASA has a nice responsive STAC API (maybe the STAC JSONs need to be including links to the API)?

cc'ing our NASA friends @asteiker & @abbottry who may have more insight on this from the NASA CMR side.

And a huge thanks to you all, I love using the ecosystem of tools you have all brought together in this open source space!

@abbottry

@cboettig in more recent changes (deployed to testing environments only at the moment) we have removed the idea of prev for a couple of reasons, and are now only supporting next.

As for page size, CMR allows up to 2000 records per page. Page sizes that large are not desirable from a performance perspective (server side and client side), but if what @m-mohr is suggesting is true, I could see a world where we return 1000 and only page if there are more than that.

@m-mohr
Collaborator Author

m-mohr commented Nov 10, 2023

No, it's not what this issue is about. Pagination is generally supported if implemented in a STAC API compliant way, which I assume is not the case here, but I have not verified what CMR is doing in detail.

@cboettig

Thanks @m-mohr! Also, apologies if I should have opened a new thread for this. I'm not clear on what the right approach is for STAC catalogs that may have 1000s of entries.

Also, I notice that the planetarycomputer example in STAC Browser shows the connection to the STAC API. Is stac-browser able to query the API directly instead of working from a very large JSON? I wasn't quite clear on what elements must be included for STAC Browser to detect the API. I see PC uses:

{
  "rel": "conformance",
  "type": "application/json",
  "title": "STAC/WFS3 conformance classes implemented by this server",
  "href": "https://planetarycomputer.microsoft.com/api/stac/v1/conformance"
},
{
  "rel": "search",
  "type": "application/geo+json",
  "title": "STAC search",
  "href": "https://planetarycomputer.microsoft.com/api/stac/v1/search",
  "method": "GET"
},
{
  "rel": "search",
  "type": "application/geo+json",
  "title": "STAC search",
  "href": "https://planetarycomputer.microsoft.com/api/stac/v1/search",
  "method": "POST"
},

(NASA STAC CMR does seem to have a working STAC API as well as these static paged json files)

@m-mohr
Collaborator Author

m-mohr commented Nov 10, 2023

For catalogs there's no pagination in STAC, thus there won't be support for it in STAC Browser. You basically just create new sub-catalogs if your catalog grows too large. The API is detected based on conformance classes, indeed. I think CMR is not STAC compliant so I guess the better place is to open an issue with them.
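As a rough illustration of conformance-based detection, the sketch below reads the `conformsTo` array of a landing page and extracts the names of the STAC API conformance classes it declares. It is not STAC Browser's implementation. The URI pattern `https://api.stacspec.org/<version>/<name>` and the decision to treat the "core" class as the marker of an API are assumptions, and some servers may list their classes only at the conformance endpoint rather than in the landing page.

```python
import re
from typing import Dict, Set

# Assumed shape of STAC API conformance class URIs. The version segment
# differs between releases, so it is matched loosely.
_STAC_API_CLASS = re.compile(r"^https://api\.stacspec\.org/v[^/]+/(?P<name>[a-z0-9-]+)")


def stac_api_capabilities(landing_page: Dict) -> Set[str]:
    """Collect the STAC API conformance class names a landing page declares."""
    names: Set[str] = set()
    for uri in landing_page.get("conformsTo", []):
        match = _STAC_API_CLASS.match(uri)
        if match:
            names.add(match.group("name"))
    return names


def looks_like_stac_api(landing_page: Dict) -> bool:
    """Treat a document as an API landing page if it declares the core class."""
    return "core" in stac_api_capabilities(landing_page)
```

A static catalog without `conformsTo` yields an empty set, so a client following this approach would treat it as plain static STAC and not expect pagination.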

@cboettig

Thanks @m-mohr , really appreciate your help here.

For catalogs there's no pagination in STAC, thus there won't be support for it in STAC Browser.

These are both good things to know, and sorry I couldn't find it in the documentation. Do I understand that switching to a collection would help here?

I think CMR is not STAC compliant so I guess the better place is to open an issue with them.

Thanks, I have also reached out to them separately, but am trying to provide more precise feedback to them about what they need to improve. The pages appear to pass the validator, e.g.

$ stac-validator https://cmr.earthdata.nasa.gov/stac/POCLOUD

Thanks for using STAC version 1.0.0!

[
    {
        "version": "1.0.0",
        "path": "https://cmr.earthdata.nasa.gov/stac/POCLOUD",
        "schema": [
            "https://schemas.stacspec.org/v1.0.0/catalog-spec/json-schema/catalog.json"
        ],
        "valid_stac": true,
        "asset_type": "CATALOG",
        "validation_method": "default"
    }

I'd be happy to communicate to the CMR team what was not compliant if I understood it properly. As I understand it, the validator is happy to validate a wide range of rel types in the links section, but I haven't been able to find which relation types (beyond those listed in collection-spec.md, of course) have specific meaning in STAC Browser (like conformance) and which might be non-standard or specific to some particular provider's implementation.

I recognize STAC browser is a volunteer effort, really appreciate all you do!

@m-mohr
Collaborator Author

m-mohr commented Nov 10, 2023

What I meant to say is that there is no pagination support in the static STAC spec. It's only defined in parts of the STAC API, more specifically in Item Search and STAC API - Collections / Features. You can't just use a next link in the landing page.

I don't have the time to check in detail what's wrong with CMR, sorry. One thing I saw is that they use next links in a static context though and that the link to collections is using rel type collections instead of data. But this issue is also not the right place to discuss this. Either bring this up with CMR or in one of the normal STAC support channels (e.g. stac-spec discussions or STAC community calls).

The validators give a false sense of security. No STAC validator is able to do a full validation; in particular, it doesn't validate anything that is undefined in the spec (such as next links outside of the API scope).
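The two observations about link relations (next/prev links in a catalog or landing page document, and rel type collections where data is expected) are the kind of thing a schema validator would not report, but a small check can. The sketch below is an illustration only: the function name `link_warnings` and the warning texts are hypothetical, and it covers just these two points, not full STAC compliance.

```python
from typing import Dict, List


def link_warnings(catalog: Dict) -> List[str]:
    """Flag link relations that a schema validator would accept silently."""
    warnings: List[str] = []
    for link in catalog.get("links", []):
        rel = link.get("rel")
        if rel in ("next", "prev"):
            # Pagination is defined for parts of the STAC API only,
            # not for catalog or landing page documents.
            warnings.append(
                'rel="%s" link found: pagination is not defined for catalogs' % rel
            )
        elif rel == "collections":
            warnings.append(
                'rel="collections" link found: rel="data" is expected here'
            )
    return warnings
```

Run against a catalog containing the prev and next links quoted earlier in this thread, it would return one warning for each of them.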

@radiantearth radiantearth locked as off-topic and limited conversation to collaborators Nov 10, 2023
@m-mohr
Collaborator Author

m-mohr commented Nov 10, 2023

Closing in favor of #390 for what is left to be done for this issue.

@m-mohr m-mohr closed this as completed Nov 10, 2023
@radiantearth radiantearth unlocked this conversation Nov 10, 2023

4 participants