Finish identifier class #24

Closed
14 tasks done
roomthily opened this issue Feb 20, 2015 · 5 comments

@roomthily
Contributor

Merging some issues into one comprehensible thing.

Remaining tasks:

  • Generate a URN based on cat-interop/ESIP discussions (see #19, "Revise identifier code to handle the cat-interop list", and #13, "Consider normalizing identification responses based on Cat-Interop work")
  • First, sort out a solid URN (or trace through the issues, mailing list posts, etc, for existing work)
  • Finish testing the prioritized list (check filters for accuracy! across protocols and services)
  • Finish adding version filters for prioritized list (if version available)
  • Add the "is dataset" check (see #14, 'Update identifier for some "Is Dataset" option')
  • Add "is metadata" check. Originally, "metadata" here meant an actual ISO or FGDC record only. Revised: no, the check needs to handle the internal standard (oai-pmh:dc or csw:iso, etc).
  • Add additional error filters to the prioritized list (see also #22, "Add a set of error response examples to the response set")
  • Build out the full service identification config (for metrics)
  • Add some service type (metadata v dataset v service v ...) to the configs (re: URNs)
  • Make sure the URN can be generated by the Identity class
  • Verify that the version check can handle multiple options (it may have an xpath option or a default value, depending, not oddly, on the version). It did not at first; it should be good now, see f89d8c4.
  • Add CSW to the priority list (this one can be awkward with the nested response - it's CSW with blocks of some other XML-based metadata like ISO, DC, etc, and so those namespaces are present).
  • Revise the OpenSearch filtering so that it matches OpenSearch but not ATOM feeds that only carry OpenSearch elements?
  • Update the version check so it evaluates both the defaults and the checks options.

There is some known wonkiness in the yaml configuration (list v dict) so that should also be added to the list.

Regarding the CSW situation (and this applies also to oai-pmh at a minimum), we may need to switch from a binary check based on ordering of the services in the config to some scoring function - 67% likely to be Service A or something. I think we're good for now as long as we're careful about the filters and the secondary "is dataset?" or "is metadata?" can help mitigate this problem.
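
As a rough illustration of the scoring idea above, here is a minimal sketch, not the project's implementation. The `score_services` function, the `FILTERS` dict, and the plain substring filters are all assumptions made for the example; the real filters live in the YAML config and are richer than this.

```python
def score_services(content, filters_by_service):
    """Return (service, fraction of filters matched) pairs, best match first."""
    scores = []
    for service, needles in filters_by_service.items():
        matched = sum(1 for needle in needles if needle in content)
        scores.append((service, matched / len(needles)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical filters: plain substring checks standing in for the real config.
FILTERS = {
    "csw": ["http://www.opengis.net/cat/csw/2.0.2", "GetRecordsResponse"],
    "iso": ["http://www.isotc211.org/2005/gmd", "MD_Metadata", "CI_Citation"],
}

# A CSW response wrapping an ISO record, so both sets of namespaces are present.
response = (
    '<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'xmlns:gmd="http://www.isotc211.org/2005/gmd"><gmd:MD_Metadata/>'
    "</csw:GetRecordsResponse>"
)

ranked = score_services(response, FILTERS)
best_service, best_score = ranked[0]
```

With a ranking like this, the nested ISO block still scores, but lower than the enclosing CSW response, so the order of services in the config no longer decides the outcome on its own.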

@roomthily
Contributor Author

On Version Identification

In some cases, the service response will have an explicit version element (root.attrib['version'] for many OGC responses, for example). Sometimes we don't have that or we have an implied version (OpenSearch namespace URI or THREDDS catalog URI). So we have two definitions - some default value for the URI-type situations (if match, version='1.1') and a pull from the response.

  versions:
    defaults:
      ors:
        - type: simple
          object: content
          value: 'http://a9.com/-/spec/opensearch/1.1/'
          text: '1.1'

  versions:
    checks:
      ors:
        - type: xpath
          # fully qualified xpath which is lovely and short here
          value: '@version'
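
To make the two definitions concrete, here is a hedged sketch of how a version lookup could walk both. The `identify_version` function and the `VERSIONS` dict are hypothetical; they mirror the YAML keys above but are not the code from the related commits. Trying `checks` before `defaults` is an assumption of this sketch, and only the attribute-on-root xpath case is handled.

```python
import xml.etree.ElementTree as ET


def identify_version(content, versions):
    """Try the xpath 'checks' on the parsed response, then fall back to 'defaults'."""
    root = ET.fromstring(content)
    for check in versions.get("checks", {}).get("ors", []):
        # Sketch only: supports '@attribute' xpaths evaluated on the root element.
        if check["type"] == "xpath" and check["value"].startswith("@"):
            found = root.attrib.get(check["value"][1:])
            if found:
                return found
    for default in versions.get("defaults", {}).get("ors", []):
        # 'simple' means a substring match against the raw response content.
        if default["type"] == "simple" and default["value"] in content:
            return default["text"]
    return None


# Hypothetical merged config combining the two snippets above.
VERSIONS = {
    "defaults": {"ors": [{
        "type": "simple",
        "object": "content",
        "value": "http://a9.com/-/spec/opensearch/1.1/",
        "text": "1.1",
    }]},
    "checks": {"ors": [{"type": "xpath", "value": "@version"}]},
}

wms = '<WMS_Capabilities version="1.3.0"/>'
osdd = '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"/>'

wms_version = identify_version(wms, VERSIONS)
osdd_version = identify_version(osdd, VERSIONS)
```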

Related commits:
e48dc05

f89d8c4

@roomthily
Contributor Author

On URNs

Based on the OSGeo/geopython/ESIP Discovery work.

Consider:

  • urn:{protocol}:{service}:{version}
  • urn:{type}:{protocol}:{service}:{version}
  • urn:{type}:{protocol}:serviceType:{service}:{version}
  • {protocol}:{service}-{version}-{access: http|ftp|etc}-{method as dash-delimited string}

Of course, this all assumes that you have some service response that is well-contained, which we do not. A WxS GetCapabilities response describes enough to get at the data service, so it could be classified as both service and dataset (specific to this project, mind).
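
For the first two candidate patterns, a minimal sketch of the string assembly follows. The `build_urn` helper and the example values ("OGC", "WMS", "service") are hypothetical and only illustrate the shape; no pattern has been settled on here.

```python
def build_urn(protocol, service, version, kind=None):
    """Join the parts as urn:{protocol}:{service}:{version}, with an optional {type}."""
    parts = ["urn"]
    if kind:
        # Optional {type} segment, e.g. service, dataset, or metadata.
        parts.append(kind)
    parts.extend([protocol, service, version])
    return ":".join(parts)


plain_urn = build_urn("OGC", "WMS", "1.3.0")
typed_urn = build_urn("OGC", "WMS", "1.3.0", kind="service")
```

A GetCapabilities response that counts as both service and dataset would need two typed URNs under the second pattern, which is one reason the {type} segment matters for this project.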

@roomthily
Contributor Author

For the OpenSearch vs. ATOM with OS elements:

<feed xmlns='http://www.w3.org/2005/Atom' 
    xmlns:georss='http://www.georss.org/georss' 
    xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/'>

Here we're just checking for the presence of the namespace URI without understanding its role in the XML, i.e., it is not the default namespace and there are other namespaces present.
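
One possible way to tell the two cases apart is to compare the namespace of the root element with the set of declared namespaces. This is a sketch using the standard library only; `root_namespace` and `declared_namespaces` are hypothetical helper names, not part of the identifier class.

```python
import io
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def root_namespace(content):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(content)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def declared_namespaces(content):
    """Return every namespace URI declared anywhere in the document."""
    stream = io.BytesIO(content.encode("utf-8"))
    events = ET.iterparse(stream, events=("start-ns",))
    return {uri for _, (_prefix, uri) in events}


feed = (
    "<feed xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:georss='http://www.georss.org/georss' "
    "xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/'></feed>"
)
description = "<OpenSearchDescription xmlns='http://a9.com/-/spec/opensearch/1.1/'/>"

# The feed declares the OpenSearch namespace, but its root element is Atom.
feed_declares_os = OPENSEARCH_NS in declared_namespaces(feed)
feed_is_os_root = root_namespace(feed) == OPENSEARCH_NS
description_is_os_root = root_namespace(description) == OPENSEARCH_NS
```

A filter that requires the root element to be in the OpenSearch namespace would accept the description document and reject the Atom feed, while a plain URI search matches both.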

@roomthily
Contributor Author

It is catching OpenSearch errors 👍

@roomthily roomthily self-assigned this Mar 5, 2015
@roomthily
Contributor Author

High priority services are running reasonably. Everything else will be a discrete bug. Cheers.
