Revise identifier code to handle the cat-interop list #19

Closed
roomthily opened this issue Feb 15, 2015 · 2 comments

@roomthily
Contributor

See #13 (expanding that list).

This relates to the current code structure: it is a couple of methods returning the protocol (a good match to the cat-interop.identifier) and a boolean for whether it's a service description (see also #14). But the cat-interop identifiers have a kind of hierarchical structure that doesn't match that code structure well:

OpenSearch1.1:Description

which we would not get correctly (we would get OpenSearch1.1 and a True, not the full identifier).

Current proposal: make it a class that returns a complete description of the document. It's of Type OpenSearch, Version 1.1, Is Description Document, Is Not Dataset, etc.
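
A minimal sketch of what such a class could look like, assuming identifiers follow the `Name` + optional version + optional `:Subtype` shape of the example above. `DocumentDescription`, `parse_identifier`, and the `"Dataset"` subtype value are hypothetical names for illustration, not what the repository uses:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentDescription:
    """Complete description of an identified document (hypothetical shape)."""
    protocol: str            # e.g. "OpenSearch"
    version: Optional[str]   # e.g. "1.1", or None if the identifier has none
    subtype: Optional[str]   # e.g. "Description", or None

    @property
    def is_service_description(self) -> bool:
        return self.subtype == "Description"

    @property
    def is_dataset(self) -> bool:
        # Assumption: a dataset would be flagged with a "Dataset" subtype.
        return self.subtype == "Dataset"

    @property
    def identifier(self) -> str:
        """Rebuild the cat-interop style identifier string."""
        base = self.protocol + (self.version or "")
        return f"{base}:{self.subtype}" if self.subtype else base


def parse_identifier(identifier: str) -> DocumentDescription:
    """Split an identifier such as 'OpenSearch1.1:Description' into parts."""
    head, _, subtype = identifier.partition(":")
    # Trailing dotted digits on the head are treated as the version.
    match = re.match(r"^(?P<name>.*?)(?P<version>\d+(?:\.\d+)*)?$", head)
    return DocumentDescription(
        protocol=match.group("name"),
        version=match.group("version"),
        subtype=subtype or None,
    )
```

With this shape the caller gets the type, version, and description flag from one object, and the original identifier can still be rebuilt for matching against the list.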

@roomthily roomthily added the bug label Feb 15, 2015
@roomthily roomthily self-assigned this Feb 15, 2015
@roomthily
Contributor Author

See 1731ef9. But for reference:

# TODO: put together a configuration widget for
#       to map protocol to some search filters
#       and some service description filters,
#       and some dataset filters so that we can
#       have one thing to map the priority set
#       vs the IDENTIFY ALL THE THINGS! set. oh,
#       and wind up with reasonable line lengths
#       for beto. :) so basically elasticsearch all
#       the things.
#
# _ors: [content filters] + [url filters] (ANY match)
# _ands [content filter + url filter (or other combo)]
# where an _ands can be a filter in an _ors

It will also need to deal with versions (i.e., identify through one xpath and then another xpath, identify through the url and some regex?, identify through some namespace and more regex?).

This might be a little overly complex. But configurable is good and code reuse is good.
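
One way the `_ors` / `_ands` idea from the TODO could be sketched is as small composable predicates. Everything here (`content_filter`, `url_filter`, `ands`, `ors`, and the example rule) is a hypothetical illustration, not the code in 1731ef9:

```python
import re


def content_filter(pattern):
    """Predicate: the regex matches somewhere in the response body."""
    regex = re.compile(pattern)
    return lambda url, content: bool(regex.search(content))


def url_filter(pattern):
    """Predicate: the regex matches somewhere in the url (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return lambda url, content: bool(regex.search(url))


def ands(*filters):
    """ALL of the given filters must match."""
    return lambda url, content: all(f(url, content) for f in filters)


def ors(*filters):
    """ANY of the given filters may match; an ands() can be one of them."""
    return lambda url, content: any(f(url, content) for f in filters)


# Example rule (an assumption for illustration): either the OpenSearch 1.1
# namespace appears in the content, or the url looks like opensearch AND
# the content has an OpenSearchDescription element.
is_opensearch = ors(
    content_filter(r"http://a9\.com/-/spec/opensearch/1\.1/"),
    ands(
        url_filter(r"opensearch"),
        content_filter(r"<OpenSearchDescription"),
    ),
)
```

Because each rule is just data plus a combinator, the same pieces could be driven from a configuration mapping per protocol, which is where the reuse would come from.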

@roomthily
Contributor Author

This part would also be a good place for the error checks, for those services that return a valid response that is really an error, not just the 404/500/etc. that nutch handles (@betolink, how does nutch handle the HTTP error codes?). The OGC services are like this: the response is a blob of XML with some error in it, but going by the status code, it returned a response!

Example:

http://geobrain.laits.gmu.edu/cgi-bin/wcs-all?service=wcs&version=1.0.0&request=getcapabilities

returns:

<?xml version="1.0" encoding="UTF-8"?>
<ExceptionReport xmlns="http://www.opengis.net/ows" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/ows/owsCommon.xsd" version="0.3.20" language="en">
    <Exception exceptionCode="NoApplicableCode" locator="getFileNameList()">
        <ExceptionText>Failed to Open Director /Volumes/RAIDL1/WCS-ALL-DATA/</ExceptionText>
    </Exception>
</ExceptionReport>

with a status of 200 OK. So more things to parse.
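
A rough sketch of that extra check, using only the standard library and assuming the error body looks like the `ExceptionReport` above. `find_ows_exceptions` is a hypothetical helper name, and it matches on the element's local name so the namespace version does not matter:

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_ows_exceptions(body):
    """Return a list of exceptions found in a response body (raw bytes).

    Returns an empty list when the body is not XML or its root element
    is not an ExceptionReport, i.e. when nothing looks like an error.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return []
    if _local(root.tag) != "ExceptionReport":
        return []

    found = []
    for exc in root:
        if _local(exc.tag) != "Exception":
            continue
        texts = [
            (child.text or "").strip()
            for child in exc
            if _local(child.tag) == "ExceptionText"
        ]
        found.append({
            "code": exc.get("exceptionCode"),
            "locator": exc.get("locator"),
            "text": " ".join(texts),
        })
    return found
```

A non-empty result on a 200 response would mean the document should be flagged as a service error rather than identified as a capabilities document.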

@roomthily roomthily mentioned this issue Feb 20, 2015
14 tasks