Revise identifier code to handle the cat-interop list #19

roomthily · 2015-02-15T17:09:53Z

See #13 (expanding that list).

This relates to the current code structure: it is a couple of methods that return the protocol (a good match to the cat-interop.identifier) and a boolean for whether the document is a service description (see also #14). But the cat-interop identifiers have a kind of hierarchical structure that doesn't match that code structure well:

OpenSearch1.1:Description

which we would not get correctly (we would get OpenSearch1.1 and a True).

Current proposal: make it a class that returns a complete description of the document. It's of Type OpenSearch, Version 1.1, Is Description Document, Is Not Dataset, etc.
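
A minimal sketch of what such a class could look like, assuming the identifiers follow the `Protocol` + `Version` + optional `:Subtype` shape of the example above. The class name `Identity`, its fields, and the regular expression are hypothetical and are not taken from the existing code:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: a protocol name, an optional dotted version,
# and an optional ":Subtype" suffix (e.g. "OpenSearch1.1:Description").
_IDENTIFIER = re.compile(
    r"^(?P<protocol>[A-Za-z][A-Za-z-]*?)"
    r"(?P<version>\d+(?:\.\d+)*)?"
    r"(?::(?P<subtype>.+))?$"
)


@dataclass(frozen=True)
class Identity:
    """Complete description of an identified document (illustrative only)."""

    protocol: str
    version: Optional[str]
    subtype: Optional[str]

    @property
    def is_description(self) -> bool:
        # Assumes the subtype "Description" marks a service description.
        return self.subtype == "Description"

    @classmethod
    def from_identifier(cls, identifier: str) -> "Identity":
        match = _IDENTIFIER.match(identifier)
        if match is None:
            raise ValueError("unrecognized identifier: %r" % identifier)
        return cls(
            protocol=match.group("protocol"),
            version=match.group("version"),
            subtype=match.group("subtype"),
        )


identity = Identity.from_identifier("OpenSearch1.1:Description")
print(identity.protocol, identity.version, identity.is_description)
```

Keeping the pieces as separate fields means the hierarchical identifier can be rebuilt or filtered on any level, instead of being flattened into a protocol string and one boolean.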

roomthily · 2015-02-16T06:11:33Z

See 1731ef9. But for reference:

# TODO: put together a configuration widget
#       to map protocol to some search filters
#       and some service description filters,
#       and some dataset filters so that we can
#       have one thing to map the priority set
#       vs the IDENTIFY ALL THE THINGS! set. oh,
#       and wind up with reasonable line lengths
#       for beto. :) so basically elasticsearch all
#       the things.
#
# _ors: [content filters] + [url filters] (ANY match)
# _ands: [content filter + url filter (or other combo)] (ALL match)
# where an _ands can be a filter in an _ors

It will also need to deal with versions (i.e., identify through xpath and an xpath, identify through url and some regex?, identify through some namespace and more regex?).

This might be a little overly complex. But configurable is good and code reuse is good.
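
To make the `_ors` / `_ands` idea concrete, here is a rough sketch of how a nested filter configuration could be evaluated. The dictionary layout (`on`, `pattern`), the function name `matches`, and the sample `OPENSEARCH` configuration are all assumptions for illustration; only the `_ors` and `_ands` keys come from the TODO above:

```python
import re
from typing import Any, Dict

Filter = Dict[str, Any]


def matches(flt: Filter, content: str, url: str) -> bool:
    """Evaluate a (possibly nested) filter against a response body and URL."""
    if "_ors" in flt:
        # ANY of the child filters may match.
        return any(matches(child, content, url) for child in flt["_ors"])
    if "_ands" in flt:
        # ALL of the child filters must match.
        return all(matches(child, content, url) for child in flt["_ands"])
    # Leaf filter: a regex applied to either the content or the URL.
    target = content if flt["on"] == "content" else url
    return re.search(flt["pattern"], target) is not None


# Hypothetical configuration: an _ands used as one filter inside an _ors.
OPENSEARCH: Filter = {
    "_ors": [
        {"on": "content", "pattern": r"http://a9\.com/-/spec/opensearch/1\.1/"},
        {
            "_ands": [
                {"on": "content", "pattern": r"<OpenSearchDescription"},
                {"on": "url", "pattern": r"opensearch"},
            ]
        },
    ]
}

print(matches(OPENSEARCH, "<OpenSearchDescription>", "http://example.com/opensearch.xml"))
```

Because the configuration is plain data, the priority set and the identify-everything set could share the same evaluator and differ only in which filter dictionaries they load.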

roomthily · 2015-02-16T17:44:12Z

Would this part also be a good place for the error checks? That is, for those services that return a valid HTTP response that is actually an error, not just the 404/500/etc. that nutch handles (@betolink, how does nutch handle the HTTP error codes?). The OGC services are one case: the response is a blob of XML describing some error, but as far as the status code is concerned, it returned a response!

Example:

http://geobrain.laits.gmu.edu/cgi-bin/wcs-all?service=wcs&version=1.0.0&request=getcapabilities

returns:

<?xml version="1.0" encoding="UTF-8"?>
<ExceptionReport xmlns="http://www.opengis.net/ows" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/ows/owsCommon.xsd" version="0.3.20" language="en">
    <Exception exceptionCode="NoApplicableCode" locator="getFileNameList()">
        <ExceptionText>Failed to Open Director /Volumes/RAIDL1/WCS-ALL-DATA/</ExceptionText>
    </Exception>
</ExceptionReport>

with a status of 200 OK. So more things to parse.
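
A possible check for this case, as a sketch using only the standard library. The function name `find_service_errors` is hypothetical, and the set of root elements is an assumption covering the OWS `ExceptionReport` shown above plus the older OGC `ServiceExceptionReport`; other services may need more entries:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Assumed error roots: OWS-style and the older OGC service exception style.
ERROR_ROOTS = {"ExceptionReport", "ServiceExceptionReport"}
ERROR_ITEMS = {"Exception", "ServiceException"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_service_errors(body: bytes) -> List[Tuple[Optional[str], str]]:
    """Return (code, message) pairs if a 200 response is really an error report."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return []  # not XML at all, so not an XML error report
    if local_name(root.tag) not in ERROR_ROOTS:
        return []
    errors = []
    for elem in root.iter():
        if local_name(elem.tag) in ERROR_ITEMS:
            code = elem.get("exceptionCode") or elem.get("code")
            # Collapse whitespace across the nested text nodes.
            message = " ".join("".join(elem.itertext()).split())
            errors.append((code, message))
    return errors
```

An empty list means the body did not look like an error report; a non-empty list gives the identifier something to record instead of treating the response as a valid service description.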

roomthily added the bug label Feb 15, 2015

roomthily self-assigned this Feb 15, 2015

roomthily mentioned this issue Feb 20, 2015

Finish identifier class #24

Closed

14 tasks

roomthily closed this as completed Feb 20, 2015
