TODO
# To Do...
### A short list of issues and improvement/feature suggestions in no particular order...
-
Searching by ontology type ('type:http://schema.org/Place') currently doesn't return sub-types. This could be done by using the ontology while indexing to additionally post all sub-types, or alternatively at query time with query expansion.
-
Simplify horizontal & vertical indexing code. The current use of RDFDocument objects to "present" the subjects for building horizontal and vertical indexes is a vestige of the way indexes are built using MG4J. It is unnecessarily complex. The reading of subjects into the mapper is independent of MG4J and of the index writing process that goes on in the reducer. Having distinct mappers for horizontal and vertical index generation would be simpler to maintain.
-
Add some kind of simple 'join' functionality. Currently, finding say 'All articles by a given author' requires the user to enter two queries: first to find the author's URL/BNode, then to query all subjects with that author (author:<URL|BNode>). There is some experimental code for MG4J that may permit this, although it's not in the current MG4J release. Alternatively, adding the logic to the web app may be an option, although it would be quite complex.
-
Reduce the web app's start time. For big indexes the web app can take tens of minutes to start.
-
Handling of 'range' types. Everything other than URLs and BNodes is currently tokenised as regular text. This makes some things difficult or impossible to query. Ideally a user would be able to query by, say, a date range, or for values greater than some number.
-
Handling of dates. Dates are tokenised as regular text. This makes MG4J parallel queries complex, e.g. (predicate:http://schema.org/startdate ^ object:2011) & (predicate:http://schema.org/startdate ^ object:07) to find all subjects with a startDate containing '2011' and '07'.
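
To illustrate how unwieldy the date queries above become, here is a minimal Python sketch that assembles one from a predicate and a list of date tokens. The helper name `build_date_query` is hypothetical and not part of this project or of MG4J; the sketch only reproduces the query syntax shown in the date item and assumes the caller already knows how the date was tokenised.

```python
def build_date_query(predicate, parts):
    """Build a query matching subjects whose `predicate` value
    contains every token in `parts`.

    Hypothetical helper: it only concatenates strings in the syntax
    used in the example above and does not talk to an index.
    """
    if not parts:
        raise ValueError("at least one date token is required")
    # One aligned (predicate ^ object) clause per date token,
    # all of which must match (joined with '&').
    clauses = [f"(predicate:{predicate} ^ object:{part})" for part in parts]
    return " & ".join(clauses)


query = build_date_query("http://schema.org/startdate", ["2011", "07"])
print(query)
```

Because the tokens are matched as plain text, a query built this way would presumably also match a start date whose day, rather than month, is '07'. That is one reason typed date handling would be preferable.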