
Faceting

Aref Shafaei edited this page Oct 23, 2017 · 16 revisions

Proof of Concept

The following is a proof of concept of how faceting would integrate into the recordset app. You can check it out here.

Faceting

Faceting Scenario in Chaise

  1. Users open a URL that may or may not contain filters. To show the results, we do something like the following in chaise:
    const COUNT = 25;
    var reference, columns, rows;
    
    ERMrest.resolve(ermrestURI).then(function(result) {
        reference = result;
        
        columns = reference.columns;
        
        reference.read(COUNT).then(function(readResult){
            rows = readResult;
        });
        
    });
    
    // display Results
  2. Now we need to add functionality to get the filters and show them:
    var filters = reference.location.filters;
    
    // display Filters
  • We should rewrite the filter parsing logic to handle more complex cases. We can represent the filters as encoded JSON in the URL. This makes it easier for chaise to create filters, because it doesn't need to know the exact ermrest filter syntax. Ermrestjs should take this new object and create the corresponding ermrest URL.
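As a minimal sketch of the encoded-JSON idea, the snippet below round-trips a filter list through a URL-safe string. The filter object shape and the helper names `encodeFilters` and `decodeFilters` are assumptions made for illustration; they are not part of ermrestjs.

```javascript
// Hypothetical filter shape; the real structure is still to be designed.
var filters = [
    { column: "age", min: 18, max: 65 },
    { column: "species", choices: ["mouse", "human"] }
];

// Serialize the filters so they can be placed in a URL (hypothetical helper).
function encodeFilters(filterList) {
    return encodeURIComponent(JSON.stringify(filterList));
}

// Parse the filters back from their URL form (hypothetical helper).
function decodeFilters(encoded) {
    return JSON.parse(decodeURIComponent(encoded));
}

var encoded = encodeFilters(filters);
var decoded = decodeFilters(encoded);
```

With this approach chaise only builds and reads plain objects; translating them into the ermrest filter syntax would remain the job of ermrestjs.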
  3. Now that we show the results and filters, we must show the facets. We must:

    3.1. Get list of visible facets.

        var visibleFacets = reference.facets;

    3.2. Go through the list of facets and append the required aggregate functions for a bulk request. Then show the results of this bulk request.

        var aggFns = [];
    
        // for showing the number of results
        aggFns.push(reference.aggregate.count);
    
        // go through the visible facets to generate the required single aggregate functions.
        // (forEach is used because for...in would iterate array indexes, not facets)
        visibleFacets.forEach(function (facet) {
            switch (facet.column.type.name) {
                case "int":
                    aggFns.push(facet.column.aggregate.min, facet.column.aggregate.max);
                    break;
                // TODO same for different column types based on required facets
            }
        });
    
        // aggregate bulk request
        reference.getAggregates(aggFns).then(function (res) {
            // TODO map aggregates to each facet directive.
        });
    • There are 3 types of aggregates:
      • Table aggregate: Row count.
      • Column-level aggregate: min, max, not-null count
      • Group aggregate: histogram, term frequency
    • Based on the column type, we need different aggregated data:
      • For numbers: We need min, max, and histogram.
      • For vocabulary: We need the total distinct count and the word frequency. We should present these values page by page.
    • We might need to break this bulk request into several requests to avoid hitting the URL length limit.
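One way to split the bulk request could look like the sketch below. It assumes each aggregate can be rendered as a string that is appended to a base URL with a one-character separator. The function name `batchAggregates`, the placeholder strings, and the length limit are all hypothetical.

```javascript
// Split aggregate expressions into batches so that
// baseUrl + separators + expressions stays within maxLength.
function batchAggregates(baseUrl, aggExprs, maxLength) {
    var batches = [];
    var current = [];
    var length = baseUrl.length;

    aggExprs.forEach(function (expr) {
        // +1 accounts for the separator placed before each expression.
        var added = expr.length + 1;

        // Start a new batch when this expression would exceed the limit.
        if (current.length > 0 && length + added > maxLength) {
            batches.push(current);
            current = [];
            length = baseUrl.length;
        }

        current.push(expr);
        length += added;
    });

    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}

// Placeholder expressions; real ones would come from the aggregate functions.
var exampleBatches = batchAggregates("base", ["aaaa", "bbbb", "cccc"], 14);
```

An expression that is longer than the limit on its own still ends up in a batch of one, so the caller would need to handle that case separately.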
  4. Now, for each facet:

    4.1. We must also show the applied filters. Each directive of these facets should have access to a facet object, and therefore has access to the filter.

        // in each facet directive
        scope.filters = scope.facet.filter;
        
        // display current filters for the given facet.

    4.2. Users must be able to filter within a facet's domain. To do so, each facet directive must maintain a reference that it can filter. This reference must carry the filters from the other facets.

        scope.reference = vm.reference;

    Now we can call the column aggregate functions on this reference to get the results for search within a facet or group aggregates.

        var aggFn = scope.facet.column.aggregate.search(scope.text);
        
        scope.reference.getAggregates(aggFn).then( function(result) {
            scope.facetResults = result;
        });
        
        // show facet results.
    • Based on the column type, we should provide different search-within methods:
      • For numbers: Users expect to zoom in or out of the histogram, so we should recompute the histogram based on the current zoom (minimum and maximum).
      • For vocabulary: Users should be able to narrow down the word frequency result based on a string, so we need search functionality here. The aggregate function should return a "paged" result. We should investigate whether that is possible and how we can manage it.
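To make the vocabulary case concrete, here is an in-memory sketch of a searched and paged word-frequency result. In practice the counts would come from an ermrest aggregate request; the function name `searchTermFrequency` and the shape of the returned page are assumptions.

```javascript
// Count term occurrences, keep terms containing searchText (case-insensitive),
// sort by descending frequency, and return a single page of results.
function searchTermFrequency(values, searchText, pageSize, pageIndex) {
    // A prototype-less object avoids clashes with names like "constructor".
    var counts = Object.create(null);
    values.forEach(function (value) {
        counts[value] = (counts[value] || 0) + 1;
    });

    var needle = searchText.toLowerCase();
    var matches = Object.keys(counts)
        .filter(function (term) {
            return term.toLowerCase().indexOf(needle) !== -1;
        })
        .map(function (term) {
            return { term: term, count: counts[term] };
        })
        .sort(function (a, b) {
            // Highest count first; ties are broken alphabetically for stable paging.
            return b.count - a.count || a.term.localeCompare(b.term);
        });

    var start = pageIndex * pageSize;
    return {
        total: matches.length,
        rows: matches.slice(start, start + pageSize),
        hasNext: start + pageSize < matches.length
    };
}

var sampleTerms = ["mouse", "mouse", "human", "mole", "hamster", "mouse", "mole"];
var firstPage = searchTermFrequency(sampleTerms, "mo", 1, 0);
```

A facet directive could keep `pageIndex` in its scope and request the next page only while `hasNext` is true.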

    4.3. We must keep track of the filters that have been selected and are active for this facet. We can keep them in scope.activeFilters.

  5. After applying each of the filters:

    5.1. Collect the newly applied filters and apply them to the reference.

        var appliedFilters = getAllFilters(); // get all active filters from facets
        
        var newRef = reference.unfiltered.applyFilters(appliedFilters);
        
    • ermrestjs might not be able to apply all the filters because of the URL length limit, and chaise should display an error in that case. ermrestjs can fail silently (truncate the URL); chaise should then compare the filters on the new reference with appliedFilters and show the proper warning.
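The comparison chaise would perform could be sketched as follows. It assumes filters are plain JSON-serializable objects whose keys appear in a consistent order; `findDroppedFilters` is a hypothetical helper, not an existing chaise or ermrestjs function.

```javascript
// Return the requested filters that are missing from the applied ones.
// Filters are compared by their JSON form, so key order must match.
function findDroppedFilters(requested, applied) {
    var appliedKeys = applied.map(function (filter) {
        return JSON.stringify(filter);
    });
    return requested.filter(function (filter) {
        return appliedKeys.indexOf(JSON.stringify(filter)) === -1;
    });
}

// Hypothetical filter objects, for illustration only.
var requestedFilters = [
    { column: "age", min: 18 },
    { column: "species", choices: ["mouse"] }
];
var filtersOnNewRef = [{ column: "age", min: 18 }];

var dropped = findDroppedFilters(requestedFilters, filtersOnNewRef);
if (dropped.length > 0) {
    console.warn(dropped.length + " filter(s) could not be applied.");
}
```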

ERMrestjs API

The following is the proposed API in ermrestjs:

function Reference () {
    // ..
    
    /**
     * @type {ERMrest.ReferenceAggregateFn}
     **/
    this.aggregate = new ReferenceAggregateFn(this);
}


Reference.prototype = {

    /**
     * @type {ERMrest.Facet[]}
     **/
    get facets () {
        // TODO parse visible-facets annotation and return facets.
    },
    
    /**
     * @param {AggregateFn[]} aggregateList
     **/
    getAggregates: function (aggregateList) {
        // TODO
    }

}

function ReferenceColumn () {
    // ..
    
    /**
     * @type {ERMrest.ColumnAggregateFn}
     **/
    this.aggregate = new ColumnAggregateFn(this);
}

function ColumnAggregateFn(column) {
    this.column = column;
}

ColumnAggregateFn.prototype = {    
    get min() {},
    get max() {},
    get CountNotNull() {},
    get CountDistinct() {},    
    get wordFrequency() {},
    histogram: function (min, max, bucketCount) {},
    search: function (str) {}
};

ReferenceAggregateFn.prototype = {
    get count() {}
};


/**
 * TODO we might be able to reuse ReferenceColumn
 * 
 * @param {ERMrest.ReferenceColumn} refColumn
 **/
function Facet(reference, refColumn) {
    this.reference = reference;
    this.refColumn = refColumn;
}

Facet.prototype = {

    /**
     * @type {ERMrest.ParsedFilter}
     **/
    get filter () {
        // TODO use the reference.location.filters to return filters of this facet
    }
    
};
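To illustrate what a `histogram(min, max, bucketCount)` aggregate is expected to describe, here is an in-memory sketch of equal-width bucketing. In the proposed API the computation would happen on the server; this standalone function and its bucket shape are assumptions, and it requires `max > min`.

```javascript
// Count values into bucketCount equal-width buckets between min and max.
// Values outside [min, max] are ignored, which models zooming into a range.
function histogram(values, min, max, bucketCount) {
    var width = (max - min) / bucketCount;
    var buckets = [];
    for (var i = 0; i < bucketCount; i++) {
        buckets.push({ from: min + i * width, to: min + (i + 1) * width, count: 0 });
    }

    values.forEach(function (value) {
        if (value < min || value > max) {
            return; // outside the current zoom range
        }
        // Put the maximum value in the last bucket instead of overflowing.
        var index = Math.min(Math.floor((value - min) / width), bucketCount - 1);
        buckets[index].count++;
    });
    return buckets;
}

var sampleValues = [1, 2, 3, 4, 5, 10];
var fullRange = histogram(sampleValues, 0, 10, 5);
var zoomedIn = histogram(sampleValues, 2, 4, 2);
```

Calling the function again with a narrower `min` and `max` corresponds to the zoom behaviour described for numeric facets.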

Tasks

  • (ermrestjs/chaise) Change url parsing methods regarding filters.
  • (chaise) Display the provided filters in the Location object.
  • (ermrestjs) Add aggregate APIs to Reference and ReferenceColumn.
  • (ermrestjs) Support default visible-facet annotation and create Facet class, and Reference.facets api.
  • (ermrest) Draft a visible-facet annotation.
  • (ermrestjs) Support visible-facet annotation
  • (chaise) Get list of facets and show appropriate directives for each of them.
  • (chaise) Directives for integer and vocabulary facets.
  • (chaise or ermrestjs) A function that goes through the facets and based on column type create a bulk aggregate request.
  • (chaise) Ability to search within a facet based on its type.
  • (ermrestjs) Add applyFilters function to reference that given a list of filters returns a new filtered reference.
  • (chaise) Use the applyFilters function to apply all the filters and show the results.

Issues

  • Link naming (directional)
  • Link ordering (within core table)
  • Customization via modeling/view vs annotation
  • Scalability