How to extend Kernalytics

This chapter contains links to the various locations that must be modified to add a new data type or a new kernel.

How to add a new data type

Adding a new data type is a bit involved, as it requires adding not only a data container but also a parser and a few other operations.

Pattern-matching of the data type name

Add the data type's name as a case in the pattern matching of parseIndividualVar. That case should call the appropriate parser to transform the raw data, an Array[String], into a ParsedVar, which is a named DataRoot.
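As an illustration only, the dispatch could look like the sketch below. DataRoot, ParsedVar and parseIndividualVar are simplified stand-ins here, not the actual Kernalytics definitions, and the "Count" type, the CountData container and the use of Try for error reporting are assumptions made for the example.

```scala
import scala.util.{Failure, Try}

object ParseDispatchSketch {
  type Real = Double

  // Simplified stand-ins for the Kernalytics containers (assumption).
  sealed trait DataRoot
  final case class RealData(values: Vector[Real]) extends DataRoot
  final case class CountData(values: Vector[Int]) extends DataRoot // hypothetical new type

  final case class ParsedVar(name: String, data: DataRoot)

  def parseReal(raw: Array[String]): Try[DataRoot] =
    Try(RealData(raw.toVector.map(_.trim.toDouble)))

  // Parser for the hypothetical new type.
  def parseCount(raw: Array[String]): Try[DataRoot] =
    Try(CountData(raw.toVector.map(_.trim.toInt)))

  def parseIndividualVar(varName: String, typeName: String, raw: Array[String]): Try[ParsedVar] = {
    val data: Try[DataRoot] = typeName match {
      case "Real"  => parseReal(raw)
      case "Count" => parseCount(raw) // the case added for the new data type
      case other   => Failure(new IllegalArgumentException(s"unknown data type: $other"))
    }
    data.map(d => ParsedVar(varName, d))
  }
}
```

With this layout, supporting one more type means adding one parser and one case line; unknown type names fail explicitly instead of being silently ignored.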

Parser

Implement the parsing either as a dedicated function in io.ReadVar or in a separate file, as is done for ParseVectorReal.

The parser must transform an Array[String] to a DataRoot.
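A minimal sketch of such a parser follows, for a hypothetical interval type whose cells are written as "lower;upper". The IntervalData container, the cell format and the Either-based error reporting are all assumptions chosen for the example; the actual parsers in Kernalytics may report errors differently.

```scala
import scala.util.Try

object IntervalParserSketch {
  type Real = Double

  // Simplified stand-ins (assumption), not the real Kernalytics types.
  sealed trait DataRoot
  final case class IntervalData(values: Vector[(Real, Real)]) extends DataRoot

  // Parses one cell such as "1.5;2.0" into a (lower, upper) pair.
  // Returns None if the cell is malformed or if lower > upper.
  private def parseCell(cell: String): Option[(Real, Real)] =
    cell.split(";").map(_.trim) match {
      case Array(a, b) =>
        Try((a.toDouble, b.toDouble)).toOption.filter { case (lo, hi) => lo <= hi }
      case _ => None
    }

  // Array[String] => DataRoot, reporting the first invalid element.
  def parseInterval(raw: Array[String]): Either[String, DataRoot] = {
    val cells = raw.toVector.map(parseCell)
    cells.indexWhere(_.isEmpty) match {
      case -1 => Right(IntervalData(cells.flatten))
      case i  => Left(s"element $i is not a valid interval: '${raw(i)}'")
    }
  }
}
```

Reporting the index of the first invalid element makes malformed input files easier to fix than a bare exception would.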

DataRoot subtyping

Since the data type is new, it might be necessary to store it in a new type of container. The containers currently used are Breeze containers (such as DenseMatrix[Real]), but anything could be used. Each variable is contained in an instance of a subclass of DataRoot, so the required modification is to add a new subtype of DataRoot.
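The subtyping step could be sketched as follows. The members typeName and nPoint are assumptions made for the example (the real DataRoot may expose a different interface), and plain Scala Vectors replace the Breeze containers so that the snippet has no dependency.

```scala
object DataRootSketch {
  type Real = Double

  // Hypothetical stand-in for DataRoot; its members are assumptions.
  sealed trait DataRoot {
    def typeName: String
    def nPoint: Int // number of observations stored in the container
  }

  final case class RealData(values: Vector[Real]) extends DataRoot {
    val typeName = "Real"
    def nPoint: Int = values.length
  }

  // The new container: one subclass per data type.
  final case class IntervalData(lower: Vector[Real], upper: Vector[Real]) extends DataRoot {
    require(lower.length == upper.length, "lower and upper bounds must have the same length")
    val typeName = "Interval"
    def nPoint: Int = lower.length
  }

  // Code written against DataRoot keeps working with the new subtype,
  // e.g. checking that all variables hold the same number of observations.
  def commonNPoint(vars: Seq[DataRoot]): Option[Int] =
    vars.map(_.nPoint).distinct match {
      case Seq(n) => Some(n)
      case _      => None
    }
}
```

Declaring the trait as sealed is a design choice of this sketch: it lets the compiler warn when a pattern match elsewhere forgets the new subtype.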

Algebraic system

If the data type has an inner product, a norm, or a distance, implement them in Algebra, as this allows families of kernels to be quickly generated for this data type.
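To show why this pays off, here is a hedged sketch in which a data type supplies only an inner product, and the norm, the distance, a linear kernel and a Gaussian kernel all follow from it. The trait InnerProductSpace and the function names are hypothetical and do not reproduce the actual Algebra or Kernel API.

```scala
object AlgebraSketch {
  type Real = Double

  // Hypothetical minimal interface: the data type only supplies ip.
  trait InnerProductSpace[X] {
    def ip(x: X, y: X): Real
    def norm(x: X): Real = math.sqrt(ip(x, x))
    // ||x - y||^2 = <x,x> - 2<x,y> + <y,y>, so no subtraction on X is needed.
    // math.max guards against tiny negative values caused by rounding.
    def distance(x: X, y: X): Real =
      math.sqrt(math.max(0.0, ip(x, x) - 2.0 * ip(x, y) + ip(y, y)))
  }

  // Instance for vectors of reals.
  object VectorReal extends InnerProductSpace[Vector[Real]] {
    def ip(x: Vector[Real], y: Vector[Real]): Real = {
      require(x.length == y.length, "vectors must have the same dimension")
      x.zip(y).map { case (a, b) => a * b }.sum
    }
  }

  // Kernel families generated from the algebraic structure alone.
  def linear[X](s: InnerProductSpace[X]): (X, X) => Real =
    (x, y) => s.ip(x, y)

  // Gaussian kernel exp(-d(x, y)^2 / (2 sd^2)).
  def gaussian[X](s: InnerProductSpace[X], sd: Real): (X, X) => Real = {
    require(sd > 0.0, "sd must be positive")
    (x, y) => {
      val d = s.distance(x, y)
      math.exp(-d * d / (2.0 * sd * sd))
    }
  }
}
```

Under these assumptions, a new data type gains both kernel families by implementing a single method, for instance `AlgebraSketch.gaussian(AlgebraSketch.VectorReal, 1.0)`.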

How to add a new kernel on an existing data type

A kernel is simply a two-argument function that maps a pair of values of data type X to a Real. Integrating it into Kernalytics is not very hard.

There are two ways to implement this function:

  1. Directly, as an (X, X) => Real function in Kernel. See dummyLinearKernel for an example.
  2. Indirectly, as an algebraic object to be passed as an argument to another function in Kernel, like InnerProduct.linear or Metric.gaussian. The algebraic system is discussed in more detail in the overview.

You must then add the corresponding kernel and its data type as a case in generateKernelFromParamData.
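The direct route and the registration step can be sketched together as below. The containers, the kernel names "Product" and "Match", and the signature of generateKernel are assumptions for illustration; the real generateKernelFromParamData also handles kernel parameters, which are left out here.

```scala
object KernelGeneratorSketch {
  type Real = Double

  // Simplified stand-ins for the Kernalytics containers (assumption).
  sealed trait DataRoot
  final case class RealData(values: Vector[Real]) extends DataRoot
  final case class StringData(values: Vector[String]) extends DataRoot

  // Kernels written directly as (X, X) => Real functions.
  val productKernel: (Real, Real) => Real = (x, y) => x * y
  // Hypothetical new kernel: 1 if the two strings are equal, 0 otherwise.
  val matchKernel: (String, String) => Real = (x, y) => if (x == y) 1.0 else 0.0

  // Binds a kernel to its data, so that callers only handle observation indices.
  def indexed[X](data: Vector[X], k: (X, X) => Real): (Int, Int) => Real =
    (i, j) => k(data(i), data(j))

  // One case per supported (kernel name, data type) combination.
  def generateKernel(kernelName: String, data: DataRoot): Option[(Int, Int) => Real] =
    (kernelName, data) match {
      case ("Product", RealData(v)) => Some(indexed(v, productKernel))
      case ("Match", StringData(v)) => Some(indexed(v, matchKernel)) // the added case
      case _                        => None // unsupported combination
    }
}
```

Returning a function of two indices hides the data type from the numerical methods, which then only need to evaluate the kernel on pairs of observations.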

How to add a new numerical method

First, the new numerical method must be detected in the input files, so the right code is called. This is done in callAlgo for Learn and callAlgo for Predict.

Then, the parameter parsing and launch code must be written in the learn and predict directories.

Finally, the algorithm itself must be written in the algo directory. This code is the numerical method proper, and it must use the data and parameters parsed in the preceding steps.
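The three steps can be pictured with the toy sketch below. Everything in it is hypothetical: the "shrunkMean" method, the parameter map, the object names and the signatures are placeholders, and only the split between detection, parameter parsing and algorithm mirrors the description above.

```scala
import scala.util.Try

object NumericalMethodSketch {
  type Real = Double

  // Step 3 (algo directory): the numerical method itself, unaware of input files.
  object Algo {
    // Toy method: shrunk mean, sum(x) / (n + lambda).
    def shrunkMean(x: Vector[Real], lambda: Real): Real = x.sum / (x.length + lambda)
  }

  // Step 2 (learn directory): parse the parameters, then launch the algorithm.
  object LearnShrunkMean {
    def main(param: Map[String, String], x: Vector[Real]): Either[String, Real] =
      for {
        raw    <- param.get("lambda").toRight("missing parameter: lambda")
        lambda <- Try(raw.toDouble).toOption.filter(_ >= 0.0).toRight(s"invalid lambda: $raw")
      } yield Algo.shrunkMean(x, lambda)
  }

  // Step 1 (callAlgo): detect the requested method and route to the right code.
  def callAlgo(param: Map[String, String], x: Vector[Real]): Either[String, Real] =
    param.get("method") match {
      case Some("shrunkMean") => LearnShrunkMean.main(param, x) // the added case
      case Some(other)        => Left(s"unknown method: $other")
      case None               => Left("missing parameter: method")
    }
}
```

Keeping the algorithm free of any parsing logic means it can be unit tested directly on in-memory data.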

Notes

Architecture

The current way to handle mixing data types and kernels is to use local pattern-matching in KernelGenerator.

The current implementation in KernelGenerator is not satisfactory, as it relies on pattern-matching against a set of predefined combinations of data types and kernel names. As a result, adding a new type or kernel implies modifying code scattered all over Kernalytics. Ideally, everything should be centralized so that all the logic for a given type is contained in a single object.

Note that there is a typeName: String here, which is similar to the string in parseIndividualVar. This could be leveraged when reworking the management of data types.

Unit testing

Kernalytics implements unit testing via the ScalaTest library. Developers are strongly advised to add and run unit tests as often as possible. The tests are located in the test directory.