This chapter contains links to the various locations that must be modified to add a new data type, a new kernel, or a new numerical method.
Adding a new data type is a bit involved, as it requires adding not only a data container, but also a parser and a few other operations.
Add its name as a case in the pattern matching of parseIndividualVar. Then call the proper parser to transform the raw data from an Array[String] to a ParsedVar, which is a named DataRoot.
Add the implementation of the parsing in a specific function in io.ReadVar, or in an external file, like ParseVectorReal. The parser must transform an Array[String] to a DataRoot.
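The two parsing steps above can be sketched as follows. This is a minimal, self-contained illustration and not the Kernalytics source: the Interval type, the "lower:upper" text format, the parser names, and the simplified DataRoot, ParsedVar and parseIndividualVar signatures are all assumptions made for the example, and plain Scala collections stand in for the Breeze containers.

```scala
import scala.util.{Failure, Try}

object ParsingSketch {
  type Real = Double // assumption: Real is an alias for Double

  // Hypothetical stand-ins for the types named in the text.
  sealed trait DataRoot
  final case class RealData(data: Vector[Real]) extends DataRoot
  final case class IntervalData(data: Vector[(Real, Real)]) extends DataRoot // the new type

  final case class ParsedVar(name: String, data: DataRoot) // a named DataRoot

  // Existing parser: one real number per entry.
  def parseReal(raw: Array[String]): Try[DataRoot] =
    Try(RealData(raw.toVector.map(_.trim.toDouble)))

  // New parser: each entry is expected to look like "lower:upper".
  def parseInterval(raw: Array[String]): Try[DataRoot] = Try {
    IntervalData(raw.toVector.map { s =>
      s.split(":") match {
        case Array(lo, hi) => (lo.trim.toDouble, hi.trim.toDouble)
        case _             => throw new IllegalArgumentException(s"Not an interval: $s")
      }
    })
  }

  // Dispatch on the type name, then attach the variable name to the parsed data.
  def parseIndividualVar(name: String, typeName: String, raw: Array[String]): Try[ParsedVar] = {
    val parsed: Try[DataRoot] = typeName match {
      case "Real"     => parseReal(raw)
      case "Interval" => parseInterval(raw) // the case added for the new type
      case other      => Failure(new IllegalArgumentException(s"Unknown data type: $other"))
    }
    parsed.map(d => ParsedVar(name, d))
  }
}
```

Returning a Try keeps a malformed entry or an unknown type name from throwing in the middle of reading a file; whether the real parsers report errors this way is an assumption of the sketch.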
Since the data type is new, it might be necessary to store it in a new type of container. The containers currently used are Breeze containers (such as DenseMatrix[Real]), but anything could be used. Each variable is contained in an instance of a subclass of DataRoot, so the modification consists of adding a new subtype of DataRoot.
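As a rough illustration of subtyping DataRoot, the sketch below adds a container for a hypothetical Interval type. It assumes DataRoot is a sealed trait and uses plain Scala vectors instead of Breeze containers; the nObs member and the describe function are invented for the example.

```scala
object DataRootSketch {
  type Real = Double // assumption: Real is an alias for Double

  // Hypothetical, simplified version of the container hierarchy.
  sealed trait DataRoot { def nObs: Int }

  final case class RealData(data: Vector[Real]) extends DataRoot {
    def nObs: Int = data.length
  }

  // The new container: one (lower, upper) pair per observation.
  final case class IntervalData(data: Vector[(Real, Real)]) extends DataRoot {
    def nObs: Int = data.length
  }

  // Any match over DataRoot must now handle the new subtype as well.
  def describe(d: DataRoot): String = d match {
    case RealData(x)     => s"Real (${x.length} obs)"
    case IntervalData(x) => s"Interval (${x.length} obs)"
  }
}
```

With a sealed trait, the compiler warns about every match that does not yet handle the new subtype, which helps locate the other places that need a new case.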
If the data type has an inner product, a norm, or a distance, implement them in Algebra, as this allows families of kernels to be generated quickly for this data type.
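To show why providing an inner product pays off, here is a minimal sketch in which the norm and the distance are derived from the inner product alone. The InnerProductSpace trait and its member names are hypothetical and do not reproduce the actual content of Algebra.

```scala
object AlgebraSketch {
  type Real = Double // assumption: Real is an alias for Double

  // Hypothetical algebraic structure: only the inner product must be supplied.
  trait InnerProductSpace[X] {
    def ip(x: X, y: X): Real

    // Norm induced by the inner product: ||x|| = sqrt(<x, x>).
    def norm(x: X): Real = math.sqrt(ip(x, x))

    // Induced distance, using ||x - y||^2 = <x, x> - 2 <x, y> + <y, y>.
    // The max guards against tiny negative values caused by rounding.
    def distance(x: X, y: X): Real =
      math.sqrt(math.max(0.0, ip(x, x) - 2.0 * ip(x, y) + ip(y, y)))
  }

  // Example instance for real vectors, with the usual dot product.
  object VectorRealSpace extends InnerProductSpace[Vector[Real]] {
    def ip(x: Vector[Real], y: Vector[Real]): Real =
      x.zip(y).map { case (a, b) => a * b }.sum
  }
}
```

For instance, `AlgebraSketch.VectorRealSpace.norm(Vector(3.0, 4.0))` evaluates to 5.0 without any norm having been written for vectors specifically.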
A kernel is simply a two-argument function from a pair of values of the data type X to Real. Integrating it in Kernalytics is not very hard.
There are two ways to implement this function:
- Directly as a (X, X) => Real function in Kernel. See the dummyLinearKernel for example.
- Indirectly as an algebraic object, to be used as an argument for another function in Kernel, like InnerProduct.linear or Metric.gaussian. The algebraic system is discussed in more detail in the overview.
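The two approaches can be contrasted in a short sketch. The names directLinear, linearFromInnerProduct and gaussianFromMetric are hypothetical and only mimic the roles of dummyLinearKernel, InnerProduct.linear and Metric.gaussian; the Gaussian form exp(-d^2 / (2 sd^2)) is a common convention and may not match the parametrization used in Kernalytics.

```scala
object KernelSketch {
  type Real = Double // assumption: Real is an alias for Double

  // Way 1: a kernel written directly as an (X, X) => Real function.
  val directLinear: (Vector[Real], Vector[Real]) => Real =
    (x, y) => x.zip(y).map { case (a, b) => a * b }.sum

  // Way 2: generic builders that turn an algebraic object into a kernel.
  // A linear kernel is just the inner product itself.
  def linearFromInnerProduct[X](ip: (X, X) => Real): (X, X) => Real = ip

  // A Gaussian kernel only needs a distance on X and a bandwidth sd.
  def gaussianFromMetric[X](dist: (X, X) => Real, sd: Real): (X, X) => Real =
    (x, y) => {
      val d = dist(x, y)
      math.exp(-(d * d) / (2.0 * sd * sd))
    }

  // Supplying a distance on the reals is enough to obtain a Gaussian kernel.
  val absDistance: (Real, Real) => Real = (x, y) => math.abs(x - y)
  val gaussianReal: (Real, Real) => Real = gaussianFromMetric(absDistance, 1.0)
}
```

The indirect route is what makes the algebraic operations worthwhile: once a distance exists for a new data type, every distance-based kernel builder applies to it unchanged.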
You must then add the corresponding kernel and its data type as a case in generateKernelFromParamData.
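A sketch of what adding such a case might look like is given below. The signature, the kernel names, the Interval type and the choice of returning a function of observation indices are assumptions made for illustration; the real generateKernelFromParamData may differ on all of these points.

```scala
import scala.util.{Failure, Success, Try}

object KernelGeneratorSketch {
  type Real = Double                    // assumption: Real is an alias for Double
  type IndexKernel = (Int, Int) => Real // kernel evaluated on observations i and j

  // Hypothetical, simplified containers.
  sealed trait DataRoot
  final case class RealData(data: Vector[Real]) extends DataRoot
  final case class IntervalData(data: Vector[(Real, Real)]) extends DataRoot

  def generateKernel(kernelName: String, param: String, data: DataRoot): Try[IndexKernel] =
    (data, kernelName) match {
      case (RealData(x), "Linear") =>
        Success((i: Int, j: Int) => x(i) * x(j))
      case (RealData(x), "Gaussian") =>
        // The kernel parameter arrives as text and must be parsed first.
        Try(param.toDouble).map { sd => (i: Int, j: Int) =>
          val d = x(i) - x(j)
          math.exp(-(d * d) / (2.0 * sd * sd))
        }
      // New case: a linear kernel on interval midpoints (an arbitrary choice).
      case (IntervalData(x), "Linear") =>
        val mid = x.map { case (lo, hi) => (lo + hi) / 2.0 }
        Success((i: Int, j: Int) => mid(i) * mid(j))
      case _ =>
        Failure(new IllegalArgumentException(s"No kernel $kernelName for this data type"))
    }
}
```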
First, the new numerical method must be detected in the input files, so the right code is called. This is done in callAlgo for Learn and callAlgo for Predict.
Then, the parameter parsing and launch code must be written in the learn and predict directories.
Finally, the code of the algorithm itself, that is, the numerical method, must be written in the algo directory. This part must use the data and parameters that have been parsed in the preceding steps.
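The detection step can be pictured as a dispatch on the method name read from the input files. The sketch below is only an assumption about its general shape: the "method" and "nIteration" keys, the method names, the launcher functions and the String result are all invented for the example.

```scala
import scala.util.{Failure, Success, Try}

object CallAlgoSketch {
  // Hypothetical launcher for an existing method.
  def launchRegression(params: Map[String, String]): Try[String] =
    Success("regression done")

  // Hypothetical launcher for the new method: parse its parameters, then run it.
  def launchNewMethod(params: Map[String, String]): Try[String] =
    params.get("nIteration") match {
      case Some(raw) => Try(raw.toInt).map(n => s"newmethod: $n iterations")
      case None      => Failure(new IllegalArgumentException("nIteration is missing"))
    }

  // Detect the requested method and call the matching launch code.
  def callAlgo(params: Map[String, String]): Try[String] =
    params.get("method") match {
      case Some("regression") => launchRegression(params)
      case Some("newmethod")  => launchNewMethod(params) // the added case
      case Some(other)        => Failure(new IllegalArgumentException(s"Unknown method: $other"))
      case None               => Failure(new IllegalArgumentException("No method specified"))
    }
}
```

Since Learn and Predict each have their own callAlgo, a new method would presumably need a case like this in both.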
The current way to handle mixing data types and kernels is to use local pattern-matching in KernelGenerator.
The current implementation in KernelGenerator is not satisfactory, as it relies on pattern-matching against a set of predefined combinations of data types and kernel names. This is not optimal, as adding a new type or kernel implies modifying code scattered all over Kernalytics. Ideally, everything should be centralized so that all the logic could be contained in a single object for each type.
Note that there is a typeName: String here, which is similar to the string in parseIndividualVar. This could be leveraged when reworking the data types management.
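One possible direction for such a rework, given purely as an illustration and not as a planned design, is to let each data type be described by a single object holding its type name, its parser and its kernels, and to look these objects up by typeName. Every name in this sketch is hypothetical.

```scala
import scala.util.{Failure, Success, Try}

object CentralizedTypeSketch {
  type Real = Double                    // assumption: Real is an alias for Double
  type IndexKernel = (Int, Int) => Real // kernel evaluated on observations i and j

  // Everything that is specific to one data type lives in one object.
  trait DataType {
    type Container
    def typeName: String
    def parse(raw: Array[String]): Try[Container]
    def kernel(name: String, data: Container): Try[IndexKernel]

    // Parsing and kernel generation chained without exposing Container.
    final def kernelFromRaw(raw: Array[String], name: String): Try[IndexKernel] =
      parse(raw).flatMap(kernel(name, _))
  }

  object RealType extends DataType {
    type Container = Vector[Real]
    val typeName = "Real"
    def parse(raw: Array[String]): Try[Container] = Try(raw.toVector.map(_.toDouble))
    def kernel(name: String, data: Container): Try[IndexKernel] = name match {
      case "Linear" => Success((i: Int, j: Int) => data(i) * data(j))
      case other    => Failure(new IllegalArgumentException(s"Unknown kernel for Real: $other"))
    }
  }

  // Adding a data type then amounts to writing one object and listing it here.
  val registry: Map[String, DataType] =
    List[DataType](RealType).map(t => t.typeName -> t).toMap

  def kernelFor(typeName: String, raw: Array[String], kernelName: String): Try[IndexKernel] =
    registry.get(typeName) match {
      case Some(t) => t.kernelFromRaw(raw, kernelName)
      case None    => Failure(new IllegalArgumentException(s"Unknown data type: $typeName"))
    }
}
```

In this arrangement the typeName string is the only link between the input files and the type-specific code, so the scattered pattern matches collapse into a single lookup.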
Kernalytics implements unit testing via the ScalaTest library. Developers are strongly advised to add and run unit tests as often as possible. They are located in the test directory.