lvm4j

Latent variable models in Java.

Introduction

Latent variable models (LVMs) are well-established statistical models in which some of the variables are not observed. lvm4j implements popular LVMs in the Java programming language. For the sake of simplicity, I call a model latent if it consists of two disjoint sets of variables: one that is observed and one that is hidden (e.g. because we have no data for it or because it cannot be observed at all).

With new versions I will try to cover more latent variable models in lvm4j.

Implemented models

One of the most famous and magnificent of them all, the Hidden Markov Model, is applicable to a wide range of fields (e.g. secondary structure prediction or alignment of viral RNA to a reference genome).

Principal Component Analysis is a simple (probably the simplest) method for dimension reduction. Here you try to find a linear orthogonal transformation onto a new feature space in which each successive basis vector captures the maximal remaining variance of the data. It's open to debate whether this is a true latent variable model.
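To make the "direction of maximal variance" idea concrete, here is a minimal sketch in plain Java that finds the first principal component by power iteration on the sample covariance matrix. It is only an illustration: the class `PcaSketch` and its method are hypothetical names, it does not use lvm4j, and it is not necessarily how lvm4j computes PCA.

```java
import java.util.Arrays;

/** Minimal PCA sketch: first principal component via power iteration (hypothetical, not lvm4j's API). */
public class PcaSketch {

    /**
     * Returns a unit-length direction of maximal variance of the rows of x.
     * Assumes at least two rows and that the start vector (all entries equal)
     * is not orthogonal to the leading eigenvector of the covariance matrix.
     */
    public static double[] firstComponent(double[][] x, int iterations) {
        int n = x.length, p = x[0].length;

        // Column means, used to center the data.
        double[] mean = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;

        // Sample covariance matrix (p x p).
        double[][] cov = new double[p][p];
        for (double[] row : x)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);

        // Power iteration: multiply by the covariance matrix and renormalize.
        double[] v = new double[p];
        Arrays.fill(v, 1.0 / Math.sqrt(p));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++) w[j] += cov[j][k] * v[k];
            double norm = 0;
            for (double wj : w) norm += wj * wj;
            norm = Math.sqrt(norm);
            for (int j = 0; j < p; j++) v[j] = w[j] / norm;
        }
        return v;
    }

    public static void main(String[] args) {
        // Made-up points scattered roughly along the diagonal y = x.
        double[][] x = {{1, 1.1}, {2, 1.9}, {3, 3.2}, {4, 3.8}};
        System.out.println(Arrays.toString(firstComponent(x, 100)));
    }
}
```

Projecting the centered data onto the returned vector gives the scores along the first component; further components would be found the same way after removing the variance already explained.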

Installation

You can either install the package by hand if you do not want to use Maven (why would you?) or just use the standard way to go: a Maven project with a pom.xml.

Install the package using Maven

If you use Maven just put this into your pom.xml:

<dependency>
    <groupId>net.digital-alexandria</groupId>
    <artifactId>lvm4j</artifactId>
    <version>0.1</version>
</dependency>

Install the package manually

You can also build the jar and then include it in your project.

  1. Clone the GitHub repository:

    $ git clone https://github.com/dirmeier/lvm4j.git

  2. Then build the package:

    $ mvn clean package -P standalone

  3. This gives you an lvm4j-standalone.jar that can be added to your project (make sure to reference it correctly, e.g. on your classpath).

Usage

Here, we briefly describe how the lvm4j library is used. Also make sure to check out the javadocs.

So far the following latent variable models are implemented:

  • HMM (a discrete-state-discrete-observation latent variable model)
  • PCA (a dimension reduction method with latent loadings and observable scores)

How to use the HMM

Using an HMM (in v0.1) involves two steps: training of emission and transition probabilities and prediction of the latent state sequence.

Training

First initialize an HMM using:

char[] states = new char[]{'A', 'B', 'C'};
char[] observations = new char[]{'X', 'Y', 'Z'};
HMM hmm = HMMFactory.instance().hmm(states, observations, 1);

It is easier, though, to use the factory method that takes a single string containing the path to an XML file.

String xmlFile = "/src/test/resources/hmm.xml";
HMM hmm = HMMFactory.instance().hmm(xmlFile);

Having the HMM initialized, training is done like this:

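import java.util.HashMap;
import java.util.Map;

// Keys identify the sequences; values hold one state/observation character per position.
Map<String, String> states = new HashMap<>();
states.put("s1", "ABCABC");
states.put("s2", "ABCCCC");
Map<String, String> observations = new HashMap<>();
observations.put("s1", "XYZYXZ");
observations.put("s2", "XYZYXZ");
hmm.train(states, observations);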
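For intuition about what training on labelled sequences means, the following sketch estimates transition and emission probabilities by simple counting and normalizing. It is an assumption-laden illustration: `CountingSketch` and its methods are hypothetical names, and the README does not state that lvm4j estimates its parameters this way.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of supervised HMM training by counting (hypothetical, not lvm4j's internals). */
public class CountingSketch {

    /** Estimates P(next state | current state) from labelled state sequences. */
    public static Map<Character, Map<Character, Double>> transitions(Map<String, String> states) {
        Map<Character, Map<Character, Double>> counts = new HashMap<>();
        for (String seq : states.values())
            for (int i = 0; i + 1 < seq.length(); i++)
                increment(counts, seq.charAt(i), seq.charAt(i + 1));
        return normalize(counts);
    }

    /** Estimates P(observation | state); assumes equal keys and equally long values. */
    public static Map<Character, Map<Character, Double>> emissions(
            Map<String, String> states, Map<String, String> observations) {
        Map<Character, Map<Character, Double>> counts = new HashMap<>();
        for (Map.Entry<String, String> e : states.entrySet()) {
            String s = e.getValue();
            String o = observations.get(e.getKey());
            for (int i = 0; i < s.length(); i++)
                increment(counts, s.charAt(i), o.charAt(i));
        }
        return normalize(counts);
    }

    // Adds one to the count of the pair (from, to).
    private static void increment(Map<Character, Map<Character, Double>> counts, char from, char to) {
        counts.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1.0, Double::sum);
    }

    // Divides every row of counts by its total so that each row sums to one.
    private static Map<Character, Map<Character, Double>> normalize(
            Map<Character, Map<Character, Double>> counts) {
        for (Map<Character, Double> row : counts.values()) {
            double total = row.values().stream().mapToDouble(Double::doubleValue).sum();
            row.replaceAll((k, c) -> c / total);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> states = new HashMap<>();
        states.put("s1", "ABCABC");
        states.put("s2", "ABCCCC");
        Map<String, String> observations = new HashMap<>();
        observations.put("s1", "XYZYXZ");
        observations.put("s2", "XYZYXZ");
        System.out.println(transitions(states));
        System.out.println(emissions(states, observations));
    }
}
```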

Take care that states and observations have the same keys and equally long values. You can write your trained HMM to a file using:

String outFile = "hmm.trained.xml";
hmm.writeHMM(outFile);

That is it!
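The requirement on the training input mentioned above (same keys, equally long values) can be checked before calling train. The helper below is a minimal sketch with a hypothetical name; it is not part of lvm4j.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pre-flight check for HMM training input (not part of lvm4j). */
public class TrainingInputCheck {

    /** Throws if the two maps differ in their keys or in the length of any paired value. */
    public static void check(Map<String, String> states, Map<String, String> observations) {
        if (!states.keySet().equals(observations.keySet()))
            throw new IllegalArgumentException("states and observations must have the same keys");
        for (Map.Entry<String, String> e : states.entrySet()) {
            int expected = e.getValue().length();
            int actual = observations.get(e.getKey()).length();
            if (expected != actual)
                throw new IllegalArgumentException("length mismatch for sequence " + e.getKey());
        }
    }

    public static void main(String[] args) {
        Map<String, String> states = new HashMap<>();
        states.put("s1", "ABCABC");
        Map<String, String> observations = new HashMap<>();
        observations.put("s1", "XYZYXZ");
        check(states, observations); // passes silently for consistent input
    }
}
```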

Prediction

First initialize the HMM again:

String xmlFile = "/src/test/resources/hmm.trained.xml";
HMM hmm = HMMFactory.instance().hmm(xmlFile);

Make sure to use the hmm.trained.xml file containing your trained HMM. Then make a prediction using:

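import java.util.HashMap;
import java.util.Map;

Map<String, String> observations = new HashMap<>();
observations.put("s1", "XYZYXZ");
observations.put("s2", "XYZYXZ");
// Assumption: predict takes only the observations, since the latent states are what is predicted
// (the original call also passed an undefined `states` variable; check the javadocs).
Map<String, String> pred = hmm.predict(observations);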
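Predicting the latent state sequence of an HMM is commonly done with the Viterbi algorithm. The README does not say which algorithm lvm4j uses, so the following is only a generic sketch under that assumption. States and observations are encoded as integer indices (e.g. A, B, C as 0, 1, 2), and `ViterbiSketch` and the toy probabilities in `main` are made up for illustration.

```java
import java.util.Arrays;

/** Minimal Viterbi decoder in log space (hypothetical, not taken from lvm4j). */
public class ViterbiSketch {

    /**
     * @param obs   observation indices, one per time step (must be non-empty)
     * @param start start[i] = P(first state is i)
     * @param trans trans[i][j] = P(state j follows state i)
     * @param emit  emit[i][o] = P(observation o | state i)
     * @return the most probable sequence of state indices
     */
    public static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length, k = start.length;
        double[][] score = new double[n][k]; // best log-probability of a path ending in state j at time t
        int[][] back = new int[n][k];        // predecessor of state j on that best path

        for (int j = 0; j < k; j++)
            score[0][j] = Math.log(start[j]) + Math.log(emit[j][obs[0]]);

        for (int t = 1; t < n; t++) {
            for (int j = 0; j < k; j++) {
                int best = 0;
                for (int i = 1; i < k; i++) {
                    double candidate = score[t - 1][i] + Math.log(trans[i][j]);
                    double current = score[t - 1][best] + Math.log(trans[best][j]);
                    if (candidate > current) best = i;
                }
                score[t][j] = score[t - 1][best] + Math.log(trans[best][j]) + Math.log(emit[j][obs[t]]);
                back[t][j] = best;
            }
        }

        // Pick the best final state, then follow the back pointers.
        int[] path = new int[n];
        for (int j = 1; j < k; j++)
            if (score[n - 1][j] > score[n - 1][path[n - 1]]) path[n - 1] = j;
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        // Toy two-state, three-symbol model with made-up probabilities.
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(decode(new int[]{0, 1, 2}, start, trans, emit)));
    }
}
```

Working in log space avoids numerical underflow on long sequences, which is why the products of probabilities appear as sums here.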

Congrats! That concludes the tutorial on HMMs.

How to use PCA

TODO

Author