Phecomp Get Started

About Phecomp data analysis tool.

Phecomp data analysis scripts quick guide

by Isabel Fernández

Raw data files obtained with Phecomp cages have the .mtb extension.

Requirements

To use these tools, a Python interpreter must be installed on your machine.

The HMM processing scripts also require the HMM extension module for Python to be installed on your machine. You can find it here.

This guide assumes that you have checked out the Phecomp scripts from the SVN repository, for example:

svn co http://tcoffee.googlecode.com/svn/phecomp/trunk

and unzipped the bundled sample data, sample.zip, as follows:

{{{ unzip /data/sample.zip }}}

Steps to pre-process the data

Keep only the useful data, e.g. food intake information (option f), in second-by-second acquisition format. The data is split into two output files, one for each type of cage: chow food only (SC) and chow plus choc (CD).

python ./bin/raw2matrix.py f 20090302.mtb

Output files:

20090302_CD_f.mtx
20090302_SC_f.mtx

Convert the data into eating amounts: each acquired value is compared with the next acquired value and the difference is kept. The second-by-second acquisition format is preserved.

python ./bin/matrix2increment.py a 20090302_CD_f.mtx

With option a (all), both the _SC and _CD files are processed at the same time.

Output files:

20090302_SC_f_inc.mtx
20090302_CD_f_inc.mtx
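
The increment step described above can be sketched in a few lines of Python. This is only an illustration, not the code of matrix2increment.py: the function name to_increments is hypothetical, and it assumes that readings are cumulative intake values that grow as the animal eats, so that the increment is the next value minus the current one.

```python
def to_increments(readings):
    """Difference between each reading and the next one (assumed: next - current)."""
    # Rounding hides floating-point noise such as 0.030000000000000002.
    return [round(nxt - cur, 6) for cur, nxt in zip(readings, readings[1:])]


# One reading per second for a single cage (made-up values).
readings = [0.0, 0.0, 0.02, 0.02, 0.05]
print(to_increments(readings))
```

A sequence of N readings yields N-1 increments, one per pair of consecutive seconds.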

If all eating amounts are to be considered, use matrix2increment_all.py.

If only eating events are of interest, convert the data to binary (1 for an eating event, 0 for no eating event) using matrix2binary.py.
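
As a rough illustration of the binary conversion, the sketch below maps each increment to 1 or 0. The function name to_binary and the threshold parameter are assumptions made for this example; the actual criterion used by matrix2binary.py is not described in this guide.

```python
def to_binary(increments, threshold=0.0):
    """1 where an eating event occurred (increment above threshold), else 0."""
    return [1 if inc > threshold else 0 for inc in increments]


print(to_binary([0.0, 0.02, 0.0, 0.03]))
```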

Keep the time intervals between consecutive eating events and join all the files that make up the data set. Each interval is converted into its corresponding bin (the interval distribution is divided into bins such that each bin contains roughly the same number of elements).

Option m (merge) joins all entries of the same cage into a single sequence.

python ./bin/joindata_interval.py m 20090302_SC_f_inc.mtx [<another_file_SC_f_inc.mtx>, ...]

Output files:

20090302_SC_f_inc_m_int.jnd (contains interval sequences)
20090302_SC_f_inc_m_bin.jnd (contains bin sequences)
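
The interval extraction and equal-frequency binning described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of joindata_interval.py: the function names, the number of bins, and the rule for placing bin edges are all hypothetical, and the filtering of big intervals is left out.

```python
import bisect


def event_intervals(increments):
    """Seconds elapsed between consecutive eating events (non-zero increments)."""
    times = [t for t, inc in enumerate(increments) if inc > 0]
    return [b - a for a, b in zip(times, times[1:])]


def equal_frequency_edges(intervals, n_bins):
    """Upper bin edges chosen so that each bin holds roughly the same number of intervals."""
    ordered = sorted(intervals)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least as many intervals as bins")
    return [ordered[(k * n) // n_bins - 1] for k in range(1, n_bins)]


def to_bins(intervals, edges):
    """Index of the first edge that is greater than or equal to each interval."""
    return [bisect.bisect_left(edges, x) for x in intervals]


intervals = [1, 2, 3, 10, 20, 300, 600, 900]
edges = equal_frequency_edges(intervals, 4)
print(edges, to_bins(intervals, edges))
```

With eight intervals and four bins, each bin receives two intervals, which is the "roughly the same number of elements" property mentioned above.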

All files to be joined must be listed after the option (m in the example).

Note that joindata_interval.py filters out big intervals.

If the input data comes from CD cages and a split between chow and choc food is needed, use joindata_interval_choc_chow.py. This adds an extra line per cage with the eating amounts and the food type (e.g. 0.02c for choc, 0.02w for chow).

If no filtering is desired, use joindata_interval_multiple_all.py.

If the input data is in eating amount format, use joindata.py.

If the input data is in eating amount format and compression of non-eating periods is desired, use joindata_compress.py.

To obtain both the interval and the eating amount information, use joindata_getincrement_and_interval.py.

To obtain only the eating amount information, without any time reference, use joindata_getincrement.py.

Train the HMM using the pre-processed data

Once the data has been pre-processed, it is ready for training the model. To train a 2-state model, run the training on the file containing the bins:

python ./bin/hmmtrain_homogeneous.py 2 20090302_SC_f_inc_m_bin.jnd

Output files:

20090302_SC_f_inc_m_bin_2states.hmm

This uses all input sequences in the .jnd file and trains a default 2-state model using the Baum-Welch algorithm. The script uses scaling variables when computing the model in order to avoid underflow.

A version without scaling variables, hmmtrain_homogeneous_noscaling.py, is also available, although it is not recommended unless the absence of underflow is assured.
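
To illustrate why scaling variables matter, here is a minimal sketch of the forward pass of a discrete HMM with per-step scaling, which is the building block Baum-Welch relies on. It is not taken from the Phecomp scripts or from the HMM extension module: the function name and the toy model parameters are assumptions made for this example.

```python
import math


def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of obs under a discrete HMM, rescaling alpha at every step."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)                  # scaling variable for step t
        alpha = [a / scale for a in alpha]  # keeps alpha in a safe numeric range
        loglik += math.log(scale)           # the likelihood is the product of the scales
    return loglik


# Toy 2-state model over 3 bins (made-up numbers).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(forward_loglik([0, 2] * 2000, start, trans, emit))
```

Without the rescaling, the forward variables for a sequence of a few thousand symbols shrink below the smallest representable float and become 0.0, which is the underflow the scaled script avoids.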

If the input data has labels (states) available, use hmmtrain_labeled.py.

Decode data using an HMM model

Once the model is trained it can be used to decode other sequences. Since the training data is in 'bin' format, the data to be decoded must also be in this format.

python ./bin/hmmpath.py 20090302_SC_f_inc_m_bin_2states.hmm 20090724_SC_f_inc_m_bin.jnd

Output files:

20090724_SC_f_inc_m_bin.pth

If posterior decoding is preferred instead:

python ./bin/hmmpath_posterior.py 20090302_SC_f_inc_m_bin_2states.hmm 20090724_SC_f_inc_m_bin.jnd

This second option produces a file with the posterior probabilities of each state and a path file, similar to the one obtained with hmmpath.py, that shows at each position the state with the highest posterior probability.

Output files:

20090724_SC_f_inc_m_bin.pdg (posterior probabilities)
20090724_SC_f_inc_m_bin_posterior.pth (posterior path)
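
Posterior decoding can be sketched with a scaled forward-backward pass: the posterior of each state at each position is the product of the scaled forward and backward variables, and the posterior path takes the most probable state at each position. The code below is an illustrative sketch only; the function name and the toy model are assumptions, and hmmpath_posterior.py may be implemented differently.

```python
def posterior_decode(obs, start, trans, emit):
    """Return (posteriors, path) for a discrete HMM using scaled forward-backward."""
    n, length = len(start), len(obs)
    # Forward pass, normalised at every step; the scales are reused below.
    fwd, scales, prev = [], [], None
    for t in range(length):
        if t == 0:
            row = [start[j] * emit[j][obs[0]] for j in range(n)]
        else:
            row = [
                sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(row)
        prev = [x / scale for x in row]
        fwd.append(prev)
        scales.append(scale)
    # Backward pass, divided by the forward scale of the following step.
    bwd = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        bwd[t] = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * bwd[t + 1][j] for j in range(n))
            / scales[t + 1]
            for i in range(n)
        ]
    # Posterior of state i at position t, then the most probable state per position.
    post = [[fwd[t][i] * bwd[t][i] for i in range(n)] for t in range(length)]
    path = [max(range(n), key=lambda i: probs[i]) for probs in post]
    return post, path


# Toy 2-state model over 3 bins (made-up numbers).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
post, path = posterior_decode([0, 2], start, trans, emit)
print(post, path)
```

In this sketch, post plays the role of the posterior probabilities file and path the role of the posterior path file.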
