Data Analysis Tutorial

Zach Werkhoven edited this page Mar 28, 2019 · 2 revisions

Contents

  1. Tutorial Overview
  2. Convert Interframe Interval to Time Elapsed
  3. Generate Speed Raw Data
  4. Split Data by Experimental Group

Tutorial Overview

The Basic Tracking protocol outputs raw data for centroid coordinates, time stamps, and dropped frames for each trace to binary data files (.bin) by default (see output options for outputting additional raw data types). It also saves a custom ExperimentData object containing experiment meta data to a MATLAB data file (.mat). The primary aim of this tutorial is to demonstrate the basics of accessing and manipulating MARGO's data outputs. We will do this by walking through three example tasks that apply to a wide variety of analyses:

  1. Converting the raw time data from interframe interval format to time elapsed
  2. Generating new raw data for speed from centroid coordinates and time data
  3. Calculating average speed of each sex over the duration of the experiment

Converting interframe interval to time elapsed

In addition to containing experiment meta data, the ExperimentData object also contains file path references to the raw binary data and methods for accessing the raw data conveniently. See ExperimentData for additional details. Let's start by loading the ExperimentData into the MATLAB workspace and using it to access the raw time data.

1. Browse to the save directory chosen during setup and open the new autogenerated save directory (see below). MARGO autogenerates a unique labeling string for each tracking experiment from the following:

  • a time stamp from the start of tracking in MM-DD-YYYY-hh-mm-ss format
  • the name of the experiment protocol (e.g. Basic Tracking)
  • the first row of label data

Using this labeling string MARGO generates a directory with the following structure:

user_selected_save_directory
|
└── autogenerated_parent_directory
    |   ExperimentData.mat
    |
    └── autogenerated_raw_data_directory
            centroid.bin
            dropped_frames.bin
            time.bin

In this particular case, the directory on this machine has the structure and naming shown below. The exact names depend on the save directory chosen, the time of the experiment, and the configuration of the labels:

./margo_data
|
└── 03-22-2019-13-13-41__Basic_Tracking_CantonS_F_23C_1-46_Day1
    |   03-22-2019-13-13-41__Basic_Tracking_CantonS_F_23C_1-46_Day1.mat
    |
    └── raw_data
            03-22-2019-13-13-41__centroid.bin
            03-22-2019-13-13-41__dropped_frames.bin
            03-22-2019-13-13-41__time.bin
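
The naming scheme above can be taken apart programmatically. The following is a minimal sketch, assuming the double underscore separates the time stamp from the rest of the labeling string as in the example tree. The variable names (parent_dir, stamp, label_str, t_start, expmt_file, time_file) are chosen for illustration and are not part of MARGO:

```matlab
% directory name copied from the example tree above
parent_dir = '03-22-2019-13-13-41__Basic_Tracking_CantonS_F_23C_1-46_Day1';

% split at the first double underscore: time stamp before it,
% protocol name and label data after it
sep = strfind(parent_dir, '__');
stamp = parent_dir(1:sep(1)-1);
label_str = parent_dir(sep(1)+2:end);

% parse the MM-DD-YYYY-hh-mm-ss time stamp into a datetime
t_start = datetime(stamp, 'InputFormat', 'MM-dd-yyyy-HH-mm-ss');

% assemble relative paths that mirror the example directory structure
expmt_file = fullfile('margo_data', parent_dir, [parent_dir '.mat']);
time_file = fullfile('margo_data', parent_dir, 'raw_data', [stamp '__time.bin']);
```

Recovering the start time this way can be handy when sorting or batch-processing many experiment directories.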

2. Open the ExperimentData file (03-22-2019-13-13-41__Basic_Tracking_CantonS_F_23C_1-46_Day1.mat in the example above) in MATLAB. It should appear in the MATLAB workspace as expmt.
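
The file can also be opened from the command line instead of the file browser. This is a minimal sketch, assuming the example directory layout shown above sits under the current folder and that MARGO is on the MATLAB path, since MATLAB needs the class definitions to reconstruct the saved objects. The names exp_name and mat_file are illustrative:

```matlab
% path built from the example directory tree; adjust to your own save directory
exp_name = '03-22-2019-13-13-41__Basic_Tracking_CantonS_F_23C_1-46_Day1';
mat_file = fullfile('margo_data', exp_name, [exp_name '.mat']);

if exist(mat_file, 'file') == 2
    % load only the expmt variable into the workspace
    load(mat_file, 'expmt');
else
    warning('ExperimentData file not found: %s', mat_file);
end
```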

3. References to the binary raw data files are coordinated through two additional custom classes: RawDataField and RawDataMap. The RawDataField contains meta data and methods for attaching, resetting, and detaching the RawDataMap (this will be explained shortly). The RawDataMap contains methods for directly reading from the raw data file. Inspect the time RawDataField by executing expmt.data.time in the MATLAB command line:

>> expmt.data.time

ans = 

  2710×1 RawDataField array with properties:

          raw: [2710×1 RawDataMap]
         path: 'C:/Users/deBivortLab/Documents/MATLAB/margo_data/...
          fID: 4
          dim: [2710 1]
    precision: 'single'

The output of the time RawDataField contains important meta data about the raw time data file, which tells the RawDataMap how to read data from the file:

  • path - complete file path
  • fID - a pointer to the file
  • dim - the dimensions of the data contained in the file (num_frames x 1)
  • precision - the format of the data (single floating point precision)

Most importantly, the raw property contains the RawDataMap itself.

4. (optional) The dimensions listed in the preview of this [2710×1 RawDataMap] indicate that the reference to the raw data file is intact and accurately initialized, meaning that the RawDataMap is "attached". We can also verify that the map is attached with the following command:

>> isattached(expmt.data.time)

ans =

  logical

   1

5. (optional) For practice, try "detaching" the RawDataMap with the command below and inspect the output. This will close the reference to the raw data file, causing the raw property to read [0×1 RawDataMap].

>> detach(expmt.data.time)

ans = 

  2710×1 RawDataField array with properties:

          raw: [0×1 RawDataMap]
         path: 'C:/Users/deBivortLab/Documents/MATLAB/margo_data/...
          fID: 4
          dim: [2710 1]
    precision: 'single'

Detaching is particularly useful for clearing system memory that gradually becomes tied up as data is read from the file. To quickly free up the data in memory and reopen the file to continue reading, we can detach and re-attach the data with the following command:

% automatically detach and then re-attach the data file
reset(expmt.data.time)

6. We can index directly into the RawDataMap like an array. To consistently record high precision time stamps, MARGO outputs time data as the interframe interval in seconds (i.e. the time elapsed between the previous frame and the current frame). Read all the data from the file and store it in a temporary variable with the following command:

% read all the raw time data from file
interframe_interval = expmt.data.time.raw();

Specifying no index tells the RawDataMap to return the data in its native dimensions (2710 x 1). We could also return a subset of the elements by indexing into the map like any MATLAB array. For example, expmt.data.time.raw(1:10) returns the first ten time stamps. The use of parentheses in the above assignments is important as it tells the RawDataMap that this is an indexing operation. Dropping the parentheses would assign the handle to the RawDataMap object to our variable instead of the raw data itself.
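
Indexed reads make it possible to process a long recording in chunks rather than all at once. The sketch below illustrates the pattern on synthetic data so that it runs without MARGO: read_raw is a stand-in for expmt.data.time.raw(idx), and ifi_all, chunk_sz, and total_time are illustrative names. With real data, the frame count could be taken from the dim property shown earlier.

```matlab
% synthetic interframe intervals standing in for the contents of time.bin
ifi_all = 0.05 * ones(2710, 1);

% stand-in for an indexed read such as expmt.data.time.raw(idx)
read_raw = @(idx) ifi_all(idx);

n_frames = numel(ifi_all);
chunk_sz = 1000;
total_time = 0;

for first = 1:chunk_sz:n_frames
    % indices of the current chunk, clipped at the last frame
    idx = first:min(first + chunk_sz - 1, n_frames);

    % accumulate the duration covered by this chunk
    total_time = total_time + sum(read_raw(idx));

    % with real data, reset(expmt.data.time) could be called here
    % to free memory between chunks
end
```

Chunked reading mainly matters for long experiments where the full raw data would not fit comfortably in memory. For a file of this size, reading everything at once is simpler.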

7. Convert the format of the time stamps from interframe interval (sec) to time elapsed (sec) by calculating the cumulative sum:

% convert timestamps from ifi to time elapsed
t_elapsed = cumsum(interframe_interval);
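
To see what the conversion does, here is a small worked example on synthetic interframe intervals (the values are invented for illustration). It also shows two common follow-up operations: converting to minutes and finding the first frame at or after a given time.

```matlab
% four synthetic interframe intervals in seconds
interframe_interval = [0.05; 0.05; 0.10; 0.05];

% running total of the intervals gives time elapsed at each frame
% (approximately 0.05, 0.10, 0.20, 0.25 seconds)
t_elapsed = cumsum(interframe_interval);

% the same time axis expressed in minutes
t_elapsed_min = t_elapsed ./ 60;

% instantaneous frame rate in frames per second
frame_rate = 1 ./ interframe_interval;

% index of the first frame recorded at or after 0.15 seconds
frame_idx = find(t_elapsed >= 0.15, 1);
```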

Generate raw speed data

The goal of this section is to cover some examples of getting additional raw data types, either by creating them in post processing or flagging them as additional outputs. To demonstrate, we will use three different methods to generate raw (i.e. frame-to-frame, per-animal) speed data:

  1. Calculate manually from centroid coordinates and time data
  2. Automatically from the command line
  3. Automatically from output options menu in the GUI

Calculate manually

To calculate the speed of each animal in each frame, we need to compute the distance traveled each frame and the time elapsed between each frame. We can skip the latter part since we stored it as interframe_interval in the example above. To calculate the distance traveled each frame:

1. Read all the centroid data from the raw data file in its native dimensions and store it in a temporary variable:

% read all centroid data from file
centroids = expmt.data.centroid.raw();

Because we specified no indices in the above command, all the centroid data will be read and will be output in its native dimensions (i.e. M x 2 x N), where the first dimension M = number of frames, the second dimension contains the X and Y coordinates respectively, and the third dimension N = number of traces.

2. Calculate the frame-to-frame difference in X and Y coordinates for each trace. The change in position for the first frame cannot be known, so we assign the first frame as NaN for each animal:

% calculate the change in x and pad the first frame with NaNs
dx = squeeze(diff(centroids(:,1,:)));
dx = [NaN(1,size(dx,2)); dx];

% calculate the change in y and pad the first frame with NaNs
dy = squeeze(diff(centroids(:,2,:)));
dy = [NaN(1,size(dy,2)); dy];

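The call to squeeze removes the singleton second dimension, compressing the output of diff from [2709 x 1 x 46] to [2709 x 46]. Padding with a row of NaNs then restores the full [2710 x 46] size.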

3. Complete the speed calculation by calculating the distance traveled each frame and dividing each column (i.e. each individual distance trace) by the interframe interval:

% calculate the distance traveled each frame and divide by the interframe interval
distance = sqrt(dx.^2 + dy.^2);
speed = distance ./ repmat(interframe_interval, 1, size(dx,2));
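
The full calculation can be checked on a small synthetic data set with the same M x 2 x N layout as the centroid data. The coordinates and intervals below are invented for illustration, with 4 frames and 2 traces:

```matlab
% synthetic centroids: 4 frames x 2 coordinates (X,Y) x 2 traces
centroids = zeros(4, 2, 2);
centroids(:,:,1) = [0 0; 3 4; 3 4; 6 8];   % trace 1
centroids(:,:,2) = [1 1; 1 2; 1 4; 1 4];   % trace 2

% synthetic interframe intervals in seconds
interframe_interval = [0.5; 0.5; 1; 0.5];

% frame-to-frame change in x and y, padded with NaN for the first frame
dx = squeeze(diff(centroids(:,1,:)));
dx = [NaN(1,size(dx,2)); dx];
dy = squeeze(diff(centroids(:,2,:)));
dy = [NaN(1,size(dy,2)); dy];

% distance per frame divided by time per frame
distance = sqrt(dx.^2 + dy.^2);
speed = distance ./ repmat(interframe_interval, 1, size(dx,2));

% speed is now a 4 x 2 matrix:
%   trace 1: NaN, 10, 0, 10
%   trace 2: NaN,  2, 2,  0
```

In MATLAB R2016b and later, implicit expansion makes the repmat call optional, so distance ./ interframe_interval gives the same result.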

Create speed file from command line

The above example demonstrates the basics of the raw data format and how to manipulate it. We can also automatically generate a new binary raw data file from the command line using the ExperimentData object and MARGO's autoDataProcess() function. See running analysis for additional details.

1. Inspect the ExperimentData post-processing options with the following command:

>> expmt.meta.options

ans = 

  struct with fields:

       disable: 1
    handedness: 0
         bouts: 0
     bootstrap: 0
       regress: 0
         slide: 0
    areathresh: 0
          save: 1
           raw: {}

The fields above contain a variety of post-processing options set from the output options menu in the GUI, which we can also assign manually from the command line. In particular, the "raw" field takes a cell array of strings, each of which contains the name of a feature/raw data file to be generated. Because the list is currently empty, no features are flagged. The following features can have raw data files generated in post processing:

  • speed - the speed of each animal per frame
  • direction - heading direction of each animal per frame (-π to π, relative to positive x-axis)
  • radius - normalized radial position (0-1) of each animal within their ROI (i.e. radial polar coordinate)
  • theta - angular position of each animal within their ROI (i.e. angular polar coordinate)
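
To make the definitions above concrete, the sketch below shows one way such features could be computed from centroid data. It is an illustration under stated assumptions and not necessarily how MARGO computes them: roi_center and roi_radius are hypothetical values, and the ROI is assumed to be circular.

```matlab
% synthetic frame-to-frame displacements for one animal
dx = [1; 0; -1; 0];
dy = [0; 1; 0; -1];

% heading direction in radians, from -pi to pi, relative to the positive x-axis
direction = atan2(dy, dx);

% hypothetical circular ROI and two synthetic centroid positions inside it
roi_center = [50 50];
roi_radius = 25;
xy = [75 50; 50 62.5];

% polar coordinates of each position relative to the ROI center
[theta, r] = cart2pol(xy(:,1) - roi_center(1), xy(:,2) - roi_center(2));

% normalize the radial position so that 0 is the center and 1 is the ROI edge
radius = r ./ roi_radius;
```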

2. Add speed to the list of raw data files to generate by editing the post-processing options:

% assign speed to raw data file generation list
expmt.meta.options.raw = {'speed'};

3. Create the new raw data file by executing:

% generate the new speed file
expmt = autoDataProcess(expmt);

This will produce the following notification "generating new speed raw data file" and will open a waitbar showing the progress of the post-processing.



4. Inspect the new data field in the command line:

>> expmt.data.speed

ans = 

  2710×46 RawDataField array with properties:

          raw: [2710×46 RawDataMap]
         path: 'C:/Users/deBivortLab/Documents/MATLAB/margo_data/...
          fID: 3
          dim: [2710 46]
    precision: 'single'

The raw data folder for the experiment now contains a new raw data file:

./raw_data/03-22-2019-13-13-41__speed.bin

Create speed file from the GUI

The above example showed how to generate a raw speed data file from the command line. We can accomplish the same task prior to recording through the output options menu of the GUI.

1. Select Options > output:



2. By default, post-processing is disabled. Uncheck disable in the post-processing panel to enable automatic post-processing of the data after recording:



3. Check Speed under Centroid Trace Features to flag the speed raw data file to be automatically generated in post processing from centroid and time data:



4. (optional) Speed and other trace/blob features can also be flagged for output in real time, removing the need to generate a speed data file in post-processing:



Calculating average speed of each sex

Using the label meta data assigned during experiment setup, we can calculate metrics from the data separated by experimental category:

1. Read the raw data from file and store it in a temporary variable:

% get raw speed data
speed = expmt.data.speed.raw();

2. Inspect the label meta data by querying it in the command line. Meta data for the first five ROIs are shown below:

>> expmt.meta.labels_table

ans =

  46×4 table

     Strain      Sex    ID    Day
    _________    ___    __    ___

    'CantonS'    'F'     1     1 
    'CantonS'    'F'     2     1 
    'CantonS'    'F'     3     1 
    'CantonS'    'F'     4     1 
    'CantonS'    'F'     5     1 

3. Generate a logical filter indicating which ROIs contained female flies:

% create logical filter for which traces belong to female flies
is_female = cat(1,expmt.meta.labels_table.Sex{:}) == 'F';
is_male = ~is_female;
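
The comparison above relies on every Sex label being a single character, since vertically concatenating labels of different lengths raises an error. A sketch of an alternative using strcmp, which works for labels of any length, is shown below on a small synthetic table. The table labels is a stand-in for expmt.meta.labels_table, and its contents are invented for illustration:

```matlab
% synthetic stand-in for expmt.meta.labels_table
labels = table({'CantonS'; 'CantonS'; 'CantonS'; 'CantonS'}, ...
               {'F'; 'M'; 'F'; 'M'}, (1:4)', ones(4,1), ...
               'VariableNames', {'Strain', 'Sex', 'ID', 'Day'});

% compare each label against 'F'; returns one logical value per ROI
is_female = strcmp(labels.Sex, 'F');
is_male = strcmp(labels.Sex, 'M');
```

Testing for 'M' explicitly, rather than negating is_female, keeps ROIs with empty or unexpected labels out of both groups.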

4. Calculate the mean speed for each individual and, using the filter created above, calculate the average speed for each sex:

% calculate average speed for each fly
avg_speeds = nanmean(speed);

% calculate average speed for all females and all males separately
avg_female_speed = mean(avg_speeds(is_female));
avg_male_speed = mean(avg_speeds(is_male));
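
The grouping step can be verified on a small synthetic example. The sketch below uses invented speed values for 3 frames and 4 traces, and uses mean with the 'omitnan' flag (available in MATLAB R2015a and later) as a toolbox-free alternative to nanmean:

```matlab
% synthetic speed data: 3 frames x 4 traces, first frame padded with NaN
speed = [NaN NaN NaN NaN; ...
           2   4   6   8; ...
           4   8  10  12];

% synthetic logical filter: traces 1 and 3 are female
is_female = [true; false; true; false];
is_male = ~is_female;

% average speed of each trace, ignoring NaN frames -> [3 6 8 10]
avg_speeds = mean(speed, 1, 'omitnan');

% group averages: females (3 + 8)/2 = 5.5, males (6 + 10)/2 = 8
avg_female_speed = mean(avg_speeds(is_female));
avg_male_speed = mean(avg_speeds(is_male));
```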