Organizing Data

Part 1: Getting Data

If you haven't already, follow the Installation Instructions to set up your account on Lonestar and install the necessary software. **Remember, there are FAT-specific settings you will need to add to your .bashrc before continuing with this tutorial, so check the Installation page again, especially the 'Editing the PATH' section!**

ssh into the virtual login node on Lonestar (replace $username with your TACC username):

ssh -Y $username@vlogin03.ls5.tacc.utexas.edu

If you don't have access to the virtual login node, you can also use an idev session. idev is a program that claims a compute node for a fixed amount of time and then opens an ssh session on that node. It allows you to test out code or run more computationally intensive programs like fslview. If you try to run something like fslview on a login node (accessed at ls5.tacc.utexas.edu), the process may be killed, and you might be temporarily banned from Lonestar. Be careful, and when in doubt, use the virtual login node or idev. If you run idev with no arguments, you will get a job that lasts up to 30 minutes and runs on one of the development nodes. You can use the -t option to request more time; for example, to get two hours: idev -t 02:00:00.
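
Putting that together, a minimal idev session might look like this (hostname and exit are ordinary shell commands, included here as a sanity check):

idev -t 02:00:00   # request a development node for two hours
hostname           # should report a compute node, not a login node
exit               # release the node when finished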

If you have already saved your participant data in Corral, move the data into your WORK directory before processing, to make effective use of space. Copy the data from Corral to your WORK directory:

cp -r /corral-repl/utexas/prestonlab/preproc/part1 $WORK/preproc
cd $WORK/preproc

Take a look at the data using ls:

ls bender_03
ls bender_03/raw/bender_03
ls bender_03/raw/bender_03/2

Each subject was run over two days. For bender_03, the day 1 data are in the bender_03 directory, and the day 2 data are in the bender_03a directory. The raw data for each day are placed in a $subject/raw/$subject directory within the main study directory, split across multiple numbered directories. Each directory corresponds to one scan, which includes many DICOM (.dcm) files. DICOM files carry a lot of information, but the format isn't very standardized. Most neuroimaging software packages work with NIfTI files, so our first step is to convert the DICOM files to NIfTI format.
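
To get a quick overview of how many DICOM files each scan directory contains, a small shell loop works (a sketch; adjust the paths for other subjects):

for scan in bender_03/raw/bender_03/*/; do
    echo "$scan: $(ls "$scan"*.dcm 2>/dev/null | wc -l) DICOM files"
done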

Setting up a Study

Many scripts in the pipeline use environment variables so you don't have to keep specifying the same things over and over again. These variables are set in your .bashrc, a hidden script in your $HOME directory that runs each time you log in.

The most important variable you'll edit in your .bashrc is $STUDYDIR. This is a base directory with one directory for each subject:

$STUDYDIR/
  $subject1
  $subject2
  ...
  $subjectN

To define the study directory, run this:

export STUDYDIR=$WORK/preproc

As long as this is run before running a given script, it will work. If you put the above line in your $HOME/.bashrc file, it will run as soon as you log into Lonestar, so you won't have to think about it. You can edit your .bashrc by navigating to your $HOME directory and opening the file with an editor like emacs; i.e., from your home directory, run: emacs .bashrc
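
For example, the relevant .bashrc addition might look like this (the path is the one used in this tutorial; substitute your own study directory):

# Set the study directory for the preprocessing pipeline
export STUDYDIR=$WORK/preproc

After editing, run source ~/.bashrc (or log in again), and confirm the setting with echo $STUDYDIR.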

Part 2: Converting to NIfTI

We'll use the script convert_dicom.py to convert the DICOM files and create a standard directory structure for each subject. To see how to run convert_dicom.py, look at the help comments:

convert_dicom.py -h

Some notes on the options:

  • There is only one required input, subject. This is the name of the subject directory that you want to process (e.g. bender_03 or bender_03a).
  • Many of the Python scripts include a --study-dir option. If the STUDYDIR environment variable has been defined, it will be used, and you don't need to specify that option.
  • Many scripts include a --dry-run option. If you specify it, the script will print out the commands that would be run, so you can double-check them before actually running anything.
  • The scripts will generally write a log file with output from the script. There will be a new log file (with a time stamp in the filename) each time you run the script. To remove old log files, use --clean-logs (see the example after this list).
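
For example, to convert a subject while clearing out its old log files (a sketch based on the options above):

convert_dicom.py bender_03 --clean-logs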

Let's see what commands the script would run for one subject, without running them yet:

convert_dicom.py bender_03 --dry-run

You should see a series of commands starting with dcm2nii. They are calling the dcm2nii program to do the actual conversions.
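
Note that dcm2nii must be on your PATH for the conversions to work (this is part of the 'Editing the PATH' setup from the Installation page). You can confirm it is found with:

which dcm2nii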

Take out the --dry-run flag to actually run the conversions:

convert_dicom.py bender_03

Use ls to look at the new directories and files that were created:

ls bender_03/*

This will show the contents of each directory under bender_03. Look at the log file that was created:

less bender_03/logs/dcm2nii*.log

less is a program for looking at text files. Hit the spacebar to go down a page, and "b" to go up a page. You can also search for text: type "/", then a search term (try searching for "dicom"). Type "q" to quit.
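
To check a log for problems without paging through it, you can also search it with grep (a quick sanity check, not a substitute for reading the log):

grep -i error bender_03/logs/dcm2nii*.log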

Repeat running convert_dicom.py for the other subject directories (bender_03a, bender_04, bender_04a), or look at the results on Corral under /corral-repl/utexas/prestonlab/preproc/part2.
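
One way to convert the remaining directories in a single command (a sketch; each run still writes its own log file):

for subject in bender_03a bender_04 bender_04a; do
    convert_dicom.py $subject
done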

Getting Image Information

You can use the lsvol utility to quickly get information about a set of image files:

lsvol bender_03/anatomy

outputs:

256x256x192x001 1.00x1.00x1.00x1.90  11M mprages002a1001.nii.gz
192x256x256x001 1.00x1.00x1.00x1.90  11M omprages002a1001.nii.gz
167x205x200x001 1.00x1.00x1.00x1.90 7.1M comprages002a1001.nii.gz

Each row displays information about one image: the dimensions, the resolution, and the disk space used.
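
If lsvol is not available in your environment, FSL's fslinfo reports similar header information for a single image:

fslinfo bender_03/anatomy/mprages002a1001.nii.gz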

Viewing Images

Take a look at the files using fslview. On the virtual login node, type fslview & to run fslview in the background (i.e. without tying up your terminal). Choose File>Open, then navigate to bender_03/anatomy/comprages002a1001.nii.gz. This high-resolution MPRAGE scan is used for registering functional scans to anatomy. Note that there are three versions of the MPRAGE: the first is the raw image, the one with an "o" prefix has been changed to standard orientation, and the one with a "co" prefix has also been cropped.
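
You can also open an image directly from the command line, since fslview accepts image filenames as arguments:

fslview bender_03/anatomy/comprages002a1001.nii.gz &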

Close the MPRAGE scan, and open the functional scan under bender_03/BOLD/functionalprexs009a001.nii.gz. Note how the prefrontal cortex is distorted due to proximity to the sinuses. Press the filmstrip button on the toolbar to play through the scan like a movie.

Look at the images under bender_03/fieldmap. There should be three images. The first two are called magnitude images, and the third is a phase image. Look at them using fslview (when you have multiple images in the same space, you can overlay them using File>Add). We'll use these images to unwarp the functional data.
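
As an alternative to File>Add, you should be able to load all three fieldmap images at once from the command line (assuming they are stored as .nii.gz files like the other images):

fslview bender_03/fieldmap/*.nii.gz &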

Part 3: Organizing Data

There are two parts involved in organizing data. The first part is the same for each study, and involves renaming some files to standard names. The second part has to be customized for each study; for two-day studies, this part should also merge together the separate days so there is just one directory for each subject.

Test out rename_nifti.py to see what will be run:

rename_nifti.py bender_03 --dry-run

Then remove the --dry-run option to actually run it. It will rename some files to make them more standardized. Run the same command on the other subject directories.

Once rename_nifti.py has been run, you can archive the raw files on Ranch. See Archiving data for details.

Next, run bender_clean_subj.py to merge images from the two days into one subject directory:

bender_clean_subj.py bender_03

This script does a lot of things; some highlights:

  • renames high-res anatomical scans to highres1, highres2, etc.
  • renames coronal scans to coronal1, coronal2, etc.
  • renames fieldmaps to identify the magnitude and phase images (fieldmap_magX and fieldmap_phaseX), and assigns a number (X=1,2,etc.) to each fieldmap scan
  • renames functional scan directories to shorter names (e.g. study_1, study_2 for runs 1 and 2 of the study task)
  • places all images under bender_03, and deletes the bender_03a directory

This process varies somewhat for each study, but if you follow the guidelines above you should be ready for the rest of the pipeline.

Run bender_clean_subj.py for bender_04 also. See /corral-repl/utexas/prestonlab/preproc/part3 to see how your $STUDYDIR should look after the data reorganization steps.
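
For a quick sanity check on the resulting directory structure, you can compare your subject directory listing against the reference (assuming the reference follows the same layout):

diff <(ls bender_03) <(ls /corral-repl/utexas/prestonlab/preproc/part3/bender_03)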

Next: Basic BOLD Preprocessing