
Appendix 1. Data management and input formats

Zhiwen Owen Jiang edited this page Nov 25, 2024 · 4 revisions

This page introduces the basic options and input formats of HEIG.

  • --keep <filename1>,<filename2>

--keep specifies one or more space/tab-delimited text files of subject IDs. Multiple files should be separated by commas. A file may or may not have a header line and may have multiple columns, but the first two columns must be FID and IID; HEIG reads only these two columns and ignores the rest. Each row must contain exactly one subject. A missing FID or IID leaves the subject unmatched. An example of a valid file:

1000016 1000016 1001.0
1000021 1000021 1001.0
1000033 1000033 1001.0
1000049 1000049 1001.0
1000057 1000057 1003.0
1000065 1000065 1001.0
1000070 1000070 1001.0
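The rules above can be illustrated with a minimal Python sketch. It is not HEIG's own parser: the helper names `read_id_pairs` and `read_keep` are hypothetical, and combining multiple files by intersection (keeping only subjects listed in every file) is an assumption, since the text above does not say how multiple files are combined.

```python
import io
from typing import Optional, Set, TextIO, Tuple


def read_id_pairs(handle: TextIO) -> Set[Tuple[str, str]]:
    """Collect (FID, IID) pairs from one space/tab-delimited text stream."""
    ids = set()
    for line in handle:
        fields = line.split()  # splits on any run of spaces and/or tabs
        if len(fields) < 2:
            continue  # blank or incomplete row: no usable FID/IID pair
        # Only the first two columns matter; extra columns are ignored.
        # A header line is read as an ordinary pair ("FID", "IID"), which
        # matches no real subject unless one is literally named that way.
        ids.add((fields[0], fields[1]))
    return ids


def read_keep(arg: str) -> Set[Tuple[str, str]]:
    """Parse a comma-separated list of keep files, e.g. "a.txt,b.txt".

    ASSUMPTION: a subject is kept only if it appears in every file
    (intersection); the documentation does not specify this.
    """
    result: Optional[Set[Tuple[str, str]]] = None
    for path in arg.split(","):
        with open(path) as handle:
            ids = read_id_pairs(handle)
        result = ids if result is None else result & ids
    return result if result is not None else set()


if __name__ == "__main__":
    demo = io.StringIO("1000016 1000016 1001.0\n1000021\t1000021\t1001.0\n")
    print(sorted(read_id_pairs(demo)))
```

Treating a header as an ordinary row is what makes "with or without header" safe in this sketch: the header pair simply never matches a subject.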
  • --extract <filename1>,<filename2>

--extract specifies one or more space/tab-delimited text files of SNPs. Multiple files should be separated by commas. A file may or may not have a header line and may have multiple columns, but the first column must be the rsID; HEIG reads only the first column and ignores the rest. Each row must contain exactly one SNP. An example of a valid file:

rs11780869
rs2003497
rs10488368
rs2240379
rs62486593
rs2906332
rs2906345
rs11780918
rs79144312
rs3008295
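A similar sketch for --extract files, again hypothetical (`read_snp_ids` and `filter_snps` are illustrative names, not HEIG functions): only the first column of each row is kept, and the resulting set is then used to filter a list of SNPs while preserving its order.

```python
import io
from typing import List, Set, TextIO


def read_snp_ids(handle: TextIO) -> Set[str]:
    """Return the rsIDs in the first column of a space/tab-delimited stream."""
    snps = set()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            # Extra columns are ignored. A header line just adds its first
            # token, which is harmless unless it equals a real rsID.
            snps.add(fields[0])
    return snps


def filter_snps(all_snps: List[str], keep: Set[str]) -> List[str]:
    """Keep only SNPs present in the extract set, preserving input order."""
    return [snp for snp in all_snps if snp in keep]


if __name__ == "__main__":
    wanted = read_snp_ids(io.StringIO("rs11780869\nrs2003497\tnote\n"))
    print(filter_snps(["rs2003497", "rs1", "rs11780869"], wanted))
```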
  • --covar <filename>

--covar specifies a space/tab-delimited text file of covariates. The file must have a header line, and the first two columns must be FID and IID. Duplicated column names are not allowed. Each row must contain exactly one subject. Missing values can be coded as -9, NONE, ., or NA, but should not be left blank. The file can be compressed, with the suffix .gz or .bz2. An example of a valid file:

FID IID age sex imaging_site    pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 pc10
s1000   s1000   59  1   b   -0.4010502469896386 -1.3977556383405239 -1.1989531741168893 -0.31168971927976885    0.3192321995265252  0.9685769022718186  1.8326018231975845  -0.6346419945322742 0.3363193064868467  -0.7897244424911196
s1001   s1001   37  0   b   0.9264363086140831  -0.27419658635267524    0.7720616431978489  -1.6609459265813038 1.4199694045561828  0.6980516781108909  0.2603967905056178  -0.5853880525290345 0.6598444399232275  -1.0292578428699457
s1002   s1002   34  1   c   0.701290712014738   -1.6114486830556236 -0.5339808046015635 -0.14223683717340252    -1.2387417547696444 -0.42870989766847506    -0.39095050016692584    -0.14013039543097625    1.655583437068254   -0.0877827797351612
s1003   s1003   52  0   b   0.4251658934030198  0.1943423086619841  1.4042613582921233  -0.31488874333272   -0.2757343537383393 0.5170793182458275  -2.05910573260717   -2.179613181756659  1.137622453984647   -0.2108538438748659
s1004   s1004   37  1   a   1.9270090938740552  -1.239621811338056  0.7251010142973133  -0.306660392310748  0.662229511906797   -0.5731246697489935 -1.0593592473148528 -1.0654126679534426 0.03879940627823722 -0.7651002224780677
s1005   s1005   39  0   b   1.3772395767781787  -0.14630633814274918    -0.14149150349734654    -2.2291734997078247 0.8388072020175267  0.33683973329181    0.21085921723693568 -0.15190656038994882    -1.126017629044099  0.5840360816778098
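The covariate rules can be checked with a short standard-library sketch. It is an illustration, not HEIG's reader: `open_text` and `parse_covar` are hypothetical names, values are kept as strings, and missing codes are matched as exact strings (so, as an assumption of this sketch, `-9.0` would not count as missing). It also shows why blank cells are a problem in whitespace-delimited files: an empty field disappears when the row is split, so the row comes up one field short.

```python
import bz2
import gzip
import io
from typing import Dict, List, Optional, TextIO, Tuple

# Missing-value codes listed in the documentation, matched as exact strings.
MISSING_CODES = {"-9", "NONE", ".", "NA"}


def open_text(path: str) -> TextIO:
    """Open a plain, .gz, or .bz2 text file for reading in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def parse_covar(handle: TextIO) -> Dict[Tuple[str, str], Dict[str, Optional[str]]]:
    """Map (FID, IID) to {column: value}, with None for missing values."""
    header = handle.readline().split()
    if len(header) < 2:
        raise ValueError("a header line starting with FID and IID is required")
    if len(set(header)) != len(header):
        raise ValueError("duplicated column names are not allowed")
    names = header[2:]
    table = {}
    for lineno, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != len(header):
            # A blank cell vanishes under whitespace splitting, so the row
            # comes up short; missing values must use an explicit code.
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        values: List[Optional[str]] = [
            None if v in MISSING_CODES else v for v in fields[2:]
        ]
        table[(fields[0], fields[1])] = dict(zip(names, values))
    return table


if __name__ == "__main__":
    demo = io.StringIO("FID IID age sex\ns1000 s1000 59 1\ns1001 s1001 NA -9\n")
    print(parse_covar(demo))
```

With a compressed file, the two helpers combine as `with open_text("covar.txt.gz") as f: parse_covar(f)`, where the file name is only an example.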
  • --image-txt <filename>

--image-txt specifies a space/tab-delimited text file of imaging data. The file format requirements are the same as for --covar.

  • --coord-txt <filename>

--coord-txt specifies a space/tab-delimited text file of image coordinates. A header line is not allowed. Each row is the coordinate of one voxel/vertex. The number of rows should equal the image resolution (the number of voxels/vertices), and the number of columns should equal the image dimension (e.g., 3 for a 3D image, as in the example below). An example of a valid file:

67  123 91
67  124 91
67  124 92
67  125 91
67  125 92
67  126 91
67  126 92
67  127 91
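A shape check for --coord-txt files might look like the sketch below, assuming the expected number of voxels/vertices and the image dimension are already known. `read_coords` and its parameters are hypothetical. Because no header is allowed, a header such as `x y z` fails the numeric conversion here instead of being silently skipped.

```python
import io
from typing import List, TextIO


def read_coords(handle: TextIO, n_voxels: int, dim: int) -> List[List[float]]:
    """Read a header-free coordinate file and check its shape.

    n_voxels: expected number of rows (the image resolution).
    dim: expected number of columns (the image dimension, e.g. 3 for 3D).
    """
    coords = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch tolerates trailing blank lines
        if len(fields) != dim:
            raise ValueError(f"line {lineno}: expected {dim} columns, got {len(fields)}")
        # float() raises ValueError on a header such as "x y z".
        coords.append([float(v) for v in fields])
    if len(coords) != n_voxels:
        raise ValueError(f"expected {n_voxels} rows, got {len(coords)}")
    return coords


if __name__ == "__main__":
    demo = io.StringIO("67  123 91\n67  124 91\n67  124 92\n")
    print(read_coords(demo, n_voxels=3, dim=3))
```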
  • --voxel

--voxel specifies a subset of voxels for voxel-level GWAS. Voxel indices are one-based (the first voxel is 1). They can be provided as a single number (--voxel 3), a range (--voxel {1:10}, meaning voxels 1 to 10 inclusive), or a file (--voxel <filename>). The file must not have a header line, and each row must contain exactly one voxel. An example of a valid file:

3
5
7
9
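The three forms of --voxel could be parsed as sketched below. The helper names are hypothetical, and converting the one-based indices to zero-based ones is an assumption about how Python code would typically index voxels, not a description of HEIG internals. The sketch treats an argument as a file path only when it is neither a plain number nor a {start:end} range.

```python
import io
import re
from typing import List, TextIO

# Matches the range form, e.g. "{1:10}".
RANGE_PATTERN = re.compile(r"\{(\d+):(\d+)\}")


def read_voxel_file(handle: TextIO) -> List[int]:
    """Read one-based voxel indices, one per row, with no header line."""
    voxels = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 1:
            raise ValueError("each row must contain exactly one voxel")
        voxels.append(int(fields[0]))  # int() raises on a header line
    return voxels


def parse_voxel_arg(arg: str) -> List[int]:
    """Convert a --voxel value to zero-based indices (hypothetical helper)."""
    match = RANGE_PATTERN.fullmatch(arg)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        one_based = list(range(start, end + 1))  # both ends included
    elif arg.isdigit():
        one_based = [int(arg)]
    else:
        with open(arg) as handle:  # anything else is treated as a file path
            one_based = read_voxel_file(handle)
    if any(v < 1 for v in one_based):
        raise ValueError("voxel indices are one-based; 0 and negatives are invalid")
    return [v - 1 for v in one_based]


if __name__ == "__main__":
    print(parse_voxel_arg("3"))
    print(parse_voxel_arg("{1:10}"))
    print(read_voxel_file(io.StringIO("3\n5\n7\n9\n")))
```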