Appendix 1. Data management and input formats

Zhiwen Owen Jiang edited this page Nov 25, 2024 · 4 revisions
  • --keep <filename1>,<filename2>

--keep keeps the subjects listed in one or more space- or tab-delimited text files. Separate multiple files with commas. Only subjects present in all files are kept (logical AND). Files may or may not have a header line and may have multiple columns, but the first two columns must be FID and IID; HEIG reads only these two columns and ignores the rest. Each row must contain exactly one subject. A subject with a missing FID or IID will not be matched. An example of a valid file:

1000016 1000016 1101.0
1000021 1000021 1021.0
1000033 1000033 1021.0
1000049 1000049 1000.0
1000057 1000057 1005.0
1000065 1000065 1298.0
1000070 1000070 1020.0
  • --remove <filename1>,<filename2>

--remove removes the subjects listed in one or more space- or tab-delimited text files. Separate multiple files with commas. Subjects present in any file are removed (logical OR). Files may or may not have a header line and may have multiple columns, but the first two columns must be FID and IID; HEIG reads only these two columns and ignores the rest. Each row must contain exactly one subject. A subject with a missing FID or IID will not be matched. The file format is the same as for --keep.
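
The AND/OR semantics of --keep and --remove can be illustrated with plain Python sets. This is a minimal sketch, not HEIG's actual implementation: the function names `read_subject_ids`, `apply_keep`, and `apply_remove` are hypothetical, and the sketch makes no attempt at header detection.

```python
def read_subject_ids(path):
    """Return the set of (FID, IID) pairs found in the first two columns."""
    ids = set()
    with open(path) as f:
        for line in f:
            fields = line.split()  # splits on any run of spaces or tabs
            if len(fields) >= 2:   # a row lacking FID or IID cannot match anyone
                ids.add((fields[0], fields[1]))
    return ids


def apply_keep(subjects, keep_files):
    """Keep only subjects that appear in every file (logical AND)."""
    kept = set(subjects)
    for path in keep_files:
        kept &= read_subject_ids(path)
    return kept


def apply_remove(subjects, remove_files):
    """Drop subjects that appear in any file (logical OR)."""
    removed = set()
    for path in remove_files:
        removed |= read_subject_ids(path)
    return set(subjects) - removed
```

Here `subjects` is any collection of (FID, IID) string pairs. Columns beyond the second are ignored, mirroring the behaviour described above.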

  • --extract <filename1>,<filename2>

--extract extracts the SNPs listed in one or more space- or tab-delimited text files. Separate multiple files with commas. Only SNPs present in all files are kept (logical AND). Files may or may not have a header line and may have multiple columns, but the first column must be the rsID; HEIG reads only the first column and ignores the rest. Each row must contain exactly one SNP. An example of a valid file:

rs11780869
rs2003497
rs10488368
rs2240379
rs62486593
rs2906332
rs2906345
rs11780918
rs79144312
rs3008295
  • --exclude <filename1>,<filename2>

--exclude excludes the SNPs listed in one or more space- or tab-delimited text files. Separate multiple files with commas. SNPs present in any file are excluded (logical OR). Files may or may not have a header line and may have multiple columns, but the first column must be the rsID; HEIG reads only the first column and ignores the rest. Each row must contain exactly one SNP. The file format is the same as for --extract.

  • --covar <filename>

--covar specifies a space- or tab-delimited text file of covariates. The file must have a header line whose first two columns are FID and IID. Duplicated column names are not allowed. Each row must contain exactly one subject. Missing values may be coded as -9, NONE, ., or NA, but must not be left blank. The file may be compressed, with the suffix .gz or .bz2. An example of a valid file:

FID IID age sex imaging_site    pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 pc10
s1000   s1000   59  1   b   -0.4010502469896386 -1.3977556383405239 -1.1989531741168893 -0.31168971927976885    0.3192321995265252  0.9685769022718186  1.8326018231975845  -0.6346419945322742 0.3363193064868467  -0.7897244424911196
s1001   s1001   37  0   b   0.9264363086140831  -0.27419658635267524    0.7720616431978489  -1.6609459265813038 1.4199694045561828  0.6980516781108909  0.2603967905056178  -0.5853880525290345 0.6598444399232275  -1.0292578428699457
s1002   s1002   34  1   c   0.701290712014738   -1.6114486830556236 -0.5339808046015635 -0.14223683717340252    -1.2387417547696444 -0.42870989766847506    -0.39095050016692584    -0.14013039543097625    1.655583437068254   -0.0877827797351612
s1003   s1003   52  0   b   0.4251658934030198  0.1943423086619841  1.4042613582921233  -0.31488874333272   -0.2757343537383393 0.5170793182458275  -2.05910573260717   -2.179613181756659  1.137622453984647   -0.2108538438748659
s1004   s1004   37  1   a   1.9270090938740552  -1.239621811338056  0.7251010142973133  -0.306660392310748  0.662229511906797   -0.5731246697489935 -1.0593592473148528 -1.0654126679534426 0.03879940627823722 -0.7651002224780677
s1005   s1005   39  0   b   1.3772395767781787  -0.14630633814274918    -0.14149150349734654    -2.2291734997078247 0.8388072020175267  0.33683973329181    0.21085921723693568 -0.15190656038994882    -1.126017629044099  0.5840360816778098
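
Before passing a covariate file to HEIG, it can help to check it against the rules above. The following is a rough pre-flight check under stated assumptions, not HEIG's own reader: the names `open_text`, `read_covar`, and `MISSING_CODES` are hypothetical, and values are kept as strings because covariates such as imaging_site may be categorical.

```python
import bz2
import gzip

# Missing-value codes listed in the --covar description.
MISSING_CODES = {"-9", "NONE", ".", "NA"}


def open_text(path):
    """Open a plain, .gz, or .bz2 file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def read_covar(path):
    """Return (header, rows), where rows maps (FID, IID) to a dict of covariates."""
    with open_text(path) as f:
        header = f.readline().split()
        if header[:2] != ["FID", "IID"]:
            raise ValueError("the first two header columns must be FID and IID")
        if len(set(header)) != len(header):
            raise ValueError("duplicated column names are not allowed")
        rows = {}
        for lineno, line in enumerate(f, start=2):
            fields = line.split()
            if not fields:
                continue
            # A blank cell collapses under whitespace splitting and shows up
            # here as a wrong field count.
            if len(fields) != len(header):
                raise ValueError(f"line {lineno}: expected {len(header)} fields")
            values = [None if v in MISSING_CODES else v for v in fields[2:]]
            rows[(fields[0], fields[1])] = dict(zip(header[2:], values))
    return header, rows
```

Missing values come back as `None`, so a caller can decide how to handle them.
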
  • --image-txt <filename>

--image-txt specifies a space- or tab-delimited text file of imaging data or non-imaging phenotype data. Subjects whose values are constant across all voxels are removed. All other file format requirements are the same as for --covar.

  • --coord-txt <filename>

--coord-txt specifies a space- or tab-delimited text file of image coordinates. A header line is not allowed. Each row gives the coordinates of one voxel/vertex. The number of rows must equal the image resolution, and the number of columns must equal the image dimension. An example of a valid file:

67  123 91
67  124 91
67  124 92
67  125 91
67  125 92
67  126 91
67  126 92
67  127 91
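
The shape requirements for a coordinate file can be checked with a few lines of Python. This is only a sketch: the function name `read_coord` and its `n_voxels` parameter are hypothetical, and treating any non-numeric row as a stray header is an assumption of the sketch.

```python
def read_coord(path, n_voxels=None):
    """Read coordinates as a list of tuples, one tuple per voxel/vertex."""
    coords = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue
            try:
                coords.append(tuple(float(x) for x in fields))
            except ValueError:
                raise ValueError(
                    f"line {lineno}: non-numeric value (header lines are not allowed)"
                ) from None
    # Every row must have the same number of columns (the image dimension).
    if len({len(c) for c in coords}) > 1:
        raise ValueError("rows have differing numbers of columns")
    # Optionally compare the row count with the expected number of voxels.
    if n_voxels is not None and len(coords) != n_voxels:
        raise ValueError(f"expected {n_voxels} rows, found {len(coords)}")
    return coords
```
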
  • --voxel

--voxel specifies a subset of voxels for voxel-level GWAS reconstruction. Voxel indices are one-based (the first voxel is 1). The subset can be given as a single number (--voxel 3), as multiple numbers (--voxel 11,12,13, or --voxel {11:13} for voxels 11 to 13 inclusive), or as a file (--voxel <filename>). The file must not have a header line, and each row must contain exactly one voxel. An example of a valid file:

3
5
7
9
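
The accepted forms of the --voxel argument can be expanded into an explicit list as follows. This is an illustrative sketch rather than HEIG's parser: `parse_voxels` is a hypothetical name, and the order in which the forms are tried (range, then file, then comma-separated numbers) is an assumption.

```python
import os
import re

# Matches the inclusive range form, e.g. {11:13}.
RANGE_RE = re.compile(r"\{(\d+):(\d+)\}")


def parse_voxels(arg):
    """Expand a --voxel argument into a list of one-based voxel indices."""
    match = RANGE_RE.fullmatch(arg)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        voxels = list(range(start, end + 1))  # the end voxel is included
    elif os.path.isfile(arg):
        with open(arg) as f:
            voxels = [int(line) for line in f if line.strip()]
    else:
        voxels = [int(token) for token in arg.split(",")]
    if any(v < 1 for v in voxels):
        raise ValueError("voxels are one-based; the first voxel is 1")
    return voxels
```

Code that indexes a zero-based array would subtract 1 from each returned value.
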
  • --partition

--partition specifies a text file of genome partitions. The file must be tab- or space-delimited with no header line; the first column is the chromosome, the second is the start position, and the third is the end position. Each row contains exactly one LD block. Partition files are provided for GRCh37/38 and for subjects of European, African, and Asian ancestry (link). An example of a valid file:

1	10583	1892607
1	1892607	3582736
1	3582736	4380811
1	4380811	5913893
1	5913893	7247335
1	7247335	9365199
1	9365199	10806984
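
A partition file in this three-column layout can be loaded and sanity-checked as sketched below. The function name `read_partition` is hypothetical, and the check that each start position is smaller than its end position is an assumption of this sketch rather than a documented HEIG rule.

```python
def read_partition(path):
    """Return a list of (chromosome, start, end) tuples, one per LD block."""
    blocks = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()  # accepts tabs or spaces
            if not fields:
                continue
            if len(fields) < 3:
                raise ValueError(f"line {lineno}: expected chromosome, start, end")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            if start >= end:  # assumption: a block must span a positive range
                raise ValueError(f"line {lineno}: start must be smaller than end")
            blocks.append((chrom, start, end))
    return blocks
```

The chromosome is kept as a string so that non-numeric labels do not cause a parsing error.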