Appendix 1. Data management and input formats
--keep <filename1>,<filename2>
--keep keeps the subjects listed in one or more space/tab-delimited text files. Separate multiple files with commas. Only subjects present in all files are kept (logical AND). Files may have a header line or not and may have multiple columns, but the first two columns must represent FID and IID. HEIG reads only the first two columns and ignores the rest. Each row must contain exactly one subject. A subject with a missing FID or IID cannot be matched. An example of a valid file:
1000016 1000016 1101.0
1000021 1000021 1021.0
1000033 1000033 1021.0
1000049 1000049 1000.0
1000057 1000057 1005.0
1000065 1000065 1298.0
1000070 1000070 1020.0
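The matching rule can be sketched in Python as below. This is a minimal illustration under stated assumptions, not HEIG's implementation: the function names read_subject_ids and keep_subjects are hypothetical, subjects are represented as (FID, IID) string pairs, and a header line is simply read as one more row, which matches no real subject. Rows with fewer than two fields are skipped, which is one way a missing FID or IID leaves a subject unmatched.

```python
from functools import reduce
from pathlib import Path


def read_subject_ids(path):
    """Return the set of (FID, IID) pairs from the first two columns of a
    space/tab-delimited file (hypothetical helper, not HEIG's API).
    Extra columns are ignored; rows with fewer than two fields are skipped.
    A header line is read like any other row and matches no real subject."""
    ids = set()
    for line in Path(path).read_text().splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids


def keep_subjects(subjects, keep_files):
    """--keep semantics: a subject survives only if it is listed in every
    file (logical AND, i.e. set intersection)."""
    allowed = reduce(set.intersection, [read_subject_ids(f) for f in keep_files])
    return [s for s in subjects if s in allowed]
```

With two files, a subject listed in only one of them is dropped.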
--remove <filename1>,<filename2>
--remove removes the subjects listed in one or more space/tab-delimited text files. Separate multiple files with commas. Subjects present in any of the files are removed (logical OR). Files may have a header line or not and may have multiple columns, but the first two columns must represent FID and IID. HEIG reads only the first two columns and ignores the rest. Each row must contain exactly one subject. A subject with a missing FID or IID cannot be matched. The file format is the same as for --keep.
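The union rule can be sketched as follows. As with any example here, this is an illustration rather than HEIG's code: remove_subjects and read_subject_ids are hypothetical names, and subjects are assumed to be (FID, IID) string pairs. The helper skips rows with fewer than two fields, so a row missing its IID removes nobody.

```python
from pathlib import Path


def read_subject_ids(path):
    """Return the set of (FID, IID) pairs from the first two columns of a
    space/tab-delimited file (hypothetical helper). Extra columns are
    ignored; rows with fewer than two fields are skipped."""
    ids = set()
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids


def remove_subjects(subjects, remove_files):
    """--remove semantics: a subject is dropped if it appears in any file
    (logical OR, i.e. set union)."""
    dropped = set().union(*(read_subject_ids(f) for f in remove_files))
    return [s for s in subjects if s not in dropped]
```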
--extract <filename1>,<filename2>
--extract extracts the SNPs listed in one or more space/tab-delimited text files. Separate multiple files with commas. Only SNPs present in all files are kept (logical AND). Files may have a header line or not and may have multiple columns, but the first column must represent the rsID. HEIG reads only the first column and ignores the rest. Each row must contain exactly one SNP. An example of a valid file:
rs11780869
rs2003497
rs10488368
rs2240379
rs62486593
rs2906332
rs2906345
rs11780918
rs79144312
rs3008295
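A minimal sketch of the intersection rule for SNP lists is shown below. The names read_snp_ids and extract_snps are hypothetical and not part of HEIG; blank lines are assumed to be ignored, and a header such as "SNP" is read as an ordinary entry that matches no rsID.

```python
from functools import reduce
from pathlib import Path


def read_snp_ids(path):
    """Return the set of rsIDs in the first column of a space/tab-delimited
    file (hypothetical helper). Other columns and blank lines are ignored."""
    snps = set()
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            snps.add(fields[0])
    return snps


def extract_snps(snps, extract_files):
    """--extract semantics: keep a SNP only if every file lists it
    (logical AND, i.e. set intersection)."""
    allowed = reduce(set.intersection, (read_snp_ids(f) for f in extract_files))
    return [s for s in snps if s in allowed]
```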
--exclude <filename1>,<filename2>
--exclude excludes the SNPs listed in one or more space/tab-delimited text files. Separate multiple files with commas. SNPs present in any of the files are excluded (logical OR). Files may have a header line or not and may have multiple columns, but the first column must represent the rsID. HEIG reads only the first column and ignores the rest. Each row must contain exactly one SNP. The file format is the same as for --extract.
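The corresponding union rule can be sketched as below; exclude_snps and read_snp_ids are hypothetical names used only for illustration, not HEIG's API.

```python
from pathlib import Path


def read_snp_ids(path):
    """Return the set of rsIDs in the first column of a space/tab-delimited
    file (hypothetical helper). Other columns and blank lines are ignored."""
    snps = set()
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            snps.add(fields[0])
    return snps


def exclude_snps(snps, exclude_files):
    """--exclude semantics: drop a SNP if any file lists it
    (logical OR, i.e. set union)."""
    dropped = set().union(*(read_snp_ids(f) for f in exclude_files))
    return [s for s in snps if s not in dropped]
```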
--covar <filename>
--covar specifies a space/tab-delimited text file of covariates. The file must have a header line whose first two columns are FID and IID. Duplicated column names are not allowed. Each row must contain exactly one subject. Missing values can be coded as -9, NONE, . (a single period), or NA, but must not be left blank (''). The file can be compressed with the suffix .gz or .bz2. An example of a valid file:
FID IID age sex imaging_site pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 pc10
s1000 s1000 59 1 b -0.4010502469896386 -1.3977556383405239 -1.1989531741168893 -0.31168971927976885 0.3192321995265252 0.9685769022718186 1.8326018231975845 -0.6346419945322742 0.3363193064868467 -0.7897244424911196
s1001 s1001 37 0 b 0.9264363086140831 -0.27419658635267524 0.7720616431978489 -1.6609459265813038 1.4199694045561828 0.6980516781108909 0.2603967905056178 -0.5853880525290345 0.6598444399232275 -1.0292578428699457
s1002 s1002 34 1 c 0.701290712014738 -1.6114486830556236 -0.5339808046015635 -0.14223683717340252 -1.2387417547696444 -0.42870989766847506 -0.39095050016692584 -0.14013039543097625 1.655583437068254 -0.0877827797351612
s1003 s1003 52 0 b 0.4251658934030198 0.1943423086619841 1.4042613582921233 -0.31488874333272 -0.2757343537383393 0.5170793182458275 -2.05910573260717 -2.179613181756659 1.137622453984647 -0.2108538438748659
s1004 s1004 37 1 a 1.9270090938740552 -1.239621811338056 0.7251010142973133 -0.306660392310748 0.662229511906797 -0.5731246697489935 -1.0593592473148528 -1.0654126679534426 0.03879940627823722 -0.7651002224780677
s1005 s1005 39 0 b 1.3772395767781787 -0.14630633814274918 -0.14149150349734654 -2.2291734997078247 0.8388072020175267 0.33683973329181 0.21085921723693568 -0.15190656038994882 -1.126017629044099 0.5840360816778098
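The sketch below shows one way to read such a file with the Python standard library. It is not HEIG's own reader: read_covar and open_text are hypothetical names, compression is assumed to be detected from the .gz/.bz2 suffix, the names of the first two header columns are not checked, and values are kept as strings with the missing codes replaced by None. It also illustrates why blank values are disallowed: in a whitespace-delimited file an empty field disappears into the delimiter, so the row ends up with too few columns.

```python
import bz2
import gzip

MISSING = {"-9", "NONE", ".", "NA"}  # missing-value codes listed above


def open_text(path):
    """Open a plain, .gz, or .bz2 file in text mode based on its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def read_covar(path):
    """Return (header, {(FID, IID): [values...]}), with missing codes as None."""
    with open_text(path) as fh:
        header = fh.readline().split()
        if len(set(header)) != len(header):
            raise ValueError("duplicated column names")
        rows = {}
        for lineno, line in enumerate(fh, start=2):
            fields = line.split()
            if not fields:
                continue
            if len(fields) != len(header):
                # a blank value collapses into the delimiter and shifts columns
                raise ValueError(
                    f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
                )
            rows[(fields[0], fields[1])] = [
                None if v in MISSING else v for v in fields[2:]
            ]
    return header, rows
```

Categorical columns such as imaging_site stay as strings in this sketch; converting them to dummy variables or numbers is left to later processing.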
--image-txt <filename>
--image-txt specifies a space/tab-delimited text file of imaging data or non-imaging phenotype data. Subjects whose values are constant across all voxels are removed. All other format requirements are the same as for --covar.
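The constant-subject filter can be sketched as follows. This is an assumption-laden illustration, not HEIG's code: read_image_txt and drop_constant_subjects are hypothetical names, every column after FID and IID is treated as one voxel, values are parsed as floats, and missing-value codes are not handled.

```python
from pathlib import Path


def read_image_txt(path):
    """Parse a whitespace-delimited file with header FID IID v1 v2 ...
    into {(FID, IID): [float, ...]}. Missing-value codes are not handled."""
    lines = Path(path).read_text().splitlines()
    n_cols = len(lines[0].split())
    images = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != n_cols:
            raise ValueError(f"expected {n_cols} fields, got {len(fields)}")
        images[(fields[0], fields[1])] = [float(v) for v in fields[2:]]
    return images


def drop_constant_subjects(images):
    """Remove subjects whose value is identical at every voxel
    (for example, an all-zero image)."""
    return {sid: vals for sid, vals in images.items() if len(set(vals)) > 1}
```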
--coord-txt <filename>
--coord-txt specifies a space/tab-delimited text file of image coordinates. A header line is not allowed. Each row gives the coordinates of one voxel/vertex. The number of rows must equal the image resolution (the number of voxels/vertices), and the number of columns must equal the image dimension. An example of a valid file:
67 123 91
67 124 91
67 124 92
67 125 91
67 125 92
67 126 91
67 126 92
67 127 91
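The two shape constraints can be checked with a sketch like the one below. read_coords and its n_voxels parameter are hypothetical; coordinates are parsed as floats on the assumption that surface vertices may have non-integer positions, and because there is no header, a line such as "x y z" fails to parse and raises an error.

```python
from pathlib import Path


def read_coords(path, n_voxels=None):
    """Read a headerless coordinate file: one voxel/vertex per row and one
    column per image dimension. Returns a list of float tuples."""
    coords = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue
        coords.append(tuple(float(x) for x in fields))  # a header would fail here
    if len({len(c) for c in coords}) != 1:
        raise ValueError("every row must have the same number of columns")
    if n_voxels is not None and len(coords) != n_voxels:
        raise ValueError(f"{len(coords)} coordinate rows but {n_voxels} voxels")
    return coords
```

For the example above, each tuple has three entries, so the image dimension is 3.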
--voxel
--voxel specifies a subset of voxels for voxel-level GWAS reconstruction. Voxels are one-based (the first voxel is 1). They can be given as a single number (--voxel 3), as multiple numbers (--voxel 11,12,13 or --voxel {11:13}, which means voxels 11 to 13 inclusive), or as a file (--voxel <filename>). The file must have no header line, and each row must contain exactly one voxel. An example of a valid file:
3
5
7
9
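The accepted forms can be parsed with a sketch like the one below, which converts one-based voxel numbers to zero-based array indices. This is not HEIG's parser: parse_voxels is a hypothetical name, the argument is assumed to be a file whenever such a path exists, mixing forms (for example {1:3},5) is allowed here only as a convenience, and the result is sorted with duplicates removed as an assumption.

```python
import os
import re


def parse_voxels(spec):
    """Parse a --voxel argument into sorted, zero-based voxel indices.
    Accepts '3', '11,12,13', '{11:13}' (end included), or a path to a
    headerless file with one one-based voxel per row."""
    if os.path.isfile(spec):
        with open(spec) as fh:
            one_based = [int(line) for line in fh if line.strip()]
    else:
        one_based = []
        for token in spec.split(","):
            m = re.fullmatch(r"\{(\d+):(\d+)\}", token.strip())
            if m:
                start, end = int(m.group(1)), int(m.group(2))
                one_based.extend(range(start, end + 1))  # end is included
            else:
                one_based.append(int(token))
    if any(v < 1 for v in one_based):
        raise ValueError("voxels are one-based; the first voxel is 1")
    return sorted({v - 1 for v in one_based})
```

For instance, --voxel {11:13} and --voxel 11,12,13 both select zero-based indices 10, 11, and 12.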
--partition
--partition specifies a text file of genome partitions (LD blocks). The file should be tab- or space-delimited with no header; the first column is the chromosome, the second the start position, and the third the end position. Each row contains exactly one LD block. We have provided partition files for GRCh37/38 and for subjects of European, African, and Asian ancestry (link). An example of a valid file:
1 10583 1892607
1 1892607 3582736
1 3582736 4380811
1 4380811 5913893
1 5913893 7247335
1 7247335 9365199
1 9365199 10806984
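In the example, adjacent blocks share an endpoint (1892607 ends the first block and starts the second). The sketch below reads a partition file and assigns a SNP position to a block by binary search. read_partition and find_block are hypothetical names, and treating blocks as half-open intervals [start, end), so that a shared boundary belongs to the later block, is an assumption for illustration; HEIG's own convention may differ.

```python
import bisect
from collections import defaultdict
from pathlib import Path


def read_partition(path):
    """Return {chrom: sorted list of (start, end)} from a headerless
    chromosome/start/end file; extra columns are ignored."""
    blocks = defaultdict(list)
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        blocks[fields[0]].append((int(fields[1]), int(fields[2])))
    for chrom in blocks:
        blocks[chrom].sort()
    return dict(blocks)


def find_block(blocks, chrom, pos):
    """Index of the LD block on `chrom` containing `pos`, or None.
    Blocks are treated as half-open [start, end)."""
    chrom_blocks = blocks.get(chrom, [])
    starts = [start for start, _ in chrom_blocks]
    i = bisect.bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i >= 0 and pos < chrom_blocks[i][1]:
        return i
    return None
```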