Zhiwen Owen Jiang edited this page Nov 25, 2024 · 3 revisions

General

Q: Which imaging modalities can I use?

A: HEIG supports images in NIFTI and CIFTI2 formats, as well as FreeSurfer morphometry data. Any imaging modality that can be represented in these formats can be handled by HEIG. We have tested surface-based data in CIFTI2, and structural and diffusion images in NIFTI. However, more effort is needed to support functional MRI time series.

Q: How many LDRs do I need?

A: First, in --fpca, select a number of components that preserves at least 80% of the image variance. Next, in --make-ldr, make sure the correlation between raw and reconstructed images is greater than 0.85. In general, we recommend preserving 80%-90% of the imaging signal and keeping the correlation between raw and reconstructed images at 0.85-0.95 to balance bias, variance, and computational cost.
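The variance criterion can be illustrated with a minimal sketch. It assumes the PCA eigenvalues are available as an array; the function name `n_components_for_variance` and the toy eigenvalues are hypothetical, and this is not HEIG's internal code.

```python
import numpy as np

def n_components_for_variance(eigenvalues, target=0.80):
    """Smallest number of leading components whose eigenvalues
    sum to at least `target` of the total variance."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cum_prop = np.cumsum(ev) / ev.sum()                       # cumulative proportion
    # First index where the cumulative proportion reaches the target
    k = int(np.searchsorted(cum_prop, target, side="left")) + 1
    return min(k, len(ev))

# Toy eigenvalues (hypothetical), not taken from any real dataset
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
for target in (0.75, 0.85, 0.95):
    print(target, n_components_for_variance(toy_eigenvalues, target))
```

The correlation criterion still has to be checked separately, since a given variance level does not guarantee a given correlation between raw and reconstructed images.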

Q: Which software can I use for LDR GWAS?

A: You can run LDR GWAS within HEIG using --gwas.

Q: I want to share my summary statistics, what is required?

A: The minimum requirement is to share the triplet: LDR summary statistics (.sumstats + .snpinfo), bases, and the variance-covariance matrix of LDRs. We recommend providing the effective number in a README file, or sharing the eigenvalue file, so that users can easily correct for multiple hypothesis testing across voxels. Attaching image templates for visualization is also encouraged.

Q: What is the maximum image resolution?

A: We have tested that HEIG can efficiently handle images with 59,412 vertices, which is enough to cover the entire brain. Higher resolutions may cause out-of-memory issues. See "How much memory do I need?" below.

Q: My sample size is small (<10,000), can I use HEIG?

A: Yes, you can. However, a small sample size has some drawbacks. First, if the sample size is smaller than the image resolution, the effective number is always biased downward, which may cause false positives. Second, heritability and genetic correlation estimates will be unstable, with large standard errors, although voxel-level GWAS is unaffected.

Q: How much memory do I need?

A: The main memory bottlenecks are --fpca and --voxel-gwas. For an imaging dataset with 59,412 vertices and 15,752 subjects, --fpca took around 25 GB of memory to estimate all 15,752 PCs. In another benchmark, we used a dataset with 117,019 voxels and 19,040 subjects to estimate the top 5,000 PCs. It took 56 GB of memory, which was a little unexpected. Increasing to the top 6,346 PCs took 72 GB of memory. Currently, HEIG is not memory efficient for images with more than 100,000 voxels or vertices.
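As a rough sanity check on figures like these, the footprint of a single dense subjects-by-voxels matrix can be computed directly. This is a back-of-the-envelope sketch only: the helper name `dense_matrix_gb` is hypothetical, the 4-byte and 8-byte value sizes are assumptions about storage, and HEIG's actual peak usage also depends on intermediate matrices that this does not model.

```python
def dense_matrix_gb(n_subjects, n_voxels, bytes_per_value=4):
    """Size in GB (10^9 bytes) of one dense n_subjects x n_voxels array."""
    return n_subjects * n_voxels * bytes_per_value / 1e9

# Dataset dimensions quoted in the answer above
for n_subjects, n_voxels in [(15_752, 59_412), (19_040, 117_019)]:
    single = dense_matrix_gb(n_subjects, n_voxels, bytes_per_value=4)
    double = dense_matrix_gb(n_subjects, n_voxels, bytes_per_value=8)
    print(f"{n_subjects} x {n_voxels}: {single:.1f} GB (4-byte), {double:.1f} GB (8-byte)")
```

A single copy of the data is therefore only a lower bound; the measured usage reported above is several times larger.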

To recover voxel-level summary statistics from LDR summary statistics covering 6.6 million SNPs and 25 LDRs, --voxel-gwas took 3, 10, 20, and 22 GB of memory to recover 2, 100, 1,000, and 15,000 voxels, respectively (4 CPUs in parallel). With a larger LDR summary statistics dataset covering 6.6 million SNPs and 1,750 LDRs, it took 50 GB of memory to recover 10,000 voxels (4 CPUs in parallel).

Voxel-level GWAS

Q: After using HEIG to scan the whole image, should I do voxel GWAS individually for the voxels of interest?

A: Suppose an investigator conducts whole-cortex voxel-level GWAS through HEIG, where 80% of image variance is preserved and the correlation between raw and reconstructed images is 0.85. The investigator identifies some voxels of interest. Instead of doing raw voxel-wise GWAS, the investigator can extract these voxels and run a second round of HEIG that preserves more image variance (e.g. 85%-90%) and achieves a higher correlation between raw and reconstructed images (e.g. 0.90-0.95). Note that in such cases one still needs to correct for multiple hypothesis testing across all voxels analyzed in the first round.
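One common way to apply such a correction is a Bonferroni-style threshold that divides a genome-wide significance level by the effective number of independent voxels. The sketch below assumes that approach; the function name, the 5e-8 baseline, and the example effective number are illustrative assumptions, not values taken from HEIG.

```python
def voxelwise_threshold(effective_number, genome_wide_alpha=5e-8):
    """Bonferroni-style p-value threshold across voxels.

    `effective_number` should come from the first-round (whole-image)
    analysis, not from the smaller second-round subset of voxels.
    """
    if effective_number <= 0:
        raise ValueError("effective_number must be positive")
    return genome_wide_alpha / effective_number

# Hypothetical effective number for the first-round analysis
first_round_effective_number = 100
print(voxelwise_threshold(first_round_effective_number))
```

Using the first-round effective number keeps the threshold at least as strict as it was for the whole-image scan, even when the second round covers only a few voxels.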

However, we discourage preserving too much image variance (e.g. > 90%), because local noise and biases can produce many suspicious loci in which a locus contains only one variant-voxel association. These loci are rarely reproducible, and we propose a post voxel-level GWAS screening step to remove them. As a rule of thumb for selecting LDRs, we suggest 80%-90% for image variance and 0.85-0.95 for the correlation between raw and reconstructed images.
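The screening idea can be sketched as follows. This is a minimal illustration that assumes significant hits have already been grouped into loci; the tuple layout, the function name `drop_singleton_loci`, and the example identifiers are hypothetical, and the actual screening procedure may differ.

```python
from collections import defaultdict

def drop_singleton_loci(associations):
    """Remove loci supported by only one variant-voxel association.

    `associations` is a sequence of (locus_id, variant_id, voxel_id)
    tuples, one per significant variant-voxel pair.
    """
    associations = list(associations)
    pairs_per_locus = defaultdict(set)
    for locus_id, variant_id, voxel_id in associations:
        pairs_per_locus[locus_id].add((variant_id, voxel_id))
    # Keep only loci with more than one distinct variant-voxel pair
    keep = {locus for locus, pairs in pairs_per_locus.items() if len(pairs) > 1}
    return [hit for hit in associations if hit[0] in keep]

# Hypothetical hits: locus "L2" has a single variant-voxel association
hits = [("L1", "rs1", 10), ("L1", "rs2", 10), ("L2", "rs3", 55)]
print(drop_singleton_loci(hits))
```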

Q: Can I use HEIG to do efficient GWAS for a dataset of non-imaging phenotypes?

A: Yes, HEIG can handle non-imaging continuous phenotypes. In the Reading images step, the dataset can be loaded in text format through --image-txt. An additional coordinate file (--coord-txt) is required by default; however, it will not be used in the analysis if kernel smoothing is skipped with --skip-smoothing in Functional PCA. In that case, FPCA reduces to PCA. The remaining steps are the same for imaging and non-imaging data.
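To make the "FPCA reduces to PCA" case concrete, the sketch below runs plain PCA on a subjects-by-phenotypes matrix and measures how well a truncated reconstruction matches the raw data. It is a NumPy illustration on simulated data, not HEIG's implementation; the function names and the choice of averaging per-subject Pearson correlations are assumptions.

```python
import numpy as np

def pca_reconstruct(data, n_components):
    """Reconstruct data (subjects x phenotypes) from its top principal components."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are orthonormal principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:n_components].T        # phenotypes x components
    scores = centered @ bases          # subjects x components (low-dimensional scores)
    return scores @ bases.T + mean

def mean_subject_correlation(raw, reconstructed):
    """Average per-subject Pearson correlation between raw and reconstructed rows."""
    raw = np.asarray(raw, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    raw_c = raw - raw.mean(axis=1, keepdims=True)
    rec_c = reconstructed - reconstructed.mean(axis=1, keepdims=True)
    num = (raw_c * rec_c).sum(axis=1)
    den = np.sqrt((raw_c ** 2).sum(axis=1) * (rec_c ** 2).sum(axis=1))
    return float(np.mean(num / den))

# Simulated phenotypes (hypothetical): 50 subjects, 8 continuous traits
rng = np.random.default_rng(0)
phenotypes = rng.normal(size=(50, 8))
for k in (2, 4, 8):
    recon = pca_reconstruct(phenotypes, k)
    print(k, round(mean_subject_correlation(phenotypes, recon), 3))
```

With all components retained, the reconstruction matches the raw data and the correlation is 1; fewer components trade reconstruction accuracy for a smaller number of GWAS to run.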