Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenomeInfoDb #11

Open
teacakedeadlift opened this issue Mar 6, 2023 · 5 comments
Open

GenomeInfoDb #11

teacakedeadlift opened this issue Mar 6, 2023 · 5 comments

Comments

@teacakedeadlift
Copy link

IchorCNA suddenly decided it was going to error with the following:

Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
  !anyNA(m32) is not TRUE
Calls: getSeqInfo ... FETCH_ORDERED_CHROM_SIZES -> .order_seqlevels -> stopifnot
Execution halted

Likely as a result of the issues with GenomeInfoDb.

Just wondering if anyone else has problems or if there is a workaround?

Old conda environments broken, new ones won't download GenomeInfoDB.
Doesn't seem to want to work with the 0.5.0 docker image either (which I assumed wouldn't require downloads from bioconductor). I'm at a loss as to what to do.

Any help appreciated

Phil

@ekushele
Copy link

ekushele commented Mar 7, 2023

Also posted this here: broadinstitute#120 (comment)

@gavinha
Copy link
Member

gavinha commented Mar 10, 2023

Hi @teacakedeadlift

Let us look into this.

Can you please post your sessionInfo() so that we can replicate it?

Thanks,
Gavin

@teacakedeadlift
Copy link
Author

hi @gavinha

Sorry for the late reply.

I couldn't run sessionInfo() as I was running ichorCNA either via the snakemake pipeline inside a conda env or just by execution of the Rscript unIchorCNA.R within the conda env. Turns out the snakemake pipeline was running to completion but with errors in the log files, and the conda env itself was having issues with all of the following packages bioconductor-genomeinfodb bioconductor-genomeinfodbdata bioconductor-bsgenome.hsapiens.ucsc.hg19 bioconductor-bsgenome.hsapiens.ucsc.hg38. The issue is with these packages not ichorCNA. Various workarounds haven't yet succeeded for me (including clean conda env installs, using mamba, manual alteration of the build number as per here. I seem to be able to install locally within R using BiocManager::install("GenomeInfoDb") but have yet to trial using ichorCNA within R.

I have managed to get the docker container from https://quay.io/repository/biocontainers/r-ichorcna?tab=tags&tag= working with apptainer on our cluster. Code included here in case anyone is in a similar position:

Using apptainer pull docker://quay.io/biocontainers/r-ichorcna:0.5.0--pl5321r42hdfd78af_0, renaming to r-ichorcna_0.5.0.sif and then submitting a job with the following from within the scripts folder of ichorCNA (wigs already created):

cd /data/home/hfx252/tools/tmp/ichorCNA/scripts

patientID=patientID
sampleID=sampleID
controlID=$patientID.control
mkdir -p ./$JOB_ID/$patientID
mkdir -p ./$JOB_ID/$patientID/logs

apptainer exec /path/to/sif/r-ichorcna_0.5.0.sif Rscript runIchorCNA.R \
 --libdir ./.. \
 --WIG /path/to/wigs/${patientID}.${sampleID}.wig \
 --NORMWIG /path/to/wigs/${controlID}.wig \
 --normalPanel ../inst/extdata/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds \
 --id $patientID.$sampleID \
 --libdir ../ \
 --gcWig ../inst/extdata/gc_hg38_1000kb.wig \
 --mapWig ../inst/extdata/map_hg38_1000kb.wig \
 --centromere ../inst/extdata/GRCh38.GCA_000001405.2_centromere_acen.txt \
 --genomeBuild hg38 \
 --genomeStyle UCSC \
 --minMapScore 0.7 \
 --ploidy "c(2)" --normal "c(.95,.99,.999)" --maxCN 5 \
 --includeHOMD False \
 --chrs "paste0('chr', c(1:22, \"X\"))" \
 --chrTrain "paste0('chr', c(1:22, \"X\"))" \
 --chrNormalize "paste0('chr', c(1:22))" \
 --estimateNormal True --estimatePloidy True --estimateScPrevalence True --scStates "c(1,3)" \
 --exons.bed NULL \
 --txnE 0.99999 --txnStrength 100000 \
 --fracReadsInChrYForMale 0.001 \
 --maxFracGenomeSubclone 0.5 --maxFracCNASubclone 0.7 \
 --plotFileType pdf --plotYLim "c(-2,4)" \
 --outDir ./$JOB_ID/$patientID > ./$JOB_ID/$patientID/logs/$patientID.$sampleID.log 2> ./$JOB_ID/$patientID/logs/$patientID.$sampleID.log

So far seems to be working although having some issues with:

  1. chr19 being reduced to a couple fo bins around the centromere by the provided hg38 PoN
  2. spurious bins in some chromosomes that are filtered out when we use the provided hg38 PoN, but not when we have created our own (~30 samples)
  3. TFx called as >0.03 in some samples when using 500kb bins but then these disappearing when using 1000kb.

Unsure if the first 2 are due to mappability filtering although the mapwig for chr19 looks like it includes most of chromosome and our samples have coverage in chr19. Currently trialling runs with hg19 in case that is an issue.
The last issue might be something to do with the most likely solution being selected, and the starting normal contamination settings / ploidy settings (--estimateScPrevalence usually set to FALSE).

Do you run ichorCNA from within R, on the command line, locally or on a cluster? What would you suggest is best practice?

many thanks

Phil

@ekushele
Copy link

Hi @teacakedeadlift

Let us look into this.

Can you please post your sessionInfo() so that we can replicate it?

Thanks, Gavin

@gavinha, here is my sessionInfo for singularity image: r-ichorcna-0.3.2--pl5321r42hdfd78af_2:

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ichorCNA_0.3.2       GenomicRanges_1.50.0 GenomeInfoDb_1.34.1
[4] IRanges_2.32.0       S4Vectors_0.36.0     BiocGenerics_0.44.0
[7] HMMcopy_1.40.0       data.table_1.14.4    optparse_1.7.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9             bitops_1.0-7           plyr_1.8.7
 [4] zlibbioc_1.44.0        XVector_0.38.0         getopt_1.20.3
 [7] tools_4.2.2            RCurl_1.98-1.9         compiler_4.2.2
[10] GenomeInfoDbData_1.2.9

I tried to run newer version with singularity image r-ichorcna:0.5.0--pl5321r42hdfd78af_0 (with ichorCNA as function) but I also have issues there: #12 (comment)

@ycl6
Copy link

ycl6 commented Sep 25, 2023

Hi all,

The error is due to a bug in GenomeInfoDb when handling hg38 assembly, see Bioconductor/GenomeInfoDb#86 (comment).

Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
!anyNA(m32) is not TRUE

The fixed version (v1.34.8) is available in BioC 3.16 (R-4.2). Hence, if you are not using R-4.2, you'll need to upgrade R and install ichorCNA with the latest versions of the all the R dependencies. If you are already using R-4.2, you can upgrade existing packages by update.packages() and BiocManager::install().

These are the latest version of package imported by ichorCNA in R-4.2:

  HMMcopy (>= 1.40),
  GenomicRanges (>= 1.50.2),
  GenomeInfoDb (>= 1.34.9),
  doMC (>= 1.3.8),
  foreach (>= 1.5.2),
  BSgenome.Hsapiens.UCSC.hg19 (>= 1.4.3),
  BSgenome.Hsapiens.UCSC.hg38 (>= 1.4.5),
  ggplot2 (>= 3.4.3),
  stringr (>= 1.5.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants