Limited valid data size of filtered BAM file in phased polyploid assembly #86

Axolotl233 · 2024-10-30T11:49:09Z

Hello,

Thank you for developing such a good tool; it is very useful for the whole community! I am working on an allotetraploid plant (2n = 4x = AABB) that originated from a hybrid polyploidization event involving two closely related species (A and B, which have different karyotypes) and has not obviously diploidized. Additionally, this plant is outcrossing and heterozygous. We recently sequenced it using PacBio HiFi, Illumina short reads, and Hi-C, and we want to construct a chromosome-level genome and perform genome analyses.

Both homogeneity and heterogeneity exist between the two subgenomes because of the close relationship between the two progenitor genomes. This means some genomic regions are more similar (autopolyploid-like) than others (allopolyploid-like), which makes genome assembly challenging. We finally assembled a draft genome using hifiasm and obtained almost complete 4:1 collinearity when comparing the concatenated haplotype genome (hap1 + hap2) with one progenitor genome (we currently have only one). But when the two haplotypes were separated and each compared with the progenitor genome, both showed redundant or missing sequence in some regions, indicating that the phasing was inaccurate. So I decided to use the concatenated genome for chromosome anchoring with HapHiC.

However, I encountered the same problem as #21: my filtered BAM file is only 3.8G, compared with 117G for the raw BAM. This limited data produced a strange Hi-C heatmap in quick view mode. My commands are:

bwa index hic.hap12.fa
bwa mem -5SP -t 40 ./hic.hap12.fa  ../../../../z.data/hic_1.fq.gz   ../../../../z.data/hic_2.fq.gz | samblaster | samtools view - -@ 20 -S -h -b -F 3340 -o HiC.bam
/software/HapHiC/utils/filter_bam HiC.bam 1 --nm 3 --threads 14 | samtools view - -b -@ 14 -o HiC.filtered.bam
/software/HapHiC/haphic pipeline hic.hap12.fa HiC.filtered.bam 10 --quick_view --threads 30 --processes 30 --gfa "hic.hap1.p_ctg.gfa,hic.hap2.p_ctg.gfa"
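
To see which of the two thresholds removes most of the data, the alignments could be tallied per filter before running `filter_bam`. The following is only a minimal sketch: `count_filter_losses` is a hypothetical helper (not part of HapHiC), and it assumes that `filter_bam HiC.bam 1 --nm 3` keeps alignments with MAPQ >= 1 and NM < 3, and that a missing NM tag counts as 0.

```shell
# Hypothetical helper: reads SAM text on stdin and counts alignments that
# would fail a MAPQ threshold, an NM (edit distance) threshold, or both.
# Assumption: an alignment is kept when MAPQ >= $1 and NM < $2.
count_filter_losses() {
    awk -F'\t' -v min_mapq="${1:-1}" -v nm_cutoff="${2:-3}" '
        /^@/ { next }                          # skip SAM header lines
        {
            total++
            nm = 0                             # assumption: missing NM tag counts as 0
            for (i = 12; i <= NF; i++)         # optional tags start at column 12
                if ($i ~ /^NM:i:/) nm = substr($i, 6) + 0
            bad_mapq = ($5 + 0 < min_mapq + 0) # column 5 is MAPQ
            bad_nm   = (nm >= nm_cutoff + 0)
            if (bad_mapq && bad_nm) both++
            else if (bad_mapq)      mapq_only++
            else if (bad_nm)        nm_only++
            else                    pass++
        }
        END {
            printf "total=%d mapq_only=%d nm_only=%d both=%d pass=%d\n",
                   total, mapq_only, nm_only, both, pass
        }
    '
}
```

It could be run as `samtools view HiC.bam | count_filter_losses 1 3`. A large `mapq_only` count would suggest that most read pairs are lost because they map equally well to both haplotypes (MAPQ 0), rather than because of mismatches.
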

In #21 you suggested using the p_utg data for analysis in this situation, but I wonder whether it is possible to use the concatenated genome (cat hap1 + hap2) instead. Could I perhaps relax the conditions of the BAM filtering step in the genomic regions where the two haplotypes have exactly the same sequence?
