volcanosv-vc-large-indel.py fails #9

Open
DayTimeMouse opened this issue Nov 5, 2024 · 12 comments

@DayTimeMouse

Hi,

I ran volcanosv-vc-large-indel.py, but the chr* output directories do not contain a volcano_variant_chr*vcf file.

log:

Traceback (most recent call last):
  File "/home/xgc/variant/sv/volcanosv/VolcanoSV/bin/VolcanoSV-vc/Large_INDEL//extract_contig_signature_Hifi.py", line 740, in <module>
    ref_seq = dc_ref[chr_name]
KeyError: 'chr1'
cat: large_indel_output_tumor/chr1/volcano_variant_chr*vcf: No such file or directory

script:

path_to_volcanosv='../VolcanoSV'
for i in {1..22}
do
echo "***********************large-indel for chr${i}*******************"
python3 ${path_to_volcanosv}/bin/VolcanoSV-vc/Large_INDEL/volcanosv-vc-large-indel.py \
-i volcanosv_asm_output_tumor \
-bam hifi_tumor.bam \
-o large_indel_output_tumor \
-ref genome.fa \
-t 20 \
-chr ${i} \
-dtype Hifi \
-px hifi_tumor
done

Best regards.

@volcano1998
Collaborator

What does your reference file look like? Are the contigs named in a format like chr1, chr2, etc.?

@DayTimeMouse
Author

Yes, the genome is T2T-CHM13, and the contig names follow the format chr1, chr2...

@volcano1998
Collaborator

Your error is a new one to me. Is it possible to send me a small BAM file (like chr22) and the contig for that chromosome as well? That way I can reproduce the error on my end and try to debug it.

@volcano1998
Collaborator

Does your reference file have descriptive fields after the chromosome name? For example, a header in a format like this:
>chr1 xxxxxxx
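
One way to answer this question is to scan the FASTA headers and list those that carry extra text after the sequence name. The following is a minimal sketch, not part of VolcanoSV; the function name `find_headers_with_description` and the example records are assumptions made for illustration.

```python
def find_headers_with_description(fasta_lines):
    """Return (name, description) pairs for FASTA headers that have
    extra text after the first whitespace-delimited token."""
    flagged = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Split the header into the name and the rest of the line, if any.
        parts = line[1:].rstrip("\n").split(None, 1)
        if len(parts) == 2:
            flagged.append((parts[0], parts[1]))
    return flagged


# Hypothetical records; for a real file, pass an open file handle instead.
example = [">chr1 xxxxxxx\n", "ACGT\n", ">chr2\n", "TTGA\n"]
flagged = find_headers_with_description(example)
print(flagged)
```

If the returned list is non-empty, the reference has headers of the `>chr1 xxxxxxx` kind asked about here.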

@DayTimeMouse
Author

Yes. After I simplified the chr IDs, volcanosv-vc-large-indel.py works well.
But the ValueError: too many values to unpack (expected 2) in volcanosv-vc-small-indel.py still occurs.
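
For readers hitting the same KeyError, the header simplification could look like the sketch below. It is an assumption about what "simplifying the chr ID" means here (keeping only the first whitespace-delimited token of each header), and `simplify_fasta_headers` is a hypothetical helper, not a VolcanoSV script.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, reducing each header line to
    '>' plus its first whitespace-delimited token."""
    for line in src:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            dst.write(f">{name}\n")
        else:
            dst.write(line)  # sequence lines are copied unchanged


# In-memory demo with hypothetical records.
src = io.StringIO(">chr1 xxxxxxx\nACGT\n>chr2\nTTGA\n")
dst = io.StringIO()
simplify_fasta_headers(src, dst)
print(dst.getvalue())
```

With real files, `src` and `dst` would be handles from `open(...)`, writing to a new file rather than overwriting the original reference.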

Even a small BAM file (like chr22) is large, ~1.16G. How can I send it to you?

@volcano1998
Collaborator

volcano1998 commented Nov 20, 2024

I'm glad to hear that the first error is resolved.

Is it possible to upload it to a cloud drive (Google Drive or anything similar) and share the link with me, please?

I need the BAM, contig and reference file to test. Thanks!

@DayTimeMouse
Author

Hi, I have uploaded the BAM file and contig to Google Drive (https://drive.google.com/drive/folders/1w6pn5bIiiNIWmafqqLCwgqod06h3jZjM?usp=drive_link).

Thanks in advance.

@volcano1998
Collaborator

volcano1998 commented Nov 21, 2024

You are very welcome! I also need the assembled contig file generated by VolcanoSV-asm, which should be under

volcanosv-asm_output/chr22/final_contigs/${prefix}_final_contigs.fa

Could you also provide that please? Thanks!

@DayTimeMouse
Author

Hi,

I have now uploaded the contig file to the same link. Please take a look.

@volcano1998
Collaborator

Hi @DayTimeMouse, I was able to generate a VCF with ~1600 variants. I had to manually modify some scripts since you gave me a single-chromosome reference file.

I think the problem might be the reference splitting. Can you check whether you have a file at large_indel_output_tumor/chr1/ref_by_chr/chr1.fa, and how big it is?
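
A quick way to run this check is sketched below. The helper name `file_size_mb` is an assumption, and the demo uses a temporary file so that it runs anywhere; in practice the path from the question above would be passed in.

```python
import os
import tempfile


def file_size_mb(path):
    """Return the file size in MB (10**6 bytes), or None if no regular
    file exists at path."""
    if not os.path.isfile(path):
        return None
    return os.path.getsize(path) / 1e6


# Demo on a temporary directory; replace with the real ref_by_chr path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "chr1.fa")
    with open(demo_path, "wb") as fh:
        fh.write(b">chr1\nACGT\n")
    present = file_size_mb(demo_path)
    missing = file_size_mb(os.path.join(tmp, "chr2.fa"))

print(present, missing)
```

A result of `None`, or a size far below what is expected for the chromosome, would point to the reference splitting step having gone wrong.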

@DayTimeMouse
Author

Hi volcano1998,

I'm sorry about that. I have now uploaded the full reference file (https://drive.google.com/drive/folders/1w6pn5bIiiNIWmafqqLCwgqod06h3jZjM?usp=drive_link).

large_indel_output_tumor/ref_by_chr/chr1.fa is 239.74 MB, and large_indel_output_tumor/ref_by_chr/chr22.fa is 49.56 MB.

@volcano1998
Collaborator

volcano1998 commented Dec 3, 2024

Thank you! I used your full reference file and got the VCF without changing my code. I ran it on chr22 and got a VCF of size 3.1M in 3 mins.

I also noticed that your file structure matches an older version of VolcanoSV. I suggest you run git pull and try chr22 again.

Here is my command.

python3 ${path_to_volcanosv}/bin/VolcanoSV-vc/Large_INDEL/volcanosv-vc-large-indel.py \
	-i ../volcanosv_asm_output/ \
	-o volcanosv_large_indel_output/ \
	-dtype Hifi \
	-bam ../chr22.bam \
	-ref ../chm13.id.simply.fa  \
	-chr 22 -t 10 \
	-px hifi_tumor
