LiftOn silently failing due to ID feature #22

Open
Glfrey opened this issue Aug 9, 2024 · 0 comments
Glfrey commented Aug 9, 2024

Hi @Kuanhao-Chao,

While running LiftOn on several genomes, we noticed a pattern in the ID attribute of some gff3 files that causes LiftOn to fail silently. It occurs when the ID attribute of an mRNA ends with an underscore followed by an integer (e.g. ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1). When the ending is changed to an underscore, a letter and an integer (e.g. ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_X1), LiftOn runs successfully. I've put a full example below:

Uncorrected:

##gff-version   3
GCA_013396205.1-JAAOAN010000001.1       Genbank gene    653     1126    .       -       .                     ID=GCA_013396205.1-rna-gnl-WGS:JAAOAN-mrna.FMUND_1.gene
GCA_013396205.1-JAAOAN010000001.1       Genbank mRNA    653     1126    .       -       .                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1;Parent=GCA_013396205.1-rna-gnl-WGS:JAAOAN-mrna.FMUND_1.gene;Name=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1;ori_geneid=gene-FMUND_1
GCA_013396205.1-JAAOAN010000001.1       Genbank CDS     653     1126    .       -       0                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1.CDS1;Parent=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1
GCA_013396205.1-JAAOAN010000001.1       Genbank exon    653     1126    .       -       .                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1.exon1;Parent=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1

Corrected:

##gff-version   3
GCA_013396205.1-JAAOAN010000001.1       Genbank gene    653     1126    .       -       .                     ID=GCA_013396205.1-rna-gnl-WGS:JAAOAN-mrna.FMUND_1.gene
GCA_013396205.1-JAAOAN010000001.1       Genbank mRNA    653     1126    .       -       .                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_X1;Parent=GCA_013396205.1-rna-gnl-WGS:JAAOAN-mrna.FMUND_1.gene;Name=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1;ori_geneid=gene-FMUND_1
GCA_013396205.1-JAAOAN010000001.1       Genbank CDS     653     1126    .       -       0                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1.CDS1;Parent=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_X1
GCA_013396205.1-JAAOAN010000001.1       Genbank exon    653     1126    .       -       .                     ID=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_1.exon1;Parent=GCA_013396205.1-transcript_rna-gnl-WGS:JAAOAN-mrna.FMUND_X1
###
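
For anyone hitting the same problem, the manual correction above could be scripted roughly as follows. This is only a sketch of the workaround, not a fix in LiftOn itself: it assumes tab-delimited GFF3, the function names and the `X` marker are my own choices, and the sample IDs are shortened stand-ins for the real ones. As in the corrected example, only the mRNA `ID` and the `Parent` references to it are changed.

```python
import re

# An ID ending in "_<digits>", the pattern reported above to trigger the failure.
TRAILING_INT = re.compile(r"^(?P<stem>.*_)(?P<num>\d+)$")


def parse_attributes(field):
    """Split a GFF3 attribute column into a dict, keeping the original key order."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def format_attributes(attrs):
    """Join a dict of attributes back into a GFF3 attribute column."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def rename_transcript_ids(lines, types=("mRNA",), marker="X"):
    """Return (new_lines, renamed), where renamed maps each old ID to its new ID."""
    rows = [line.rstrip("\n") for line in lines]
    renamed = {}

    # Pass 1: collect transcript IDs that end in underscore + integer.
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] not in types:
            continue
        old_id = parse_attributes(cols[8]).get("ID", "")
        match = TRAILING_INT.match(old_id)
        if match:
            renamed[old_id] = f"{match['stem']}{marker}{match['num']}"

    # Pass 2: rewrite the affected IDs and every Parent that points at them.
    out = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            out.append(row)  # directives, comments and malformed rows pass through
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("ID") in renamed:
            attrs["ID"] = renamed[attrs["ID"]]
        if "Parent" in attrs:
            parents = attrs["Parent"].split(",")
            attrs["Parent"] = ",".join(renamed.get(p, p) for p in parents)
        cols[8] = format_attributes(attrs)
        out.append("\t".join(cols))
    return out, renamed


if __name__ == "__main__":
    # Shortened, hypothetical IDs standing in for the real ones in the example.
    sample = [
        "##gff-version 3",
        "chr1\tGenbank\tgene\t653\t1126\t.\t-\t.\tID=rna-FMUND_1.gene",
        "chr1\tGenbank\tmRNA\t653\t1126\t.\t-\t.\tID=transcript_rna-FMUND_1;Parent=rna-FMUND_1.gene",
        "chr1\tGenbank\tCDS\t653\t1126\t.\t-\t0\tID=transcript_rna-FMUND_1.CDS1;Parent=transcript_rna-FMUND_1",
    ]
    fixed, renamed = rename_transcript_ids(sample)
    print("\n".join(fixed))
    print(renamed)
```

The `renamed` mapping doubles as a report of which transcripts were affected, so the same function can be used just to check an annotation before running LiftOn.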

When run on the uncorrected file, LiftOn appears to complete, but the resulting gff3 file contains no "source=Liftoff" features, only miniprot ones. When run on the corrected file, it contains both:

Uncorrected

grep -c "source=Liftoff"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
0
grep -c "source=miniprot"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
199
grep -c "status=miniprot"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
199
grep -c "status=Liftoff"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
0

Corrected

grep -c "source=Liftoff"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
13790
grep -c "source=miniprot"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3 
199
grep -c "status=miniprot"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
199
grep -c "status=Liftoff"  GCA_002894225.1_GCA_013396205.1_genomic_lifton.gff3
742
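
The four greps can also be done in one pass with a small tally script. This is a rough sketch that assumes the output is tab-delimited and carries `source=` and `status=` in the attribute column, as the grep patterns suggest; `count_tags` is a hypothetical helper name and the rows in the demo are made up. Unlike `grep -c`, which counts lines containing a substring, it counts exact attribute values.

```python
import re
from collections import Counter

# A "source=" or "status=" attribute at the start of the column or after a ";".
TAG = re.compile(r"(?:^|;)(source|status)=([^;]+)")


def count_tags(lines):
    """Tally source=/status= attribute values over the feature rows of a GFF3 file."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        attributes = line.rstrip("\n").split("\t")[-1]
        for key, value in TAG.findall(attributes):
            counts[f"{key}={value}"] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration; for a real file, pass an open file handle.
    rows = [
        "##gff-version 3",
        "chr1\tLiftOn\tmRNA\t1\t9\t.\t+\t.\tID=a;source=Liftoff;status=Liftoff",
        "chr1\tLiftOn\tmRNA\t1\t9\t.\t+\t.\tID=b;source=miniprot;status=miniprot",
        "chr1\tLiftOn\texon\t1\t9\t.\t+\t.\tID=c;Parent=b",
    ]
    for tag, n in sorted(count_tags(rows).items()):
        print(tag, n)
```

A missing key in the returned `Counter` reads as 0, so a run with no `source=Liftoff` entries shows up the same way as the uncorrected case above.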

I was able to trace the issue to step 7 of LiftOn.py, but I wasn't able to isolate the specific place where it fails. My guess is that it has something to do with how gffutils processes features during the chaining stage, but I could be mistaken. The log files for the two runs are similar, but the failing run terminates early (I've attached both).

out_LiftOn.Uncorrected.log

out_LiftOn.Corrected.log

I'm more than happy to share the data and commands we used. It's probably best for me to send over a Dropbox link; let me know if that would be useful.

Kuanhao-Chao self-assigned this Aug 27, 2024