can not creating DNA dictionary/protein dictionary from the reference annotation and error in miniprot #18

Open
ikkaku1005 opened this issue Jul 26, 2024 · 2 comments

ikkaku1005 commented Jul 26, 2024

Hi,

Thank you for answering the question quickly and clearly. I am excited to apply LiftOn to my project.
I want to map the de novo gene annotations onto a nanopore fly assembly of the most closely related species.
I encountered a new issue: the log showed no transcripts and no proteins. Additionally, miniprot reported an error, but running Liftoff did not produce any errors.

Can you help me to understand why and resolve this issue? Thanks a lot!

Here is the log message:

Creating reference annotation database:

Creating transcript DNA dictionary from the reference annotation ...
Creating transcript protein dictionary from the reference annotation ...

  • number of transcripts: 0
  • number of proteins: 0
    • number of truncated proteins: 0

miniprot analysis part:

Creating miniprot annotation database : ./lifton_output/miniprot/miniprot.gff3
2024-07-26 17:39:34,010 - INFO - Populating features
gffutils database build failed with No lines parsed -- was an empty file provided?
2024-07-26 17:39:34,355 - INFO - Populating features
gffutils database build failed with No lines parsed -- was an empty file provided?

Here is my code, and my gene annotation file format is GFF3.

ref="path/a_inornatus_100Kb_HiC_assembly_MAY_2021.fasta"
nano30="path/assembly.fasta"

lifton -g Ino.gff -o nano30.gff3 -copies -sc 0.95 $nano30 $ref

Here are the first lines of the Ino.gff file:

##gff-version 3.1.26
Chr_1 AUGUSTUS transcript 1861 8031 . + . ID=Ino_00001.t1;gene_id=Ino_00001;
Chr_1 AUGUSTUS gene 1861 8031 . + . ID=Ino_00001;
Chr_1 AUGUSTUS exon 1861 1959 . + . ID=exon_1;Parent=Ino_00001.t1
Chr_1 AUGUSTUS exon 3661 3722 . + . ID=exon_2;Parent=Ino_00001.t1
Chr_1 AUGUSTUS exon 7780 7889 . + . ID=exon_3;Parent=Ino_00001.t1
Chr_1 AUGUSTUS exon 7980 8031 . + . ID=exon_4;Parent=Ino_00001.t1
Chr_1 AUGUSTUS transcript 53681 56137 . + . ID=Ino_00002.t1;gene_id=Ino_00002;
Chr_1 AUGUSTUS gene 53681 56137 . + . ID=Ino_00002;
Chr_1 AUGUSTUS exon 53681 53782 . + . ID=exon_5;Parent=Ino_00002.t1
Chr_1 AUGUSTUS exon 54661 54747 . + . ID=exon_6;Parent=Ino_00002.t1
Chr_1 AUGUSTUS exon 56130 56137 . + . ID=exon_7;Parent=Ino_00002.t1
Chr_1 AUGUSTUS transcript 60449 61874 . + . ID=Ino_00003.t1;gene_id=Ino_00003;eggnog_id="SH2_domain-containing_protein_5"
Chr_1 AUGUSTUS gene 60449 61874 . + . ID=Ino_00003;eggnog_id="SH2_domain-containing_protein_5"
Chr_1 AUGUSTUS exon 60449 60517 . + . ID=exon_8;Parent=Ino_00003.t1
Chr_1 AUGUSTUS exon 61663 61743 . + . ID=exon_9;Parent=Ino_00003.t1
Chr_1 AUGUSTUS exon 61824 61874 . + . ID=exon_10;Parent=Ino_00003.t1
Chr_1 AUGUSTUS transcript 63451 68293 . + . ID=Ino_00004.t1;gene_id=Ino_00004;blast_id="sp|Q6ZV89|SH2D5_HUMAN";interproscan_id="IPR036860";eggnog_id="SH2_domain_containing_5"
Chr_1 AUGUSTUS gene 63451 68293 . + . ID=Ino_00004;blast_id="sp|Q6ZV89|SH2D5_HUMAN";interproscan_id="IPR036860";eggnog_id="SH2_domain_containing_5"
Chr_1 AUGUSTUS exon 63451 63471 . + . ID=exon_11;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 63696 63827 . + . ID=exon_12;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 64493 64741 . + . ID=exon_13;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 65417 65591 . + . ID=exon_14;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 65876 65981 . + . ID=exon_15;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 66781 66940 . + . ID=exon_16;Parent=Ino_00004.t1
Chr_1 AUGUSTUS exon 68072 68293 . + . ID=exon_17;Parent=Ino_00004.t1
Chr_1 AUGUSTUS transcript 70368 75638 . + . ID=Ino_00005.t1;gene_id=Ino_00005;blast_id="sp|P39656|OST48_HUMAN";interproscan_id="IPR005013";eggnog_id="protein_N-linked_glycosylation_via_asparagine"
Chr_1 AUGUSTUS gene 70368 75638 . + . ID=Ino_00005;blast_id="sp|P39656|OST48_HUMAN";interproscan_id="IPR005013";eggnog_id="protein_N-linked_glycosylation_via_asparagine"

ikkaku1005 changed the title from "can not creating DNA dictionary/protein dictionary from the reference annotation" to "can not creating DNA dictionary/protein dictionary from the reference annotation and error in miniprot" on Jul 26, 2024
Kuanhao-Chao self-assigned this on Aug 2, 2024

nikostr commented Oct 23, 2024

I came across this issue in v1.0.5 as well.


nikostr commented Oct 24, 2024

Okay, I did some troubleshooting, and it turns out the issue was that the transcript features in my GFF lacked a Parent attribute. @ikkaku1005, maybe try replacing gene_id with Parent, or just adding Parent to your transcripts? I also notice that the transcript lines come before their corresponding gene lines in the GFF you posted. I'm not sure whether that is a problem for gffutils, but if adding Parent is not enough, try making sure each transcript feature comes after its gene feature.
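
Both suggestions can be sketched in a few lines of Python. This is only a rough illustration, not part of LiftOn or gffutils: the function names `parse_attrs` and `add_parent_and_reorder` are made up for this example, and it assumes the real file is tab-separated, that gene IDs are unique, and that each transcript's `gene_id` value matches the `ID` of its gene row (as in the lines posted above). Whether this alone is enough for LiftOn has not been tested here.

```python
TRANSCRIPT_TYPES = {"transcript", "mRNA"}


def parse_attrs(field):
    """Parse a simple GFF3 attribute column (key=value pairs split on ';')."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def add_parent_and_reorder(lines):
    """Give transcripts a Parent (taken from gene_id) and put genes first.

    Hypothetical helper: assumes tab-separated rows and unique gene IDs.
    """
    rows = [ln.rstrip("\n") for ln in lines]

    # First pass: remember every gene row by its ID.
    gene_rows = {}
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9 and cols[2] == "gene":
            gene_id = parse_attrs(cols[8]).get("ID")
            if gene_id is not None:
                gene_rows[gene_id] = row

    # Second pass: emit each gene before the first transcript that needs it.
    emitted = set()
    out = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            out.append(row)  # directives, comments, malformed rows: keep as-is
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            gene_id = attrs.get("ID")
            if gene_id is None or gene_id not in emitted:
                emitted.add(gene_id)
                out.append(row)
            continue  # a gene that was already moved up is not repeated
        if cols[2] in TRANSCRIPT_TYPES:
            gene_id = attrs.get("Parent") or attrs.get("gene_id")
            if gene_id in gene_rows and gene_id not in emitted:
                emitted.add(gene_id)
                out.append(gene_rows[gene_id])
            if "Parent" not in attrs and gene_id:
                cols[8] = cols[8].strip().rstrip(";") + ";Parent=" + gene_id
                row = "\t".join(cols)
        out.append(row)
    return out


# First rows of the posted Ino.gff, written here with tabs between columns.
sample = [
    "##gff-version 3.1.26",
    "Chr_1\tAUGUSTUS\ttranscript\t1861\t8031\t.\t+\t.\tID=Ino_00001.t1;gene_id=Ino_00001;",
    "Chr_1\tAUGUSTUS\tgene\t1861\t8031\t.\t+\t.\tID=Ino_00001;",
    "Chr_1\tAUGUSTUS\texon\t1861\t1959\t.\t+\t.\tID=exon_1;Parent=Ino_00001.t1",
]
fixed = add_parent_and_reorder(sample)
print("\n".join(fixed))
```

To use it on a whole file, read the lines of Ino.gff, pass them to the function, and write the result to a new file to give to `-g`. The attribute parser is deliberately naive and would need more care if attribute values can contain `;` or `=`.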
