Can't load_gff3 #63

Open
RenanFerreira0412 opened this issue Apr 23, 2024 · 7 comments

@RenanFerreira0412

Hi everyone!

I'm running the arrow command arrow annotations load_gff3 to load a full GFF3 file into an annotation track, but nothing happens.

The version of my plugin:
apollo 4.2.13

Command:
arrow annotations load_gff3 [OPTIONS] ORGANISM GFF3

My command:
arrow annotations load_gff3 Leishmania /home/renanigor/Downloads/TriTrypDB-67_LdonovaniBPK282A1.gff

Note: I'm running Apollo in Docker.

My organism:
arrow organisms show_organism Leishmania
{
    "commonName": "Leishmania",
    "blatdb": "/data/temporary/apollo_data/34-Leishmania/seq/Leishmania.fa.2bit",
    "metadata": "{\"creator\":\"32\"}",
    "annotationCount": 2,
    "currentOrganism": true,
    "obsolete": false,
    "sequences": 36,
    "directory": "/data/temporary/apollo_data/34-Leishmania",
    "publicMode": false,
    "valid": true,
    "genomeFastaIndex": "seq/Leishmania.fa.fai",
    "genus": null,
    "species": "donovani",
    "id": 34,
    "nonDefaultTranslationTable": null,
    "genomeFasta": "seq/Leishmania.fa"
}

I really don’t know what the actual problem is because there are no error log messages.

When I run the command, the output is just empty braces.

(apollo_env) renanigor@pop-os:~/VirtualEnvs$ arrow annotations load_gff3 Leishmania /home/renanigor/Downloads/TriTrypDB-67_LdonovaniBPK282A1.gff
{}

Does anyone know how I can fix this?

@hexylena
Member

Could you try with increased logging: arrow --verbose -l debug annotations load_gff3? That'll give us more information as to why it's failing.

@RenanFerreira0412
Author

Now it processes all the sequences from my GFF file, but when I refresh the Apollo page, the features are still not loaded into the annotation track.

[Screenshot: apollo]

The GFF file and the FASTA file with the sequence that I'm using can be found here:
https://tritrypdb.org/tritrypdb/app/downloads/Current_Release/LdonovaniBPK282A1/

Note: My GFF file has 36 sequences.

The output was too long, so this is only the end of it.

.
.
.
DEBUG:root:unknown type protein_coding_gene
INFO:root:Processing Ld36_v01s1 with features: [SeqFeature(SimpleLocation(ExactPosition(1019), ExactPosition(1163), strand=-1), type='protein_coding_gene', id='LdBPK_360010.1', qualifiers=...), SeqFeature(SimpleLocation(ExactPosition(3957), ExactPosition(4260), strand=-1), type='protein_coding_gene', id='LdBPK_360020.1', qualifiers=...), SeqFeature(SimpleLocation(ExactPosition(6202), ExactPosition(6661), strand=-1), type='protein_coding_gene', id='LdBPK_360030.1', qualifiers=...), ...
.
.
.
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:unknown type protein_coding_gene
DEBUG:root:writing out: []
DEBUG:root:empty list, no more features to write
DEBUG:root:writing out: []
DEBUG:root:empty list, no more features to write
INFO:root:Finished loading
{}
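
With a debug log this long, a per-type tally can be easier to read than the raw output. A minimal sketch, assuming the log lines have the same "DEBUG:root:unknown type ..." shape as above; the function name and the sample list are only for illustration:

```python
import re
from collections import Counter

# Matches lines like "DEBUG:root:unknown type protein_coding_gene".
UNKNOWN_TYPE = re.compile(r"^DEBUG:root:unknown type (\S+)$")


def count_unknown_types(log_lines):
    """Count how often each feature type was reported as unknown."""
    counts = Counter()
    for line in log_lines:
        match = UNKNOWN_TYPE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "DEBUG:root:unknown type protein_coding_gene",
    "DEBUG:root:unknown type protein_coding_gene",
    "DEBUG:root:writing out: []",
    "INFO:root:Finished loading",
]
print(count_unknown_types(sample))
```

If the log was saved to a file (the name is up to you), passing the open file handle to count_unknown_types works the same way, since it only iterates over lines.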

@hexylena
Member

hexylena commented Apr 23, 2024

What does your GFF look like? I'm guessing it doesn't match our expected structure, hence this result.

Edit: ah, you linked to it. OK, I'll take a look when I can (apologies, not much spare time currently).

@hexylena
Member

Looking at the GFF, it does roughly follow the expected model, except that the top-level features are typed protein_coding_gene rather than just gene.

Ld01_v01s1      VEuPathDB       protein_coding_gene     3662    4663    .       -       .       ID=LdBPK_010010.1;description=Protein of unknown function (DUF2946)%2C putative;ebi_biotype=protein_coding
Ld01_v01s1      VEuPathDB       mRNA    3662    4663    .       -       .       ID=LdBPK_010010.1.1;Parent=LdBPK_010010.1;description=Protein of unknown function (DUF2946)%2C putative;gene_ebi_biotype=protein_coding
Ld01_v01s1      VEuPathDB       exon    3662    4663    .       -       .       ID=exon_LdBPK_010010.1.1-E1;Parent=LdBPK_010010.1.1;gene_id=LdBPK_010010.1
Ld01_v01s1      VEuPathDB       CDS     3662    4663    .       -       0       ID=LdBPK_010010.1.1-p1-CDS1;Parent=LdBPK_010010.1.1;gene_id=LdBPK_010010.1;protein_source_id=LdBPK_010010.1.1-p1

It could be fixed either by changing protein_coding_gene to gene in your GFF file, or by updating python-apollo.
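
For the first option, a minimal sketch of rewriting the type column (column 3) of a GFF3 file in Python. The function name, the mapping, and any file names are assumptions for illustration, not part of python-apollo:

```python
def rename_feature_types(lines, mapping):
    """Yield GFF3 lines with the feature type (column 3) renamed per mapping."""
    for line in lines:
        # Leave comments, directives and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.split("\t")
        # Only feature records have 9 tab-separated columns; anything else
        # (e.g. sequence data after ##FASTA) passes through unchanged.
        if len(cols) >= 9 and cols[2] in mapping:
            cols[2] = mapping[cols[2]]
        yield "\t".join(cols)


sample = [
    "##gff-version 3\n",
    "Ld01_v01s1\tVEuPathDB\tprotein_coding_gene\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1\n",
    "Ld01_v01s1\tVEuPathDB\tmRNA\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1.1;Parent=LdBPK_010010.1\n",
]
fixed = list(rename_feature_types(sample, {"protein_coding_gene": "gene"}))
print("".join(fixed), end="")
```

On a real file you would iterate over the open input handle and write each yielded line to a new file, keeping the original as a backup. Whether other top-level types should also be mapped to gene is a curation decision this sketch does not make for you.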

https://github.com/GMOD/Apollo/blob/develop/client/apollo/js/SequenceOntologyUtils.js#L55 suggests that it's a valid feature type as far as Apollo is concerned, so we should probably expand our list to include some of these other terms (@abretaud what do you think?). Until now we've been a bit cautious and only supported structures we've seen before, lest this library cause any issues. It looks like ncRNA_gene is also used, so there are clearly multiple top-level feature types we've never seen before.
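
To see which top-level types a given GFF3 file uses, one rough approach is to count the types of all records that have no Parent attribute. A sketch under that assumption (the function name is illustrative; the sample records are shortened from the excerpt above):

```python
from collections import Counter


def top_level_types(lines):
    """Count feature types of GFF3 records that have no Parent attribute."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature records
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature record
        attributes = cols[8].split(";")
        if not any(attr.startswith("Parent=") for attr in attributes):
            counts[cols[2]] += 1
    return counts


sample = [
    "Ld01_v01s1\tVEuPathDB\tprotein_coding_gene\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1\n",
    "Ld01_v01s1\tVEuPathDB\tmRNA\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1.1;Parent=LdBPK_010010.1\n",
    "Ld01_v01s1\tVEuPathDB\texon\t3662\t4663\t.\t-\t.\tID=exon_LdBPK_010010.1.1-E1;Parent=LdBPK_010010.1.1\n",
    "Ld01_v01s1\tVEuPathDB\tCDS\t3662\t4663\t.\t-\t0\tID=LdBPK_010010.1.1-p1-CDS1;Parent=LdBPK_010010.1.1\n",
]
print(top_level_types(sample))
```

Any type in the result other than gene is a candidate for either renaming in the file or adding to the library's list.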

You can patch this yourself quickly by editing apollo/util.py and adding your types to the gene_types list, which may be faster than waiting for a new release of this library.
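
A sketch of what such a local patch amounts to. The starting value of gene_types below is a hypothetical stand-in, since the actual contents of apollo/util.py are not shown in this thread and may differ; in the real file you would simply add the new names to the existing list:

```python
# Hypothetical stand-in for the gene_types list in apollo/util.py;
# the real list's contents may differ.
gene_types = ["gene"]

# Top-level types seen in the TriTrypDB file discussed in this thread.
extra_gene_types = ["protein_coding_gene", "ncRNA_gene"]

# Append only the types that are not already listed.
for feature_type in extra_gene_types:
    if feature_type not in gene_types:
        gene_types.append(feature_type)

print(gene_types)
```

Keep in mind that a local edit like this is overwritten whenever the package is reinstalled or upgraded.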

@abretaud
Member

Yeah, we could support other top-level feature types. I have no time to change the code for now, but feel free to propose a PR (or just modify the input GFF to use the expected gene type).

@RenanFerreira0412
Author

Oh, I see. I tried adding the types in the apollo/util.py file as you suggested, and it worked.

It loaded all the features into the annotation track, but some of them are shown with an exclamation mark.

[Screenshot: tela1]

I'm not sure why this happened.

These are the modifications I made in the apollo/util.py file.

[Screenshot: tela2]

[Screenshot: tela3]

Thanks for the help.

@abretaud
Member

The exclamation marks only indicate non-canonical splice sites: they're just a visual warning for curators, in case they want to check the splice site position carefully.
