load_gff3 miscalculates CDS #2662

Closed
mpoelchau opened this issue Nov 28, 2023 · 4 comments

@mpoelchau

Hi Apollo team,

We are trying to use the python-apollo command arrow annotations load_gff3 to load annotations into the user-created annotations track. It changes the CDS locations of the models, both with and without the --disable_cds_recalculation option.

Here is what a load without --disable_cds_recalculation looks like; the correct frame can be seen in the track below.

[Screenshot 2023-11-28 at 2:53:30 PM]

The GFF3 used to load the annotation has 6 CDS lines; the GFF3 for the uploaded annotation has 12 CDS lines, even though the view shows only one CDS segment. Apollo also won't calculate a protein or CDS sequence for the uploaded annotation.
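One way to reproduce the 6-versus-12 comparison is to count CDS rows per parent transcript in each file. This is a minimal Python sketch, not part of arrow or Apollo; `cds_counts_by_parent` is a hypothetical helper, and it assumes standard tab-separated GFF3 with a `Parent` attribute on each CDS row.

```python
from collections import Counter


def cds_counts_by_parent(gff3_text: str) -> Counter:
    """Count CDS rows per Parent ID in GFF3 text (hypothetical helper)."""
    counts: Counter = Counter()
    for line in gff3_text.splitlines():
        # Skip blank lines, comments, and ## directives.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # GFF3 has 9 tab-separated columns; column 3 holds the feature type.
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 is a ;-separated list of key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "<no Parent>").split(","):
            counts[parent] += 1
    return counts
```

Running it on the "before" and "after" files and comparing the counts for the same transcript ID shows whether the extra rows all land on one model.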

Here is what a load with --disable_cds_recalculation looks like, using this command:

arrow annotations load_gff3 --source https://apollo2-stage-node1-cbo.nal.usda.gov/apollo Anoplophora_glabripennis ~/Downloads/NW_019416298.gff3 --disable_cds_recalculation

[Screenshot 2023-11-28 at 2:50:01 PM]

Again, the GFF3 for the uploaded annotation in Apollo has 12 CDS lines instead of 6, and Apollo won't calculate a protein or CDS sequence for it.
Also, if you run the same command multiple times, the single CDS is displayed in a different position each time.

If you load the annotation by dragging it up into the user-created annotations track, it loads correctly:

[Screenshot 2023-11-28 at 2:56:20 PM]

This is happening for many (but not all) annotations in multiple assemblies/organisms.

Some other observations:

  • We haven't observed the problem for models with a single CDS/exon segment
  • The underlying genomic sequence has lowercase nucleotides
  • I tried using load_legacy_gff3, and that calculated the CDS correctly, but I'm unable to delete features loaded with that method. Deleting them fails with: Hibernate operation: could not execute statement; SQL [n/a]; ERROR: update or delete on table "feature" violates foreign key constraint "fk_8jm56covt0m7m0m191bc5jseh" on table "feature_relationship" Detail: Key (id)=(4858111) is still referenced from table "feature_relationship".; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "feature" violates foreign key constraint "fk_8jm56covt0m7m0m191bc5jseh" on table "feature_relationship" Detail: Key (id)=(4858111) is still referenced from table "feature_relationship".
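Since the problem only shows up for multi-segment models, one check that may help is whether the phase column of each CDS row matches what the segment lengths imply. GFF3 defines phase as the number of bases to remove from the start of a segment to reach the first base of the next codon. A minimal Python sketch, assuming a complete CDS whose first segment in transcription order has phase 0; `expected_phases` and `total_is_codon_multiple` are hypothetical helpers, not arrow or Apollo functions.

```python
from typing import List, Tuple


def expected_phases(
    segments: List[Tuple[int, int]], strand: str
) -> List[Tuple[int, int, int]]:
    """Return (start, end, phase) for each CDS segment in transcription order.

    Coordinates are 1-based and inclusive, as in GFF3. Assumes the CDS is
    complete, so the first segment in transcription order has phase 0.
    """
    # On the minus strand, transcription runs from high to low coordinates.
    ordered = sorted(segments, reverse=(strand == "-"))
    result = []
    consumed = 0  # coding bases in all earlier segments
    for start, end in ordered:
        # Bases at the start of this segment that finish a codon left open
        # by earlier segments; skipping them reaches the next codon start.
        phase = (3 - consumed % 3) % 3
        result.append((start, end, phase))
        consumed += end - start + 1
    return result


def total_is_codon_multiple(segments: List[Tuple[int, int]]) -> bool:
    """A complete CDS (start through stop codon) has a length divisible by 3."""
    return sum(end - start + 1 for start, end in segments) % 3 == 0
```

For example, `expected_phases([(1, 10), (21, 30), (41, 45)], "+")` returns `[(1, 10, 0), (21, 30, 2), (41, 45, 1)]`; comparing these values against column 8 of the uploaded rows would show whether the segments are still internally consistent.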

I've attached "before" and "after" GFF3s (with a .txt extension, because GitHub wouldn't let me upload them otherwise):
before.txt
after-nocdsrecalc.txt
after.txt
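With before/after files like these, a quick way to tell whether the extra CDS rows are exact duplicates or shifted coordinates is to compare the CDS intervals directly. A minimal Python sketch; `cds_intervals` and `diff_cds` are hypothetical helper names, and the parser assumes standard 9-column, tab-separated GFF3.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def cds_intervals(gff3_text: str) -> Counter:
    """Count how many CDS rows carry each distinct interval."""
    intervals: Counter = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "CDS":
            intervals[(cols[0], int(cols[3]), int(cols[4]), cols[6])] += 1
    return intervals


def diff_cds(before_text: str, after_text: str) -> Dict[str, List[Interval]]:
    """Report CDS intervals lost, gained, or repeated between two GFF3 files."""
    before = cds_intervals(before_text)
    after = cds_intervals(after_text)
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "duplicated": sorted(k for k, n in after.items() if n > 1),
    }
```

Keying on coordinates rather than IDs avoids depending on how IDs are regenerated on upload. One caveat: the "duplicated" list will also flag an interval that is legitimately shared by two transcripts written as separate rows.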

  • Provide the javascript console log output generated from the action.
    None.

  • Provide the server log output generated from the action (typically catalina.out).
    Nothing is written to catalina.out when I add the annotations.

@garrettjstevens
Contributor

garrettjstevens commented Nov 28, 2023

Hi @mpoelchau. arrow is actually developed and maintained by a separate group, so you'll need to file an issue in that repository here: https://github.com/galaxy-genome-annotation/python-apollo.

@mpoelchau
Author

Thanks @garrettjstevens, I'll repost there. I posted here because I assumed the Python method was essentially a wrapper around Apollo's add_transcript method, but you know what they say about assumptions...
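For anyone writing such a wrapper, one step where CDS positions can shift is the coordinate conversion. GFF3 uses 1-based, fully closed intervals, while Apollo's JSON features describe locations with fmin/fmax, which this sketch assumes are 0-based, half-open (interbase) values. `gff3_to_location` is a hypothetical helper, and the dictionary shape is an assumption rather than a documented Apollo payload.

```python
from typing import Dict


def gff3_to_location(start: int, end: int, strand: str) -> Dict[str, int]:
    """Convert a 1-based, inclusive GFF3 interval to 0-based, half-open fmin/fmax.

    The {"fmin", "fmax", "strand"} shape is an assumption modeled on
    interbase-style locations, not a documented Apollo payload.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF3 interval: {start}-{end}")
    strand_codes = {"+": 1, "-": -1}
    # "." or "?" (unstranded / unknown) maps to 0 here; also an assumption.
    return {"fmin": start - 1, "fmax": end, "strand": strand_codes.get(strand, 0)}
```

With this convention `fmax - fmin` equals `end - start + 1`, so segment lengths are preserved. An off-by-one at this step would change segment lengths or positions and, for multi-segment CDSs, the reading frame.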

Is there an alternate method maintained by the Apollo group to load gff3s into the UcA?

@garrettjstevens
Contributor

There's not an official method for loading GFF3s that we maintain. However, if there's some way to get the log from arrow of what requests are sent to the Apollo endpoint, I'd be happy to look at those and see if anything looks off.
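If the python-apollo client behind arrow is driven from a Python script rather than the CLI, the standard library can echo the raw HTTP traffic. This is a minimal sketch that assumes the client sends its requests through requests/urllib3, which build on `http.client`; `enable_http_debug_logging` is a hypothetical helper. The printed request bodies may include Apollo credentials, so redact them before posting.

```python
import http.client
import logging


def enable_http_debug_logging() -> None:
    """Echo outgoing HTTP traffic and urllib3 debug logs.

    Setting the class-level debuglevel makes http.client print each chunk
    it sends ("send: ...") plus the response status line and headers.
    urllib3 (and therefore requests) uses http.client connection classes
    underneath, so this also covers them; urllib3's own DEBUG log adds a
    line per request.
    """
    http.client.HTTPConnection.debuglevel = 1
    logging.basicConfig(level=logging.DEBUG)
    urllib3_logger = logging.getLogger("urllib3")
    urllib3_logger.setLevel(logging.DEBUG)
    urllib3_logger.propagate = True
```

Call it once at the top of the script, before the first request is made, then run the load as usual and capture stdout/stderr.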

@MonicaPoelchau-USDA

Thanks @garrettjstevens! I attached the output to a comment in the other issue: galaxy-genome-annotation/python-apollo#60 (comment). The only other output the command produces is the following warning, which appears for every arrow command:
/Users/mpoelchau/Documents/programs/apollo-arrow-env/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
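That warning concerns the TLS library Python was built against rather than GFF3 handling. To check which library a given environment uses, here is a minimal sketch using only the standard `ssl` module; `ssl_backend_report` is a hypothetical helper.

```python
import ssl


def ssl_backend_report() -> dict:
    """Describe the TLS library that Python's ssl module was compiled against."""
    # e.g. "OpenSSL 3.0.2 15 Mar 2022" or "LibreSSL 2.8.3"
    version = ssl.OPENSSL_VERSION
    is_libressl = version.startswith("LibreSSL")
    # OPENSSL_VERSION_INFO is (major, minor, fix, patch, status); the numeric
    # comparison is only meaningful for real OpenSSL builds.
    meets_minimum = (not is_libressl) and ssl.OPENSSL_VERSION_INFO[:3] >= (1, 1, 1)
    return {
        "version": version,
        "is_libressl": is_libressl,
        "meets_urllib3_v2_minimum": meets_minimum,
    }
```

If it reports LibreSSL, commonly suggested workarounds are to use a Python built against OpenSSL 1.1.1 or newer, or to pin urllib3 below 2.0 in the arrow environment.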
