GFF3 to Apollo Annotations #16

Open
curtisim0 opened this issue Jun 7, 2024 · 2 comments
@curtisim0
Contributor

  • stub *
@jasonjgill

OK here are the notes I made on this in the summer when I tried to get this to work:

GFF3 to Apollo Annotations DOES WORK, but it looks like it only accepts the complete GFF3 gene model that Apollo itself produces. Is it possible to get Prokka output (or output from other tools, like gene or tRNA callers) into the same state?

  • NB: the tool fails if either the "given name" or the "disable CDS recalculation" switch is set to yes.
  • Tried the current CPT fix gene model, add gene to CDS, and prep for Apollo tools; they don’t produce complete GFF3 gene models. The Apollo tool will import annotations without the exon, but they don’t display correctly in Apollo.

Is it possible to modify one of these tools to take an initial single-feature GFF3 (e.g., CDS or gene) and build a complete GFF3 from that?

  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_gff_add_parents/edu.tamu.cpt.gff3.cdsParents/19.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_fix_sixpack/edu.tamu.cpt.gff3.fixsixpack/19.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_prep_for_apollo/edu.tamu.cpt.gff3.prepForApollo/20.8.0.0

Current tool flow for getorfs output (which is also CDS-only):

  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_gff_add_parents/edu.tamu.cpt.gff3.cdsParents/19.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_req_phage_start/edu.tamu.cpt.gff3.require_phage_start/19.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_shinefind/edu.tamu.cpt.genbank.shinefind/21.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_remove_annotations/edu.tamu.cpt.gff3.remove_annots/19.1.0.1
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_fix_sixpack/edu.tamu.cpt.gff3.fixsixpack/19.1.0.0
  • toolshed.g2.bx.psu.edu/repos/cpt/cpt_prep_for_apollo/edu.tamu.cpt.gff3.prepForApollo/20.8.0.0

Running tests using the Milagro v10 record with a new Pharokka run to see if the import works:

  • The workflow is the complete set of steps for getting GetORFs output into Apollo as a track; will run it on Pharokka output, which is a similar-looking CDS-only GFF format.
  • Completed testing. The GFF3 generated for use as a track (GetORFs) does not have an mRNA feature, and its parent-child relationships differ from the native Apollo GFF3. If I modified the file to the correct format, it would import the feature. The modified Pharokka output also ends up with 2 copies of each CDS per gene; not sure how that happens.

The native Apollo GFF3 that imports properly is set up as follows:
gene ID=gene
mRNA ID=mRNA;Parent=gene
exon ID=exon;Parent=mRNA
CDS ID=CDS;Parent=mRNA
Shine_Dalgarno_sequence ID=SDseq;Parent=mRNA
NB: the ID can be any text but must be unique, parents must point as shown
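
A quick way to see whether a given file matches the layout above is to check the parent type of each feature. This is only a minimal sketch, not part of any CPT or Apollo tool: the function names `parse_attrs` and `check_gene_models` are made up for illustration, and it assumes a plain tab-separated GFF3 with `ID` and `Parent` spelled exactly that way in column 9.

```python
def parse_attrs(col9):
    """Split a GFF3 column-9 string into a dict; keys stay case-sensitive."""
    attrs = {}
    for pair in col9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


# Expected parent feature type for each child type, per the layout above.
EXPECTED_PARENT = {
    "mRNA": "gene",
    "exon": "mRNA",
    "CDS": "mRNA",
    "Shine_Dalgarno_sequence": "mRNA",
}


def check_gene_models(gff3_text):
    """Return a list of human-readable problems (empty list = looks OK)."""
    types = {}    # feature ID -> feature type
    parents = {}  # feature ID -> Parent ID, or None if absent
    problems = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"not 9 columns: {line!r}")
            continue
        attrs = parse_attrs(cols[8])
        fid = attrs.get("ID")
        if fid is None:
            problems.append(f"{cols[2]} at {cols[3]}-{cols[4]} has no ID")
            continue
        if fid in types:
            problems.append(f"duplicate ID {fid}")
        types[fid] = cols[2]
        parents[fid] = attrs.get("Parent")
    for fid, ftype in types.items():
        want = EXPECTED_PARENT.get(ftype)
        if want is None:
            continue  # gene, or a feature type this sketch does not check
        parent_type = types.get(parents[fid])
        if parent_type != want:
            problems.append(f"{ftype} {fid}: parent is {parent_type}, expected {want}")
    return problems
```

Running `check_gene_models(open("some_file.gff3").read())` on a file where the CDS hangs directly off the gene would report the CDS as having the wrong parent type; a lowercase `parent=` key is treated as a missing parent.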

  • A test file with mRNA, CDS, and exon as children of the gene will import, but the gene is not formed properly in Apollo; Apollo does not seem to recognize it as protein-coding.
  • NOTE: Apollo will strip all of the Pharokka-specific, non-standard fields in column 9 (phrog, top_hit, etc.).
  • Apollo displays the name from the mRNA feature; import appends a numerical suffix.
  • The minimum for import is the ID and Parent fields; it will also take “Name”. Note these are CASE SENSITIVE.
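
For the "build a complete GFF3 from a single-feature GFF3" idea, something along these lines might be a starting point. It is a rough sketch under several assumptions, not an existing CPT tool: `build_gene_models` is a hypothetical name, the input is assumed to be tab-separated with one CDS per line, the gene, mRNA, and exon are simply given the same coordinates as the CDS, no Shine_Dalgarno_sequence feature is added, and the original column-9 attributes are dropped in favor of generated IDs. Whether Apollo accepts this output has not been tested here.

```python
def _row(cols, ftype, phase, attrs):
    """Build one GFF3 line reusing the coordinates of the source CDS line."""
    seqid, source, _, start, end, score, strand, _, _ = cols
    return "\t".join([seqid, source, ftype, start, end, score, strand, phase, attrs])


def build_gene_models(cds_gff3):
    """Expand each CDS line into gene > mRNA > (exon, CDS).

    Assumption: one CDS per gene, with every feature sharing the CDS
    coordinates. IDs are generated so that they are unique in the file.
    """
    out = ["##gff-version 3"]
    n = 0
    for line in cds_gff3.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # this sketch only handles CDS-only input
        n += 1
        gene_id, mrna_id = f"gene_{n}", f"mrna_{n}"
        out.append(_row(cols, "gene", ".", f"ID={gene_id}"))
        out.append(_row(cols, "mRNA", ".", f"ID={mrna_id};Parent={gene_id}"))
        out.append(_row(cols, "exon", ".", f"ID=exon_{n};Parent={mrna_id}"))
        # Keep the phase from the input CDS line.
        out.append(_row(cols, "CDS", cols[7], f"ID=cds_{n};Parent={mrna_id}"))
    return "\n".join(out) + "\n"
```

Each input CDS becomes four lines, with the exon and CDS both pointing at the mRNA and the mRNA pointing at the gene, as in the native layout. Carrying over "Name" from the input would be a small extension.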

@curtisim0
Contributor Author

There is some conversion hunting between "track" and "promotion" to the "user annotation" panel.
