Make (and use) isoform promotion GORULE #2287

Open
kltm opened this issue Apr 4, 2024 · 5 comments

@kltm
Member

kltm commented Apr 4, 2024

As part of @sierra-moxon 's work, we explicitly made the choice to have the following data transform: when there is an isoform available in the GPI, use that as the GPAD bioentity, as it is the most specific (and can be recovered from the GPI).

We would like to make sure that this transform is well-documented with a GORULE (in addition to the mention in the spec), as it is an active transform of data.

Tagging @pgaudet
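A minimal sketch of the promotion, assuming it works as described later in this thread: when a GAF row has a Gene Product Form ID (GAF column 17), that isoform ID becomes the GPAD 2.0 subject (column 1). Otherwise the subject is built from GAF columns 1 and 2. The function name, the assumption that column 17 holds at most one ID, and the example identifiers are hypothetical and not taken from the GO tooling.

```python
from typing import List


def gpad_subject(gaf_row: List[str]) -> str:
    """Return the GPAD 2.0 column-1 subject (DB:DB_Object_ID) for one GAF row.

    Hypothetical helper. Assumes gaf_row is a GAF line already split on tabs
    and that GAF column 17 (Gene Product Form ID) holds at most one ID.
    """
    db, object_id = gaf_row[0], gaf_row[1]
    # GAF column 17 is index 16; trailing empty columns may be absent.
    form_id = gaf_row[16].strip() if len(gaf_row) > 16 else ""
    if form_id:
        # Isoform promotion: the more specific form becomes the GPAD subject.
        return form_id
    # No isoform given: fall back to the gene-level DB:DB_Object_ID.
    return f"{db}:{object_id}"


# Hypothetical rows; P12345 and P12345-2 are made-up identifiers.
with_isoform = ["UniProtKB", "P12345"] + [""] * 14 + ["UniProtKB:P12345-2"]
without_isoform = ["UniProtKB", "P12345"] + [""] * 14
print(gpad_subject(with_isoform))     # UniProtKB:P12345-2
print(gpad_subject(without_isoform))  # UniProtKB:P12345
```

After this step the GPAD row alone no longer records the gene-level ID, so getting back to it depends on the GPI.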

@cmungall
Member

Can you expand on this? When exactly would this rule be applied?

I think I'm missing some context. The isoform ID should only be used if there is evidence for function in an isoform.

@kltm
Member Author

kltm commented Apr 10, 2024

The GORULE portion is fungible, but I think it would be good to make explicit any transformations that are being done, if not immediately obvious (although, as pointed out elsewhere, this is implied in the spec).

IIRC, the decision was reached on a managers' call, which seems to be poorly documented (see "MGI remainders" in):
https://wiki.geneontology.org/Projects_update_meeting_2024-03-13
https://wiki.geneontology.org/Projects_update_meeting_2024-02-28

I think the summary given by @sierra-moxon is best here: geneontology/gopreprocess#36

@pgaudet
Contributor

pgaudet commented Apr 12, 2024

Hi @cmungall

It seems the documentation is not super clear. The GPAD 2.0 doc is here: https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md

I had understood that since GPAD 2.0 does not have an equivalent of GAF column 17, GPAD column 1 contained the 'active' entity (although now I don't see that written out clearly). My understanding was that GPAD had to be used in conjunction with the GPI, which specifies the 'Parent_Protein' (GPI column 8) that corresponds to what is currently in GAF column 2. Otherwise, there is no way to capture isoform-specific data.
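Recovering the gene-level ID for an isoform subject via the GPI could look roughly like the hypothetical sketch below. It assumes a tab-separated GPI 2.0 file in which `!`-prefixed lines are headers, column 1 is the entity ID, and column 8 (Parent_Protein) holds a single ID. The function names and all identifiers are illustrative only.

```python
from typing import Dict, Iterable


def parent_index(gpi_lines: Iterable[str]) -> Dict[str, str]:
    """Map each GPI 2.0 entity ID (column 1) to its Parent_Protein (column 8)."""
    parents: Dict[str, str] = {}
    for line in gpi_lines:
        # Skip blank lines and '!'-prefixed header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Record only entities whose Parent_Protein (column 8, index 7) is filled in.
        if len(cols) >= 8 and cols[7]:
            parents[cols[0]] = cols[7]
    return parents


def gene_level_id(subject: str, parents: Dict[str, str]) -> str:
    """Return the parent protein for an isoform subject, else the subject unchanged."""
    return parents.get(subject, subject)


# Hypothetical GPI 2.0 content for a made-up isoform (11 tab-separated columns).
gpi = [
    "!gpi-version: 2.0",
    "\t".join(["UniProtKB:P12345-2", "ABC1", "Example protein", "",
               "PR:000000001", "NCBITaxon:9606", "", "UniProtKB:P12345",
               "", "", ""]),
]
index = parent_index(gpi)
print(gene_level_id("UniProtKB:P12345-2", index))  # UniProtKB:P12345
print(gene_level_id("UniProtKB:Q99999", index))    # UniProtKB:Q99999
```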

So, this is why @sierra-moxon took what was in GAF column 17 and moved it to GPAD column 1. However, I now realize that the GPI never describes the type of the 'Parent_Protein', which may result in entity type confusion.

Pascale

@cmungall
Member

cmungall commented Apr 13, 2024 via email

@sierra-moxon
Member

I believe this is complete; on snapshot now
