Change justification for Uniprot to Ensembl linkset #9

Open
stain opened this issue Oct 12, 2015 · 4 comments


stain commented Oct 12, 2015

The Uniprot-to-Ensembl linkset has the justification SIO_000985 (protein coding gene).

Linkset: http://data.openphacts.org/dev/ims/linksets/uniprot/uniprot_ensembl.ttl.gz

ENSG IDs (given as the example) are not included in the linkset. Was this due to the one-to-many issue?

All Ensembl entities included here seem to be transcripts. Is protein coding gene really the right justification here? Maybe SO:0000233 (mature transcript) would be better.

Note that @JonathanMELIUS also provides linksets in the opposite direction, from Ensembl to Uniprot, which are broken down in detail, such as HomoSapiens, translation. I have not checked whether they overlap with this linkset.

Here's the SPARQL query for the Uniprot-to-Ensembl linkset.

stain self-assigned this Oct 12, 2015
stain added this to the 2.1 milestone Oct 12, 2015

stain commented Oct 14, 2015

We need to evaluate whether this linkset is needed at all, given Jonathan's Ensembl-Uniprot linksets in the opposite direction, which probably have a stronger foundation than Uniprot's loose "see also" notation.

stain assigned danidi and unassigned stain Oct 14, 2015

Chris-Evelo commented

I have suggested removing it a few times in the past. It keeps popping up as a cause of inconsistent behaviour. Given the way we build other gene-product-related gene sets, it doesn't make sense to have this. We only added it as a quick hack early on.

nicklynch commented

@Chris-Evelo So to summarise, we would remove the Uniprot-to-Ensembl linkset for both human and non-human? Is there value in keeping the non-human coverage? Do we risk missing data if we exclude the non-human links?


danidi commented Oct 19, 2015

My initial issue here was that this linkset contains only Ensembl transcript IDs (starting with ENST) and no Ensembl gene IDs (at least not for the human ones; I didn't check the other organisms).
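
One way to check the other organisms is to tally the Ensembl targets by feature type. The following is only a minimal sketch, not part of the linkset tooling: it assumes the targets are stable Ensembl IDs (possibly the last path segment of a URI) following the usual pattern of `ENS`, an optional species code, a feature letter (G, T, P or E) and 11 digits. The function names `ensembl_kind` and `count_kinds` and the sample URI are hypothetical.

```python
import re
from collections import Counter

# Assumed stable-ID shape: "ENS" + optional species letters + feature letter
# + 11 digits, optionally followed by a ".version" suffix.
_ENSEMBL_ID = re.compile(r"^ENS[A-Z]*([GTPE])\d{11}(?:\.\d+)?$")
_KIND = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def ensembl_kind(identifier):
    """Classify an Ensembl ID (or a URI ending in one) by its feature letter."""
    # Keep only the last path segment so full URIs work as well as bare IDs.
    local = identifier.rstrip("/").rsplit("/", 1)[-1]
    match = _ENSEMBL_ID.match(local)
    return _KIND[match.group(1)] if match else "unknown"


def count_kinds(identifiers):
    """Count how many identifiers fall into each feature type."""
    return Counter(ensembl_kind(i) for i in identifiers)


if __name__ == "__main__":
    sample = [
        "ENST00000380152",                                # human transcript
        "ENSMUSG00000017167",                             # mouse gene
        "http://identifiers.org/ensembl/ENSG00000139618", # hypothetical URI form
    ]
    print(count_kinds(sample))
```

Running `count_kinds` over the object column of the linkset would show whether any gene IDs are present for any organism, or whether every target is a transcript.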


4 participants