generate reports on differences in two gpad2.0 files #540
Tagging @ukemi @vanaukenk
But realize that annotations_in != annotations_out for all groups. In some cases, incoming annotations will be split or deepened, depending on the final procedure for creating GPADs from Noctua. For example, if MGI has an annotation to organ development with the extension results_in_development_of(lung), I believe this will currently be deepened to lung development. If an MGI annotation has two pipe-delimited extensions, it will be split into two separate annotations. We still need to talk about what to do with pipe-delimited 'with' fields. A better comparison, though much harder to do, would be to check that the incoming GPAD file is semantically equivalent to the outgoing GPAD.
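The splitting step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Noctua procedure: it assumes Annotation_Extensions is column 11 (0-based index 10) of a tab-separated GPAD 2.0 record and that `|` separates alternative extension groups, as in the GAF/GPAD conventions. The function name `split_extensions` is made up for this example.

```python
# Hypothetical sketch: turn one GPAD 2.0 record with pipe-delimited
# extensions into one record per extension group.
# ASSUMPTION: Annotation_Extensions is the 11th column (0-based index 10).
EXTENSION_COL = 10


def split_extensions(gpad_line: str) -> list[str]:
    """Return one tab-separated line per '|'-separated extension group."""
    line = gpad_line.rstrip("\n")
    fields = line.split("\t")
    # Nothing to split: the record is short or has a single extension group.
    if len(fields) <= EXTENSION_COL or "|" not in fields[EXTENSION_COL]:
        return [line]
    out = []
    for group in fields[EXTENSION_COL].split("|"):
        copy = list(fields)           # keep every other column unchanged
        copy[EXTENSION_COL] = group   # one alternative per output record
        out.append("\t".join(copy))
    return out
```

Pipe-delimited 'with' fields (column 7 under the same assumption) are deliberately left alone here, since how to handle them is still an open question.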
Note that another gpaddiff tool was developed: https://github.com/geneontology/gocamgen/tree/master/gpaddiff (thanks for the pointer @dustine32!)
The current iteration of this tool compares at the file level and also attempts to compare at the semantic annotation level. It would be good to go over the results in an import meeting so we can see if it's on the right track! :)
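To make the two comparison levels concrete, here is a rough sketch under stated assumptions; it is not how this tool (or the gocamgen gpaddiff) is implemented. File level means exact line matches. Semantic level here means a hypothetical normalization that drops Date and Assigned_by (assumed 0-based indices 8 and 9), ignores Annotation_Properties, and treats extensions as order-insensitive sets, with `|` for alternatives and `,` for conjuncts. All function names are made up for illustration.

```python
from collections import Counter

# ASSUMED 0-based GPAD 2.0 column indices; verify against the spec.
DATE_COL, ASSIGNED_BY_COL, EXTENSION_COL = 8, 9, 10


def _records(lines):
    """Skip blank lines and '!' header/comment lines."""
    return [l.rstrip("\n") for l in lines if l.strip() and not l.startswith("!")]


def file_level_diff(lines_a, lines_b):
    """Exact line comparison; returns (only_in_a, only_in_b) as Counters."""
    a, b = Counter(_records(lines_a)), Counter(_records(lines_b))
    return a - b, b - a


def normalize_extension(ext):
    """'|' separates alternatives, ',' separates conjuncts; ignore order."""
    return frozenset(
        frozenset(p for p in group.split(",") if p)
        for group in ext.split("|") if group
    )


def semantic_key(line):
    """Reduce a record to the parts assumed to carry its meaning."""
    fields = line.split("\t")
    fields += [""] * (12 - len(fields))  # pad short records
    core = tuple(
        f for i, f in enumerate(fields[:EXTENSION_COL])
        if i not in (DATE_COL, ASSIGNED_BY_COL)
    )
    return core + (normalize_extension(fields[EXTENSION_COL]),)


def semantic_diff(lines_a, lines_b):
    """Compare normalized keys; returns (only_in_a, only_in_b) as Counters."""
    a = Counter(semantic_key(l) for l in _records(lines_a))
    b = Counter(semantic_key(l) for l in _records(lines_b))
    return a - b, b - a
```

With this design, two records that differ only in date, assigning group, or extension order show up as a file-level difference but not a semantic one. Deepening (e.g. organ development to lung development) would still register as a semantic difference, since that needs ontology reasoning beyond this sketch.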
@sierra-moxon This is really coming along! I'm running it again and having a bit of fun. Minor question: one of the group_by_column arguments is "evidence_code"; is this actually mapping back to evidence codes, or is it just evidence (which I think might make more sense in a world with GPADs)? I would also advocate for a "cli" or "machine" output mode, both for those interested in using the results in automated processes (raises hand) and for quick exploration of differences. It would output more of the actual results and less "reporting" (the counts report may be usable for this), so it would be easier to pipe into grep or jq. It would also be nice to be able to select one of the outputs for STDOUT (fitting with a lot of what I do).
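A "machine" output mode along these lines might look like the hypothetical sketch below: one JSON object per differing record, written to STDOUT by default, plus a count grouped by a column. The column names in `COLUMNS`, their 0-based indices (e.g. Evidence_type as index 5, holding ECO IDs in GPAD 2.0), and the functions `emit_jsonl` and `count_by` are assumptions for illustration, not this tool's actual options.

```python
import json
import sys
from collections import Counter

# HYPOTHETICAL mapping from group_by_column names to ASSUMED 0-based
# GPAD 2.0 column indices.
COLUMNS = {"relation": 2, "ontology_class_id": 3, "evidence_type": 5, "assigned_by": 9}


def emit_jsonl(only_in_a, only_in_b, out=None):
    """Write one JSON object per differing record (JSON Lines)."""
    out = sys.stdout if out is None else out  # default to STDOUT for piping
    for side, lines in (("a", only_in_a), ("b", only_in_b)):
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            out.write(json.dumps({"only_in": side, "fields": fields}) + "\n")


def count_by(lines, column):
    """Count records by the value in the named column ('' if missing)."""
    idx = COLUMNS[column]
    rows = (l.rstrip("\n").split("\t") for l in lines)
    return Counter(r[idx] if len(r) > idx else "" for r in rows)
```

Because each line is a standalone JSON object, the stream can be filtered with something like `jq 'select(.only_in == "b")'` or simply grepped for an ID.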