Add a CLI method to determine if a JSON context is "bioregistry conformant" #490

Open
matentzn opened this issue Aug 2, 2022 · 5 comments · May be fixed by #1096

Comments

@matentzn
Collaborator

matentzn commented Aug 2, 2022

It would be super cool if we could build our own contexts and validate them against Bioregistry as part of our CI. I think this would significantly encourage standardisation. I would use this for all projects.

The method should let you specify a Bioregistry-managed context by name and take as input another context to validate against it. Why not use the Bioregistry-validated context directly? We will likely want to keep cached subsets of the huge and growing contexts in Bioregistry (specifying maybe 10 prefixes).
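A minimal sketch of the comparison described above, assuming both contexts are plain prefix-to-URI-prefix mappings. The function name `check_context` and the example data are hypothetical and are not part of Bioregistry's API.

```python
def check_context(candidate: dict, reference: dict) -> dict:
    """Compare a candidate prefix map against a reference prefix map.

    Both arguments map a prefix to its URI prefix. The result maps each
    offending prefix to a short description of the problem.
    """
    problems = {}
    for prefix, uri_prefix in candidate.items():
        if prefix not in reference:
            # The reference context does not know this prefix at all.
            problems[prefix] = "unknown prefix"
        elif reference[prefix] != uri_prefix:
            # The prefix is known but expands to a different URI prefix.
            problems[prefix] = f"URI prefix mismatch, expected {reference[prefix]}"
    return problems


# Illustrative data only: a small reference context and a cached subset.
GO_URI = "http://purl.obolibrary.org/obo/GO_"
CHEBI_URI = "http://purl.obolibrary.org/obo/CHEBI_"
reference = {"go": GO_URI, "chebi": CHEBI_URI}
candidate = {"go": GO_URI, "GO": GO_URI, "chebi": "http://example.org/chebi/"}

for prefix, problem in check_context(candidate, reference).items():
    print(f"{prefix}: {problem}")
```

A cached subset with around 10 prefixes would pass this check as long as every entry also appears, unchanged, in the reference context.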

Maybe it's a dumb idea but I would use it.

@cthoyt
Member

cthoyt commented Aug 2, 2022

sure, better as a CLI or a python function?

In #416 I started creating utilities for validating/working with data in pandas dataframes as well, so this is a nice future direction for the package.

@matentzn
Collaborator Author

matentzn commented Aug 2, 2022

Ultimately it should be a CLI function, as I would want to weave this into CI pipelines based on shell commands and make. I really have command-line users in mind who do not necessarily want to fiddle with Python scripts.
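What makes a command usable from make or a CI step is mostly its exit status. The sketch below is a hypothetical entry point, not Bioregistry's actual CLI: it assumes both files are JSON-LD documents with a top-level `@context` mapping, and it treats any prefix missing from the reference as a failure.

```python
import argparse
import json
import sys


def main(argv=None) -> int:
    """Return 0 if every candidate prefix is in the reference, else 1."""
    parser = argparse.ArgumentParser(description="Hypothetical context checker")
    parser.add_argument("candidate", help="path to the JSON-LD context to check")
    parser.add_argument("reference", help="path to the trusted JSON-LD context")
    args = parser.parse_args(argv)

    with open(args.candidate) as file:
        candidate = json.load(file)["@context"]
    with open(args.reference) as file:
        reference = json.load(file)["@context"]

    # Prefixes the reference context does not contain.
    unknown = sorted(set(candidate) - set(reference))
    for prefix in unknown:
        print(f"{prefix} - invalid", file=sys.stderr)

    # A nonzero return value is meant to become a nonzero exit status,
    # which make and most CI runners treat as a failed step.
    return 1 if unknown else 0


# In a real script, the last line would be: raise SystemExit(main())
```

A make target could then call the script directly, and the build would stop whenever the context drifts from the reference.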

cthoyt added a commit that referenced this issue Aug 2, 2022
@cthoyt
Member

cthoyt commented Aug 2, 2022

So who are your target users, and how will they feel about most of their stuff being wrong? E.g., I ran bioregistry validate jsonld "https://raw.githubusercontent.com/prefixcommons/prefixcommons-py/master/prefixcommons/registry/go_context.jsonld" --relax and got this output:

BIOMD - nonstandard > Switch to standard prefix: biomodels.db
COG_Function - invalid
WB - nonstandard > Switch to standard prefix: wormbase
FBbt - nonstandard > Switch to standard prefix: fbbt
KEGG_LIGAND - nonstandard > Switch to standard prefix: kegg.ligand
PSO_GIT - invalid
MaizeGDB_stock - invalid
EMAPA - nonstandard > Switch to standard prefix: emapa
GO - nonstandard > Switch to standard prefix: go
NCBI_GP - invalid
NMPDR - invalid
CASSPC - nonstandard > Switch to standard prefix: casspc
TGD_REF - invalid
NCBIGene - nonstandard > Switch to standard prefix: ncbigene
KEGG_REACTION - nonstandard > Switch to standard prefix: kegg.reaction
PseudoCAP - invalid
UniPathway - nonstandard > Switch to standard prefix: upa
MEROPS_fam - invalid
GO_REF - nonstandard > Switch to standard prefix: go.ref
VEGA - nonstandard > Switch to standard prefix: vega
ZFIN - nonstandard > Switch to standard prefix: zfin
AspGD_REF - invalid
RO - nonstandard > Switch to standard prefix: ro
Pfam - nonstandard > Switch to standard prefix: pfam
UBERON - nonstandard > Switch to standard prefix: uberon
GR - invalid
PDB - nonstandard > Switch to standard prefix: pdb
CORIELL - nonstandard > Switch to standard prefix: coriell
JCVI_GenProp - invalid
SGN - nonstandard > Switch to standard prefix: sgn
BFO - nonstandard > Switch to standard prefix: bfo
Genesys-pgr - invalid
UniMod - nonstandard > Switch to standard prefix: unimod
UM-BBD_reactionID - nonstandard > Switch to standard prefix: umbbd.reaction
PubChem_Substance - nonstandard > Switch to standard prefix: pubchem.substance
EcoCyc - nonstandard > Switch to standard prefix: ecocyc
Reactome - nonstandard > Switch to standard prefix: reactome
InterPro - nonstandard > Switch to standard prefix: interpro
UniRule - nonstandard > Switch to standard prefix: unirule
MGCSC_GENETIC_STOCKS - invalid
dictyBase - nonstandard > Switch to standard prefix: dictybase
PO_GIT - invalid
AspGD_LOCUS - nonstandard > Switch to standard prefix: aspgd.locus
SGD - nonstandard > Switch to standard prefix: sgd
COG_Pathway - nonstandard > Switch to standard prefix: cog.pathway
ENZYME - invalid
PAMGO_MGG - invalid
AgBase - invalid
AraCyc - invalid
EcoCyc_REF - invalid
CHEBI - nonstandard > Switch to standard prefix: chebi
HGNC - nonstandard > Switch to standard prefix: hgnc
dictyBase_gene_name - invalid
TAIR - invalid
EnsemblFungi - nonstandard > Switch to standard prefix: ensembl.fungi
Wikipedia - nonstandard > Switch to standard prefix: wikipedia.en
SUPERFAMILY - invalid
SWALL - invalid
PSI-MOD - nonstandard > Switch to standard prefix: mod
FYPO - nonstandard > Switch to standard prefix: fypo
RGD - nonstandard > Switch to standard prefix: rgd
UM-BBD_enzymeID - nonstandard > Switch to standard prefix: umbbd.enzyme
Broad_MGG - invalid
Swiss-Prot - nonstandard > Switch to standard prefix: uniprot
PMID - nonstandard > Switch to standard prefix: pubmed
Xenbase - nonstandard > Switch to standard prefix: xenbase
PR - nonstandard > Switch to standard prefix: pr
MIPS_funcat - invalid
GR_REF - invalid
MaizeGDB - nonstandard > Switch to standard prefix: maizegdb.locus
HAMAP - nonstandard > Switch to standard prefix: hamap
SGN_ref - invalid
TO_GIT - invalid
MeSH - nonstandard > Switch to standard prefix: mesh
GR_PROTEIN - nonstandard > Switch to standard prefix: gramene.protein
MaizeGDB_REF - invalid
GEO - nonstandard > Switch to standard prefix: geo
PO - nonstandard > Switch to standard prefix: po
PomBase - nonstandard > Switch to standard prefix: pombase
ENA - nonstandard > Switch to standard prefix: ena.embl
PIRSF - nonstandard > Switch to standard prefix: pirsf
EMBL - invalid
Prosite - nonstandard > Switch to standard prefix: prosite
H-invDB_cDNA - invalid
EC - nonstandard > Switch to standard prefix: eccode
MACSC_REF - invalid
PAMGO_VMD - invalid
IRGC - invalid
NASC_code - invalid
COG_Cluster - nonstandard > Switch to standard prefix: cog
TreeGenes - invalid
WB_REF - nonstandard > Switch to standard prefix: wormbase
TGD_LOCUS - invalid
MA - nonstandard > Switch to standard prefix: ma
UniProtKB - nonstandard > Switch to standard prefix: uniprot
MGI - nonstandard > Switch to standard prefix: mgi
GRINDesc - invalid
DDANAT - nonstandard > Switch to standard prefix: ddanat
RAP-DB - invalid
gomodel - nonstandard > Switch to standard prefix: go.model
KEGG_PATHWAY - nonstandard > Switch to standard prefix: kegg.pathway
BTO - nonstandard > Switch to standard prefix: bto
JCVI_CMR - invalid
dictyBase_REF - invalid
DOI - nonstandard > Switch to standard prefix: doi
LIFEdb - invalid
PANTHER - invalid
Gene3D - invalid
PATRIC - invalid
FB - nonstandard > Switch to standard prefix: flybase
PAINT_REF - invalid
CASREF - invalid
ENSEMBL - nonstandard > Switch to standard prefix: ensembl
SMART - nonstandard > Switch to standard prefix: smart
RefSeq - nonstandard > Switch to standard prefix: refseq
WBls - nonstandard > Switch to standard prefix: wbls
MaizeGDB_QTL - invalid
SOY_ref - invalid
ECO - nonstandard > Switch to standard prefix: eco
CGD_REF - invalid
ECK - invalid
CGD - nonstandard > Switch to standard prefix: cgd
GR_GENE - nonstandard > Switch to standard prefix: gramene.gene
RNAmods - nonstandard > Switch to standard prefix: rnamods
KEGG_ENZYME - nonstandard > Switch to standard prefix: kegg.enzyme
CACAO - invalid
IUPHAR_GPCR - nonstandard > Switch to standard prefix: iuphar.receptor
JCVI_TIGRFAMS - invalid
SOY_QTL - invalid
DDBJ - invalid
PRINTS - nonstandard > Switch to standard prefix: prints
PO_REF - invalid
IMG - invalid
CL - nonstandard > Switch to standard prefix: cl
UniProtKB-SubCell - nonstandard > Switch to standard prefix: uniprot.location
NIF_Subcellular - nonstandard > Switch to standard prefix: nlx.sub
GeneDB - nonstandard > Switch to standard prefix: genedb
ApiDB_PlasmoDB - nonstandard > Switch to standard prefix: plasmodb
RNAcentral - nonstandard > Switch to standard prefix: rnacentral
CGD_LOCUS - invalid
Rfam - nonstandard > Switch to standard prefix: rfam
Broad_NEUROSPORA - invalid
AGI_LocusCode - invalid
OBO_SF2_PO - invalid
FMA - nonstandard > Switch to standard prefix: fma
CDD - nonstandard > Switch to standard prefix: cdd
PubChem_Compound - nonstandard > Switch to standard prefix: pubchem.compound
HGNC_gene - invalid
PharmGKB - invalid
VMD - invalid
UniParc - nonstandard > Switch to standard prefix: uniparc
MEROPS - invalid
GDB - invalid
SEED - nonstandard > Switch to standard prefix: seed
SO - nonstandard > Switch to standard prefix: so
Soy_gene - invalid
CORUM - nonstandard > Switch to standard prefix: corum
RHEA - nonstandard > Switch to standard prefix: rhea
dbSNP - nonstandard > Switch to standard prefix: dbsnp
MaizeGDB_Locus - nonstandard > Switch to standard prefix: maizegdb.locus
MO - nonstandard > Switch to standard prefix: mo
PLANA_REF - invalid
ISBN - nonstandard > Switch to standard prefix: isbn
BRENDA - nonstandard > Switch to standard prefix: brenda
ASAP - nonstandard > Switch to standard prefix: asap
CAS - nonstandard > Switch to standard prefix: cas
H-invDB_locus - invalid
UM-BBD_ruleID - nonstandard > Switch to standard prefix: umbbd.rule
NCBITaxon - nonstandard > Switch to standard prefix: ncbitaxon
ComplexPortal - nonstandard > Switch to standard prefix: complexportal
JSTOR - nonstandard > Switch to standard prefix: jstor
GRIMS - invalid
PATO - nonstandard > Switch to standard prefix: pato
GR_QTL - nonstandard > Switch to standard prefix: gramene.qtl
ECOGENE - nonstandard > Switch to standard prefix: ecogene
HPA_antibody - invalid
VBRC - nonstandard > Switch to standard prefix: vbrc
EO_GIT - invalid
EchoBASE - nonstandard > Switch to standard prefix: echobase
CASGEN - invalid
IUPHAR_RECEPTOR - nonstandard > Switch to standard prefix: iuphar.receptor
IRIC - invalid
GenBank - nonstandard > Switch to standard prefix: genbank
TGD - nonstandard > Switch to standard prefix: tgd
JCVI_EGAD - invalid
PubChem_BioAssay - nonstandard > Switch to standard prefix: pubchem.bioassay
TC - nonstandard > Switch to standard prefix: tcdb
SABIO-RK - nonstandard > Switch to standard prefix: sabiork.reaction
OBO_SF2_PECO - invalid
MetaCyc - nonstandard > Switch to standard prefix: metacyc.compound
PAMGO_GAT - invalid
ModBase - invalid
OMIM - nonstandard > Switch to standard prefix: omim
GR_MUT - invalid
HPA - nonstandard > Switch to standard prefix: hpa
IntAct - nonstandard > Switch to standard prefix: intact
ProDom - nonstandard > Switch to standard prefix: prodom
GRIN - invalid
WBPhenotype - nonstandard > Switch to standard prefix: wbphenotype
BioCyc - nonstandard > Switch to standard prefix: biocyc
ENSEMBL_GeneID - invalid
PIR - invalid
UniProtKB-KW - nonstandard > Switch to standard prefix: uniprot.keyword
Planteome_gene - invalid
AspGD - invalid
JCVI_Medtr - invalid
EuPathDB - invalid
PMCID - nonstandard > Switch to standard prefix: pmc
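The report above sorts prefixes into two kinds of problems: "nonstandard" ones that come with a suggested replacement, and "invalid" ones with no suggestion. A rough sketch of how such a classification could work follows. The lookup tables `STANDARD` and `SYNONYMS` and the matching rules (case-insensitive match, then synonym lookup) are assumptions for illustration and may differ from what Bioregistry actually does.

```python
# Hypothetical lookup tables; a real registry would supply these.
STANDARD = {"go", "pubmed", "chebi"}
SYNONYMS = {"pmid": "pubmed"}


def classify(prefix: str):
    """Return (status, suggestion) for a single prefix."""
    if prefix in STANDARD:
        return "standard", prefix
    normalized = prefix.lower()
    if normalized in STANDARD:
        # Same prefix, different capitalization.
        return "nonstandard", normalized
    if normalized in SYNONYMS:
        # A known alternative name for a standard prefix.
        return "nonstandard", SYNONYMS[normalized]
    return "invalid", None


def report(prefixes):
    """Build report lines in the same shape as the output above."""
    lines = []
    for prefix in prefixes:
        status, suggestion = classify(prefix)
        if status == "nonstandard":
            lines.append(
                f"{prefix} - nonstandard > Switch to standard prefix: {suggestion}"
            )
        elif status == "invalid":
            lines.append(f"{prefix} - invalid")
    return lines


for line in report(["GO", "PMID", "COG_Function", "chebi"]):
    print(line)
```

Prefixes that are already standard produce no line, so an empty report would mean the context is conformant under these assumed rules.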

@matentzn
Collaborator Author

matentzn commented Aug 2, 2022 via email

@cthoyt
Member

cthoyt commented Aug 2, 2022

okay I will think about how this might work, since it would still be nice to make suggestions (but like you said, this should only support contexts registered in Bioregistry that are loved and cared for)

cthoyt linked a pull request Apr 16, 2024 that will close this issue