-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/new formats #88
Feat/new formats #88
Conversation
find more specific function
Hi Ben, memote does not use the YAML file for anything. As you say it's only a format that can be diff'ed nicely. |
@Midnighter ok. Eventually we would also want cobrapy users to be able to change the model, so it would also be nice if this .yml file was generated by cobrapy in the same format. I tried to be as consistent as possible with the referenced cobrapy output, however we might expect some differences that we could handle with some wrapper :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I cannot comment on how it compares with cobrapy-derived yaml.
Uncertainty about how references are included, the pmid:
prefix is redundant, further discussion regarding this is happening here
@edkerk for the sake of this model I have removed the redundant |
I haven't looked at the RAVEN output yet but in cobrapy JSON and YAML share the same structure so the following JSON schema also applies to YAML. I can do some validation next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to what @Midnighter said, the YAML file you uploaded @BenjaSanchez seems to be missing a few required keys.
As a guideline, I'm going to refer both to the JSON schema and having converted iJO1366.xml to iJO1366.yml with the most recent version of cobrapy (ver 0.11.3):
- According to the JSON schema these top-level keys are required:
"required": ["id", "reactions", "metabolites", "genes"]
- In addition to that iJO1366 also contains:
name
andversion
. For instance:
!!omap
- metabolites: ***
- reactions: ***
- id: iJO1366
- name: Escherichia coli str. K-12 substr. MG1655
- compartments: !!omap ***
- version: '1'
-
According to the current SBML3FBC2 specifications parenthesis in the formula field are not allowed. This is not so much something to do with the YAML format per se, but something to keep in mind if you want to distribute formally valid SBML files. There is a notion to change this though, but I'm not sure how soon this change will happen.
Example formula that's not valid:(C6H10O5)n
Example formula that is valid:C20H21N7O7
SBML is very strict with this, and for instance memote won't run if a model has issues that conflict with SBML rules. @edkerk I've noticed that this is the case with the Yarrowia Model too. The formal representation of the metabolite dictionary in YAML seems to be fine though! -
As for reactions, looks good to me structurally. One side-thought I had when looking through the specification was that we might want to start using some ontology for subsystems and represent them as an ordered dict in the YAML (and internally) as to avoid having to parse a list of strings:
- subsystem:
- sce00620 Pyruvate metabolism
- sce00920 Sulfur metabolism
- compartments: Make sure to use an ordered dictionary there! (denoted by the YAML tag
!!omap
). I can see that you did sort it when writing the file but this sorting may or may not be lost when re-importing the YAML (which we don't recommend but some people/ software may take to doing that)
Compare iJO1366.yml:
- compartments: !!omap
- c: cytosol
- e: extracellular space
- p: periplasm
vs. yours:
- compartments:
- c: cytoplasm
- ce: cell envelope
- e: extracellular
- er: endoplasmic reticulum
- erm: endoplasmic reticulum membrane
- g: Golgi
- gm: Golgi membrane
- lp: lipid particle
- m: mitochondrion
- mm: mitochondrial membrane
- n: nucleus
- p: peroxisome
- v: vacuole
- vm: vacuolar membrane
That's all I can see from comparing the two with the naked eye. I hope it helps!
Thanks @ChristianLieven for the review! Answering to your comments:
The only one missing is
We have issue #19 to address that, hopefully work on it as soon as possible. @hongzhonglu could you maybe look into the ones remaining in curation/metabolites? I already fixed all tRNA's.
fixed in As the remaining changes might take longer, I think it's better if we open a specific branch for making sure that the .yaml of RAVEN matches the .yaml of cobrapy. In that branch we should also fix compatibility of the current .xml file (from COBRA toolbox) with cobrapy (I know from GECKO that additional compartment notations are appended to the current ids when loaded), otherwise cobrapy users will still not be able to contribute to the model. Thank you all for the input!! If no other things pop up I will merge on monday |
Added a .yml file of the model created by RAVEN @ChristianLieven @Midnighter please let us know if this format would be compliant with the cobrapy format of .yml file. Will memote use the .yml file in any way or is it only for better diffs?