Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/new formats #88

Merged
merged 8 commits into from
Apr 16, 2018
Merged

Feat/new formats #88

merged 8 commits into from
Apr 16, 2018

Conversation

BenjaSanchez
Copy link
Contributor

Added a .yml file of the model created by RAVEN @ChristianLieven @Midnighter please let us know if this format would be compliant with the cobrapy format of .yml file. Will memote use the .yml file in any way or is it only for better diffs?

@BenjaSanchez BenjaSanchez added the enhancement new field/feature label Apr 12, 2018
@BenjaSanchez BenjaSanchez self-assigned this Apr 12, 2018
@Midnighter
Copy link

Hi Ben, memote does not use the YAML file for anything. As you say it's only a format that can be diff'ed nicely.

@BenjaSanchez
Copy link
Contributor Author

BenjaSanchez commented Apr 12, 2018

@Midnighter ok. Eventually we would also want cobrapy users to be able to change the model, so it would also be nice if this .yml file was generated by cobrapy in the same format. I tried to be as consistent as possible with the referenced cobrapy output, however we might expect some differences that we could handle with some wrapper :)

Copy link
Member

@edkerk edkerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I cannot comment on how it compares with cobrapy-derived yaml.

Uncertainty about how references are included, the pmid: prefix is redundant, further discussion regarding this is happening here

@BenjaSanchez
Copy link
Contributor Author

@edkerk for the sake of this model I have removed the redundant pmid:. Let me know if any other issue is still present in this PR :)

@Midnighter
Copy link

I haven't looked at the RAVEN output yet but in cobrapy JSON and YAML share the same structure so the following JSON schema also applies to YAML. I can do some validation next week.

Copy link

@ChristianLieven ChristianLieven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to what @Midnighter said, the YAML file you uploaded @BenjaSanchez seems to be missing a few required keys.

As a guideline, I'm going to refer both to the JSON schema and having converted iJO1366.xml to iJO1366.yml with the most recent version of cobrapy (ver 0.11.3):

  • According to the JSON schema these top-level keys are required:
"required": ["id", "reactions", "metabolites", "genes"]
  • In addition to that iJO1366 also contains: name and version. For instance:
!!omap
- metabolites: ***
- reactions: ***
- id: iJO1366
- name: Escherichia coli str. K-12 substr. MG1655
- compartments: !!omap ***
- version: '1'
  • According to the current SBML3FBC2 specifications parenthesis in the formula field are not allowed. This is not so much something to do with the YAML format per se, but something to keep in mind if you want to distribute formally valid SBML files. There is a notion to change this though, but I'm not sure how soon this change will happen.
    Example formula that's not valid: (C6H10O5)n
    Example formula that is valid: C20H21N7O7
    SBML is very strict with this, and for instance memote won't run if a model has issues that conflict with SBML rules. @edkerk I've noticed that this is the case with the Yarrowia Model too. The formal representation of the metabolite dictionary in YAML seems to be fine though!

  • As for reactions, looks good to me structurally. One side-thought I had when looking through the specification was that we might want to start using some ontology for subsystems and represent them as an ordered dict in the YAML (and internally) as to avoid having to parse a list of strings:

- subsystem:
        - sce00620  Pyruvate metabolism
        - sce00920  Sulfur metabolism
  • compartments: Make sure to use an ordered dictionary there! (denoted by the YAML tag !!omap). I can see that you did sort it when writing the file but this sorting may or may not be lost when re-importing the YAML (which we don't recommend but some people/ software may take to doing that)
    Compare iJO1366.yml:
- compartments: !!omap
  - c: cytosol
  - e: extracellular space
  - p: periplasm

vs. yours:

- compartments:
    - c: cytoplasm
    - ce: cell envelope
    - e: extracellular
    - er: endoplasmic reticulum
    - erm: endoplasmic reticulum membrane
    - g: Golgi
    - gm: Golgi membrane
    - lp: lipid particle
    - m: mitochondrion
    - mm: mitochondrial membrane
    - n: nucleus
    - p: peroxisome
    - v: vacuole
    - vm: vacuolar membrane

That's all I can see from comparing the two with the naked eye. I hope it helps!

@BenjaSanchez
Copy link
Contributor Author

Thanks @ChristianLieven for the review! Answering to your comments:

According to the JSON schema these top-level keys are required...

The only one missing is id, which RAVEN does not support, but we can include it as input in writeYaml (I left it as TODO in the function). Same goes for name and version.

According to the current SBML3FBC2 specifications parenthesis in the formula field are not allowed.

We have issue #19 to address that, hopefully work on it as soon as possible. @hongzhonglu could you maybe look into the ones remaining in curation/metabolites? I already fixed all tRNA's.

compartments: Make sure to use an ordered dictionary there!

fixed in c80e6ef

As the remaining changes might take longer, I think it's better if we open a specific branch for making sure that the .yaml of RAVEN matches the .yaml of cobrapy. In that branch we should also fix compatibility of the current .xml file (from COBRA toolbox) with cobrapy (I know from GECKO that additional compartment notations are appended to the current ids when loaded), otherwise cobrapy users will still not be able to contribute to the model.

Thank you all for the input!! If no other things pop up I will merge on monday

@BenjaSanchez BenjaSanchez merged commit f6594b9 into devel Apr 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new field/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants