Output list type field as array of string in Yaml #107

haowang-bioinfo · 2018-05-15T14:43:41Z

The JSON/Yaml formats are very useful for GEM curation, exchange and other purposes, visualization being one of them. Yaml is now used as input for the Metabolic Atlas website development.

However, the current writeYaml function treats the 'list' type fields (e.g. eccodes, subSystems) inconsistently. It outputs the value as a string when there is only one element, but as an array of strings when there are multiple elements. This inconsistency causes problems when loading the file with a public Yaml parser (such as in Python), because the information cannot be extracted correctly if some elements have one value and others have multiple. This issue has now been addressed in this branch: 81be856.
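To illustrate the parsing problem, here is a minimal Python sketch. It assumes the file has already been loaded by some YAML parser, so the data below is plain dicts and lists; the reaction IDs, the EC codes and the helper name `as_list` are made-up placeholders, not part of RAVEN or cobrapy.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Wrap a lone value in a list; pass lists through; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Shape a parser would typically return for the two layouts
# (placeholder IDs and EC codes, for illustration only).
reactions = [
    {"id": "r_a", "eccodes": "1.1.1.1"},               # one element: plain string
    {"id": "r_b", "eccodes": ["1.1.1.1", "2.2.2.2"]},  # several: list of strings
]

# Iterating without a type check walks the characters of the lone string.
naive = [code for code in reactions[0]["eccodes"]]

# Normalizing first gives every reaction the same shape.
normalized = {r["id"]: as_list(r["eccodes"]) for r in reactions}
```

Every consumer of the file needs a guard like `as_list` as long as the writer emits both shapes.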

BenjaSanchez · 2018-05-15T15:17:05Z

@Hao-Chalmers

When writing the function I followed the cobrapy standard, and they also make this distinction between one field and several: see the example file in issue support for export to cobrapy-compatible yaml format #77, for instance reactions r_2246 (one ec-code) and r_2248 (2 ec-codes). So if we do what you suggest we would lose compatibility with them, whereas eventually we want to allow cobrapy users to contribute to our models.
In our repos we should have .yml files as succinct as possible, so they can show the most changes in the fewest lines. This change would go against that principle.

So instead I would suggest a separate function that can convert the .yml to a parser-compatible version.

edkerk · 2018-05-15T15:25:51Z

@Hao-Chalmers

As @BenjaSanchez said, the default writeYaml output should not be changed.

As an alternative, include a parameter in the writeYaml function, e.g. extended or parseable, and set this to true if you want your precise format. The default for this parameter should be false, so the function continues to work as intended. If this parseable format requires a lot of changes, it might indeed make more sense to write a specific function for this purpose, either as part of RAVEN or as part of the relevant repository.
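A rough sketch of how such a switch could behave, written in Python purely for illustration (writeYaml itself is a RAVEN function, and this is not its implementation). The function name `format_list_field` is hypothetical; only the parameter name `parseable` and its default of false come from the suggestion above.

```python
from typing import List, Sequence


def format_list_field(key: str, values: Sequence[str],
                      parseable: bool = False) -> List[str]:
    """Render one list-type field as YAML lines.

    parseable=False (default): a single value is written as a plain scalar.
    parseable=True: values are always written as a block sequence.
    """
    if not values:
        return []  # nothing to write for an empty field
    if len(values) == 1 and not parseable:
        return [f"- {key}: {values[0]}"]
    lines = [f"- {key}:"]
    lines.extend(f"  - {value}" for value in values)
    return lines


compact = format_list_field("ec-code", ["1.3.3.6"])
uniform = format_list_field("ec-code", ["1.3.3.6"], parseable=True)
```

With the default left at false, existing callers keep getting the compact output, and only callers that opt in get the uniform list layout.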

edit: renamed suggested parameter compact to extended as it's more accurate. parseable would be preferred though.

haowang-bioinfo · 2018-05-15T15:41:09Z

@BenjaSanchez, @edkerk

Here are my comments:

The proposed change in this branch keeps the yml file in a succinct style without introducing (many) additional lines, and doesn't go against the compact principle.
This function attempts to follow the JSON/Yaml scheme proposed by cobrapy. But any scheme should be open to adjustments that better fit practical needs, which change continuously. Actually, cobrapy doesn't clearly specify this condition.

BenjaSanchez · 2018-05-15T15:53:31Z

@Hao-Chalmers it would definitely include extra lines, for instance what is now:

- ec-code: 1.3.3.6

would then be:

- ec-code:
  - 1.3.3.6

haowang-bioinfo · 2018-05-15T15:59:04Z

True, it does. There won't be many though :-)

pecholleyc · 2018-05-16T08:23:00Z

@BenjaSanchez I understand you want to follow the cobrapy standard, but this issue originated from an inconsistency with the "subsystem" field in a reaction. It is defined as type string in the JSON/Yaml cobrapy scheme, whereas models like YEAST may have reactions with a list of subsystems (which can make sense).

The problem with annotation fields like 'ec-code' is different. I am not sure you are going to lose compatibility with cobrapy, since annotation is defined as 'object'. IMO if multiple values are expected for an annotation field, it should always be defined as a list; this will simplify automatic parsing.

Moreover, if extra lines are a problem, lists can also be represented in an abbreviated form:

- ec-code:
  - 1.3.3.6

can be:

- ec-code: [1.3.3.6]
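The two layouts can be compared with a small Python sketch. The helper names `block_style` and `flow_style` are hypothetical, and the flow version assumes the values contain no commas, brackets or other characters that would need quoting.

```python
from typing import List, Sequence


def block_style(key: str, values: Sequence[str]) -> List[str]:
    """One line for the key plus one line per value."""
    return [f"- {key}:"] + [f"  - {value}" for value in values]


def flow_style(key: str, values: Sequence[str]) -> List[str]:
    """Key and all values on a single line, e.g. '- ec-code: [1.3.3.6]'."""
    return [f"- {key}: [{', '.join(values)}]"]


single_block = block_style("ec-code", ["1.3.3.6"])
single_flow = flow_style("ec-code", ["1.3.3.6"])
```

In this sketch the flow form always takes one line per field, however many values the list holds.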

I also suggest systematically double-quoting strings for some of the fields: the current YAML export of the HMR model cannot be parsed properly because some metabolite names include YAML syntax, e.g. '-['.
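One possible way to do the quoting, sketched in Python under the assumption that a JSON string literal is acceptable as a YAML double-quoted scalar (YAML 1.2 is designed to accept JSON). The helper names and the sample names are placeholders, not taken from the HMR model.

```python
import json


def quote_yaml_string(text: str) -> str:
    """Return text as a double-quoted scalar with quotes and backslashes escaped."""
    return json.dumps(text)


def format_name(name: str) -> str:
    """Render a name field with its value always double-quoted."""
    return f"- name: {quote_yaml_string(name)}"


quoted = format_name("-[x]")           # placeholder name containing '-['
escaped = quote_yaml_string('a "b"')   # inner quotes are backslash-escaped
```

Quoting every value of a field, rather than only the ones that look risky, keeps the writer simple and the output uniform.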

edkerk · 2018-05-16T09:22:36Z

@pecholleyc
The primary aim of this function is compatibility with cobrapy (and of course if changes are made there, then we should update writeYaml). So if subSystems are currently not written identically to cobrapy output (and please verify this with actual output, because I have seen, for instance for cobra toolbox, that specified schemes are not always up-to-date), then please fix that specifically.

Also, as mentioned above, there is no problem with having an alternative output from writeYaml, but let's keep the default output cobrapy compatible. As suggested, you can specify a parameter parseable (with a default of 'false') or even cobrapy (with a default of 'true') to specify the exact formatting of the Yaml file. This alternative format could for instance include double-quoted strings. I just don't see why this would have to break the existing (default) output format?

BenjaSanchez · 2018-05-16T09:33:08Z

@Hao-Chalmers actually there would be many new lines: the .yml file of yeast-GEM saved with your proposed change would be 82,202 lines, compared to 74,394 lines as it is currently stored.
And I completely agree with @edkerk: we can have secondary ways of outputting the model, but the way to store it in the repo should be the most compact option.

haowang-bioinfo · 2018-05-16T09:44:01Z

@edkerk A similar issue has been proposed in cobrapy repo for adjustment in implementation. Let's see how it proceeds.

@BenjaSanchez There won't be much difference between 82,202 and 74,394 lines, because neither can be reviewed by eye. And there would be no difference for digital storage.

The optional parseable output could be a solution, if it turns out to be optimal.

haowang-bioinfo self-assigned this May 15, 2018

haowang-bioinfo added the discussion Not yet settled whether change in code is required. label May 15, 2018

mihai-sysbio mentioned this issue Apr 7, 2021

feat: yaml worflow SysBioChalmers/Human-GEM#173

Merged

2 tasks

haowang-bioinfo closed this as completed Apr 7, 2021

edkerk mentioned this issue Apr 7, 2021

support for export to cobrapy-compatible yaml format #77

Open

4 tasks
