Unable to use emll to get correct elasticity values on a model with ground truth values #24

janisshin opened this issue Jul 2, 2024 · 14 comments
@janisshin
Collaborator

I've tried using the improved emll version on a modified yeast model to estimate elasticity values and compare them to ground-truth values. Here is the notebook with my work: notebook link. The ELBO convergence plot in my notebook does not look great. Am I using emll correctly, and is this the closest that emll can get to the ground-truth values?

Here is the model referenced in the notebook and the data.
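For reference, the ELBO plot in the notebook comes from something like the following sketch (the toy model here is a placeholder for the actual pymc_model assembled in the notebook):

import pymc as pm
import matplotlib.pyplot as plt

with pm.Model() as pymc_model:  # placeholder for the notebook's real model
    x = pm.Normal("x", mu=0.0, sigma=1.0)

# fit with ADVI and plot the loss history; the curve should flatten
# out as the approximation converges
with pymc_model:
    approx = pm.fit(n=10000, method="advi")

plt.plot(approx.hist)
plt.xlabel("iteration")
plt.ylabel("-ELBO")
plt.show()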

@janisshin
Collaborator Author

@mcnaughtonadm do you have any updates?

@mcnaughtonadm
Collaborator

I do have some updates, @janisshin. Sorry, I thought I'd have a bit more time to test a few different solutions to your problem, but an annoying paper revision took up some extra time 🙄.

So I went through and tried to run your code the way I would run the emll workflow. I'm not sure how things work with tellurium and your SBML model, but it seems like there may be issues stemming from how you build up the requisite components for the pymc_model and how they translate to the inference.

Things like defining Ey_t as a second pm.Deterministic variable, or your HalfNormal error priors, may be adding complexity to the inference, causing weird artifacts early on and leading to the funky ADVI behavior. So I'd like to get to the point right before the pymc_model is built.

I tried to get to the model-building step, but ran into some problems doing it my way versus how tellurium iterates through the data and constructs things like the steady state and boundaries. Do you think you could cloudpickle the variables leading up to your pymc_model construction? Something like:

import cloudpickle
import gzip

# dump every variable used to build the pymc_model into one compressed file
# (drop the `self.` prefixes if these live in a notebook rather than a class)
with gzip.open("pymc_data.pgz", "wb") as f:
    cloudpickle.dump(
        {
            "vn": self.vn,
            "en": self.en,
            "xn": self.xn,
            "v_inds": self.v_inds,
            "ll": self.ll,
            "v_star": self.v_star,
            # ... plus whatever else you use before inference
        },
        f,
    )

Just include the variables you end up using to build the pymc_model before running inference, and place the file somewhere in the repo you shared the other files from. I can't recreate parts of this from your SBML and data alone, since they lack some of the information needed to create certain variables for emll without making assumptions that may not be correct.
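On my end, loading it back would look something like this (same assumptions as the snippet above):

import cloudpickle
import gzip

# load the pickled variables back for local testing
with gzip.open("pymc_data.pgz", "rb") as f:
    data = cloudpickle.load(f)

vn, en, xn = data["vn"], data["en"], data["xn"]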

emll is unfortunately still pretty temperamental about how data is passed into the linlog model and, subsequently, how that linlog model is used. That said, I think this could be a pymc problem, which would be easier to address.

@janisshin
Collaborator Author

Hi Andrew,
Here is the file you requested.

@janisshin
Collaborator Author

Checking in again to see if you've got any updates or need anything else from me.

@mcnaughtonadm
Collaborator

I was able to run emll with your data. I wanted to test running things with the normal Hackett structure for establishing the pymc_model parameters, as found in run_hackett_inference.py, and got a slightly more "expected" ELBO plot:
[screenshot: ELBO convergence plot with the Hackett-style model]

This used the same fit() parameters you ran in your notebook.

I also tried running the pymc_model the way you constructed it in your notebook and was able to replicate what you were seeing:
[screenshot: ELBO plot reproducing the behavior from the notebook]

Something I noticed that could be causing this weird behavior is your use of HalfNormal distributions as the sigma parameter of the Normal likelihoods. If we remove those, we start to see the more expected-looking shape (not perfect, but it doesn't exhibit the weird twisting):

[screenshot: ELBO plot with the HalfNormal sigma removed]

We typically use a fixed sigma, but I don't have a good reason why you couldn't use a HalfNormal there. My recommendation would be to make sure the error is either a correctly defined distribution or a fixed float value.
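To illustrate the difference, here's a minimal sketch of the two likelihood variants (v_hat and v_obs are placeholder names, not the ones from your notebook):

import numpy as np
import pymc as pm

v_obs = np.random.randn(10)  # placeholder observed data

# fixed sigma: the form we typically use
with pm.Model() as fixed_sigma_model:
    v_hat = pm.Normal("v_hat", mu=0.0, sigma=1.0, shape=10)
    pm.Normal("v_like", mu=v_hat, sigma=0.1, observed=v_obs)

# HalfNormal sigma: an extra latent variable for ADVI to fit
with pm.Model() as halfnormal_sigma_model:
    v_hat = pm.Normal("v_hat", mu=0.0, sigma=1.0, shape=10)
    v_err = pm.HalfNormal("v_err", sigma=0.1)
    pm.Normal("v_like", mu=v_hat, sigma=v_err, observed=v_obs)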

We'd need to do some more testing on our end to see if this type of HalfNormal error prior is usable, probably with the hackett example.

@janisshin
Collaborator Author

@mcnaughtonadm,
I'm running another benchmarking test for BMCA using emll on a larger E. coli model, and I'm finding that the vast majority of predicted elasticity values do not match the ground-truth elasticity values. Is this normal?

Here is the graph I'm getting:
[screenshot: predicted vs. ground-truth elasticity values]
And here is the notebook that I used to make the graph.

I'm running the notebook again to check that the predicted flux values and predicted chi values match the ground-truth values. I hope to update the notebook with these graphs by tomorrow morning.

@djinnome I just wanted to give you a heads up on this discrepancy before presenting it at our Wednesday meeting.

@djinnome
Collaborator

djinnome commented Sep 23, 2024

How are you computing the predicted elasticity values? What Bayesian MCA returns is a posterior distribution over each elasticity.

One option is to compute a credible interval for each elasticity and see whether the ground-truth elasticity falls within it.

Another option is to compute the mean absolute error between the prior elasticities and the ground-truth elasticities and compare it with the mean absolute error between the posterior and ground-truth elasticities, to see whether the model learned anything from the data.

Lastly, you should check whether the ranking of the predicted FCCs is close to the ranking of the ground-truth FCCs. If it is, even though the elasticities themselves disagree, that indicates there could be parameter identifiability issues.
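For example, with arviz the first two checks could look roughly like this (the variable name "Ex" and the arrays here are placeholders; substitute your real trace and ground-truth values):

import arviz as az
import numpy as np

trace = az.from_dict(posterior={"Ex": np.random.randn(4, 500, 12)})  # placeholder trace
e_true = np.random.randn(12)  # placeholder ground-truth elasticities

# posterior summary with a 95% highest-density interval per elasticity
summ = az.summary(trace, var_names=["Ex"], hdi_prob=0.95)
lower, upper = summ["hdi_2.5%"].values, summ["hdi_97.5%"].values

# fraction of ground-truth elasticities inside their credible interval
coverage = np.mean((lower <= e_true) & (e_true <= upper))

# mean absolute error of the posterior means against ground truth
mae_post = np.abs(summ["mean"].values - e_true).mean()
print(f"coverage: {coverage:.2f}, posterior MAE: {mae_post:.3f}")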

@janisshin
Collaborator Author

> How are you computing the predicted elasticity values? What Bayesian MCA returns is a posterior distribution over each elasticity.
> One option is to compute a credible interval for each elasticity and see whether the ground-truth elasticity falls within it.
> Another option is to compute the mean absolute error between the prior elasticities and the ground-truth elasticities and compare it with the mean absolute error between the posterior and ground-truth elasticities, to see whether the model learned anything from the data.

I'm using arviz.summary(trace)['mean'].

> Lastly, you should check whether the ranking of the predicted FCCs is close to the ranking of the ground-truth FCCs. If it is, that indicates there could be parameter identifiability issues.

It's somewhat close. Of the top 10 ground-truth FCC values, BMCA predicts about 6-7 correctly. Is this enough information, or would it help if I also made a parity plot for the FCC rankings?
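For reference, I'm counting the overlap roughly like this (a sketch; fcc_true and fcc_pred are placeholders for the real point estimates):

import numpy as np
from scipy.stats import spearmanr

fcc_true = np.random.randn(30)  # placeholder: ground-truth FCCs
fcc_pred = np.random.randn(30)  # placeholder: posterior-mean FCCs

# rank correlation between predicted and ground-truth FCCs
rho, pval = spearmanr(fcc_true, fcc_pred)

# overlap between the top-10 sets by absolute FCC magnitude
top10_true = set(np.argsort(np.abs(fcc_true))[-10:])
top10_pred = set(np.argsort(np.abs(fcc_pred))[-10:])
print(f"Spearman rho: {rho:.2f}, top-10 overlap: {len(top10_true & top10_pred)}")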

I'm not sure what parameter identifiability issues are, but I'm happy to schedule a meeting to talk about it if necessary.

@janisshin
Collaborator Author

@djinnome ,

I've made the graphs you requested for my benchmarking project, but I have some follow-up questions, which are at the bottom.

> Try 100%, 50%, and 10% metabolite coverage

100% internal metabolite data
50% internal metabolite data
10% internal metabolite data

> Plot error of elasticities and FCCs of prior distribution vs ground truth

For elasticities, Ctrl+F "Elasticities, comparison of prior and post" in the notebooks posted above to find the graphs.

> Compare FCC rankings of union of top 10 ground truth, 0.5x, 1.5x, and prior FCCs

ctrl+F "top 15 table" in the notebooks posted above to find the tables showing the top 15 ranked FCCs, ground truth and predicted.

Table columns:

  • 0.5x-pr: 0.5x perturbation to all enzymes, prediction made on prior distribution
  • 1.5x-pr: 1.5x perturbation to all enzymes, prediction made on prior distribution
  • 0.5x-post: 0.5x perturbation to all enzymes, prediction made on posterior distribution
  • 1.5x-post: 1.5x perturbation to all enzymes, prediction made on posterior distribution

For the number of correctly predicted reactions in the top 10 ranked FCCs, Ctrl+F "number of correct predictions of top 10 FCC values".

In the cell immediately after, you'll find the ground-truth rankings of the top 10 predicted FCCs that are not in the ground-truth top 10. I know you asked for the union, but I think the set difference is more useful for seeing this data.

> Try Eflux2 vs real flux data in terms of FCC rankings.

I will aim to have this done later today.

@janisshin
Collaborator Author

@djinnome ,
I'd like more guidance on how to use Eflux2 with my model, given that I do not have any transcriptomics data.

Also, I have further questions about using emll on the putida model.
Earlier in the year, I was advised to use emll.util.create_Ey_matrix(model) to build an Ey matrix.
I realized that this method relies on model.medium to identify boundary indices.
Does this mean that the end-product metabolite 4aca should be added to the model.medium? Or is it considered an internal metabolite? Or should it be left out of the model altogether?

Originally, I chose the last option, but then I did not know how to incorporate the 4aca concentration data into emll.
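For context, this is how I'm inspecting the boundary handling in cobrapy (a sketch; the model path is a placeholder):

import cobra

model = cobra.io.read_sbml_model("putida_model.xml")  # placeholder path

# model.medium only lists exchange reactions that currently allow uptake (inputs)
print(model.medium)

# model.boundary lists all exchanges, demands, and sinks, including secretion
for rxn in model.boundary:
    print(rxn.id, rxn.reaction)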

Can we (or @mcnaughtonadm, whoever is more appropriate) meet to discuss this?

@djinnome
Collaborator

djinnome commented Oct 1, 2024

Hey @janisshin, do you have proteomics data? If so, that is actually a better input for Eflux2 than transcriptomics data, but the normalization might need to be a bit different, since proteomics has different distributional qualities than transcriptomics.

@janisshin
Collaborator Author

janisshin commented Oct 1, 2024

I do not have proteomics data either. Should I go find some on the internet?

I found this one, but it seems I need to process the data.

@djinnome
Collaborator

djinnome commented Oct 1, 2024

For the benchmark examples, I think you can just make up the proteomics data from the ground-truth model, yes?

With regard to real proteomics data, it would need to come from a strain that produces your end product (4ACA in this case).

You don't need to include 4ACA in the model.medium (which assumes inputs only), but your Ey matrix should contain external 4ACA. Also, when you run Eflux2, you should definitely make sure you are maximizing your end product subject to a minimal growth-rate constraint. We can talk more about this tomorrow.
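Something along these lines in cobrapy (the path and reaction IDs are placeholders; substitute the real ones from your model):

import cobra

model = cobra.io.read_sbml_model("putida_model.xml")  # placeholder path

# enforce a minimal growth rate before optimizing for the end product
biomass = model.reactions.get_by_id("BIOMASS")  # placeholder biomass ID
biomass.lower_bound = 0.05                      # minimal growth constraint

# maximize secretion of the end product via its exchange reaction
model.objective = "EX_4aca_e"                   # placeholder exchange ID
solution = model.optimize()
print(solution.objective_value)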

@janisshin
Collaborator Author

We don't have a Biosystems meeting tomorrow, but I'm happy to meet with you at whatever time is most convenient for you. Please let me know when that would be.
