-
Notifications
You must be signed in to change notification settings - Fork 36
Timeout errors for Examples/Advanced/CustomProperties/Hydrocarbon_Processing_example.ipynb #81
Comments
@lbianchi-lbl Is there any easy way to look at the notebook output (from CI) once it finished running? Did it finish in 644.05 seconds, or is that when CI decided to kill it? @agarciadiego is this your notebook? |
@adowling2 These are both excellent questions. I can create a PR on this repo disabling the timeout (i.e. setting it to a very high value) and configuring the CI workflow so that the HTML docs are saved after the build (which is something we wanted to have anyways). I'm currently testing the examples on our build system for the release and I'm also running into the same issue with a timeout, even when set to 1200 s. This does not prove that the notebook would continue to run indefinitely, but seems to at least indicate that it's not a matter of a slight variation in the runtime around the timeout threshold. |
@agarciadiego @bpaul4 @lbianchi-lbl I am curious about looking at the solver output captured in the notebooks and comparing that to when run locally. This would let us look at the number of Ipopt iterations across platforms. |
After some testing, I found that running the standard examples tests ( Before adding scaling, the |
This is wild speculation, but could |
I'm running the notebook manually on our build system and,
|
@bpaul4 @agarciadiego Since as I understand you have an environment where this notebook runs in a reasonable time, could you try running |
@lbianchi-lbl Yes, 'r' means Ipopt is struggling. It stands for restoration phase, which is essentially "last hope" mode before Ipopt gives up. Yes, the difference in Ipopt version would explain it. Next step is to look up what changed in the Ipopt version. What Ipopt version is included in IDAES get extensions? |
@adowling2, thanks for the clarification. According to the
Maybe @eslickj will have more information on when the ipopt version was updated in |
@lbianchi-lbl, here is my environment info: IDAES Pyomo Python OS Dependencies Extras Solvers |
Here is the IPOPT output for the initial solve (post-initialization). I scaled some variables as follows to get the
Solver output:
|
Looking at a much older version, it appears that the number of variables has jumped significantly - 551 variables and 550-700 CPU seconds now compared to 346 variables and 50 CPU seconds in this version which only took 7 iterations to solve with no restoration: https://github.com/agarciadiego/examples-pse/blob/internship/src/Examples/Advanced/CustomProperties/Hydrocarbon_Processing_example.ipynb |
@agarciadiego and I investigated the issue, and the notebook ran as expected on his machine: IPOPT 3.13.2, 346 variables, 7 iterations, ~ 40 total CPU seconds. I will test re-generating the initialization file, if the timeout still occurs then the issue is probably due to a recent code or dependency update from a commit in mid-December when the tests started failing. |
@bpaul4 Thanks for the update and the detailed analysis. I haven't thought about the initialization file (I assume you're talking about the @agarciadiego would it be possible to upload the initialization file that's being used in your 7-iterations runs? |
Generating a new I have saved the old .gz file using IPOPT 3.12 and a new one using IPOPT 3.13 with the scaling I implemented above. Currently, I'm running another version using IPOPT 3.13 without the scaling to see if we can leave the notebook itself alone. I will share all three files, and eventually can push our preferred .gz/notebook versions to #80. |
@bpaul4 thanks a lot for checking that. That looks like a step in the right direction (again, from my limited expertise on this topic). I tried something similar in #82, namely deleting the EDIT I've just read your message more carefully, and I got your point about generating a new file. Yes, if you could, check how it behaves in these three cases and let us know how that goes. |
@bpaul4 @lbianchi-lbl Based on the comments above, I suspect I know what happened. We added some intermediate To confirm, I would start by loading the old json file and check the model for log_variables - you will likely see that they all have default values. You could further confirm by calculating the correct values for these and manually setting them before calling the solver, which should should near-immediate convergence. If this is the case, then the correct solution is to just update the json file. However, this then leads to the question of why we didn't notice this earlier and how we can make sure that we identify issues like these as soon as they occur (so that we know for sure which commit to idaes-pse caused the issue). |
I tested the notebook for 4 cases, and agree with your assessment @andrewlee94 . The test runs below I'm calling CI are the commands that the CI jobs run, while the build process is the test workflow recommended by our documentation. I noticed that the build.yml and build-ci.yml tests are not run the same way or with the same timeout limit.
If this sufficiently addresses the error, I can push the updated json/notebook from case 4 to #80 and close this issue. |
@bpaul4 that's great, thank you so much! If I understand correctly, case 3 is what I've been trying with the most recent commit on #82; if this is the case, then I can also confirm this behavior. I'd say that the solution you're proposing sounds optimal, especially if you think this can be done quickly. If so, then please go ahead and add that fix to #80 (assuming the rest of the PR is ready to merge), so we can include this in the final 1.12.0 release of the IDAES examples. @andrewlee94: thanks for the clarification. If, as I understand, this problem manifested itself only through a longer runtime, then I can see how that would be difficult to catch: the only check that's even indirectly measuring the runtime is the build timeout, but this is not a good test since it would be triggered if the slowdown caused the runtime to be greater than the threshold, but not otherwise. Ideally, we'd have a more systematic way of measuring performance (I tried a simple solution in #64) and analyze that data to detect performance regressions. |
This makes so much sense. Is there a clean way to check the number of Ipopt iterations on some of these notebooks? Perhaps add another type of test that runs nightly. The downside is this would need to get updated every time we update Ipopt, etc. |
@adowling2 I agree that exposing the solver information to the notebook testing pipeline would be a better way to detect this type of issues than relying on the total runtime. Does the object returned by |
@carldlaird @eslickj @jsiirola Here is a quick summary of this thread:
Now that you are up to speed, a few questions for you to weigh in on:
|
|
@carldlaird +1 on your idea to check the json file (initial point) for missing variables. |
The maximum runtime allowed for a single notebook is limited to the value of the
timeout
parameter inbuild.yml
, which is currently set to 600 s for our CI builds. If a notebook takes longer than that to run, it results in an error.Lately, this seems to be happening often for the
Examples/Advanced/CustomProperties/Hydrocarbon_Processing_example.ipynb
notebook, resulting in failing CI runs in both our nightly builds and unrelated PRs such as #80.We should look into why the notebook is taking longer than it used to (since, until lately, the runtime was below the 600 s timeout), and, more in general, see if there could be ways to modify the notebook so that it runs as quickly as possible.
The text was updated successfully, but these errors were encountered: