Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summary loading crashes on mangling smspec data #251

Open
berland opened this issue May 10, 2024 · 3 comments
Open

Summary loading crashes on mangling smspec data #251

berland opened this issue May 10, 2024 · 3 comments

Comments

@berland
Copy link
Collaborator

berland commented May 10, 2024

The lines in
https://github.com/equinor/fmu-
ensemble/blob/af1bc17a7b60b8e8ec0808412eaade73136fac8e/src/fmu/ensemble/realization.py#L983-L990
is supposed to catch summary loading failures, but if resdata exits with util_abort() then nothing can be done, it will not be caught by IOError.

For realizations where Eclipse has crashed with a License error, the SMSPEC files and UNSMRY files are left in a state where it will trigger an util_abort(). This situation is unfortunately common enough that this situation should be catched. fmu_ensemble must then detect this situation before it happens to avoid calling the C-code that calls util_abort.

Example of error in this scenario (reproduced with res2csv):

$ res2csv summary SOMETHING_WITH_LICENSE_ERROR.DATA

Error message: rd_smspec_fread_alloc: Sorry the SMSPEC file seems to lack all time information, need either TIME, or DAY/MONTH/YEAR information. Can not proceed.
See file: /tmp/ert_abort_dump.havb.20240510-092807.log for more details of the crash.
Setting the environment variable "ERT_SHOW_BACKTRACE" will show the backtrace on stderr.
Aborted (core dumped)
@berland
Copy link
Collaborator Author

berland commented May 10, 2024

One way to detect this scenario that will trigger the crash can be done in Python with:

import resfo
smspec = resfo.read("SOMETHING_WITH_LICENSE_ERROR.SMSPEC")
if not len([val[1] for val in a if val[0] == "KEYWORDS"][0]):
    print(f"There is no TIME information in the SMSPEC file: {smspec_filename}"

@eivindjahren might have other ideas?

@alifbe
Copy link

alifbe commented May 10, 2024

Can't we fixed it from resdata and make it failed silently just like when it failed to read smspec file? Technically it's the same issue.

Then we don't need to fix it in all tools that using resdata

https://github.com/equinor/resdata/blob/55a2fe6ed3a1c562532c18641e965724d28ec81a/lib/resdata/rd_smspec.cpp#L1259-L1261

@eivindjahren
Copy link
Contributor

@alifbe Probably a good idea to make resdata not use util_abort but signal with exception. However, it was never designed to do that so it can be complicated as it needs to clean up before signaling an error.

The code in question is this https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_smspec.cpp#L1211-L1264

if it returns NULL then rd_sum_fread will return false: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L190-L197

And then rd_sum_fread_case will return false: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L221-L238

And rd_sum_fread_alloc_case2__ will return NULL: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L439-L457

which ends with an excption in rd_sum.py: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/python/resdata/summary/rd_sum.py#L248-L254

I think it makes sense to make the signaling method for resdata.Summary consistent. Recovering from the abort signal was never a part of the public API so replacing it with a recoverable error is not a breaking change, but it is a minor version increment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants