Option to SIGQUIT or throw error during ESMF_Abort #296

Open
danrosen25 opened this issue Sep 11, 2024 · 3 comments
@danrosen25
Member

The current way to debug ESMF errors is to build a backtrace using ESMF_LogSetError and rc, which gives only limited information about the state at the time of the error. I started investigating raising SIGQUIT, which can print a backtrace and dump a core. The core dump can then be analyzed to see the state that caused the error.

diff --git a/src/Infrastructure/VM/src/ESMCI_VMKernel.C b/src/Infrastructure/VM/src/ESMCI_VMKernel.C
index 63b85ad0c3..43c85c5c5c 100644
--- a/src/Infrastructure/VM/src/ESMCI_VMKernel.C
+++ b/src/Infrastructure/VM/src/ESMCI_VMKernel.C
@@ -899,6 +899,7 @@ struct SpawnArg{
 void VMK::abort(){
   // abort default (all MPI) virtual machine
   int finalized;
+  raise (SIGQUIT);
   MPI_Finalized(&finalized);
   if (!finalized)
     MPI_Abort(default_mpi_c, EXIT_FAILURE);
@danrosen25 danrosen25 self-assigned this Sep 11, 2024
@danrosen25 danrosen25 added the feature/enhancement New feature or request label Sep 11, 2024
@anntsay

anntsay commented Oct 2, 2024

Dan proposes making this a runtime option, so that ESMF quits on error and outputs the available information. This allows easier troubleshooting and debugging.

Bob: Looks reasonable. Maybe put it in 8.8 because it is not heavyweight, and the new method will be optional.
Ann confirms that ESMF_LogSetError and rc will still be available and will be the default.

@anntsay

anntsay commented Oct 2, 2024

Bill: CESM also uses this, so it makes sense to offer it as an option.
Dan: This is only an optional method; the default is still the current method. It is set as a one-time flag at run time.

@danrosen25
Member Author

Look at the LogSetError option for abort on error.
Runtime flag (set through the environment): ESMF_RUNTIME_ABORT_ON_ERROR
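
A rough sketch of how an environment-controlled SIGQUIT might be wired in, to make the proposal concrete. Only the variable name ESMF_RUNTIME_ABORT_ON_ERROR comes from this thread. The helper names (`flag_value_enabled`, `abort_on_error_requested`, `abort_hook`) are hypothetical, and the rule for which values count as "enabled" is an assumption, not an agreed design.

```cpp
#include <csignal>   // std::raise, SIGQUIT (POSIX)
#include <cstdlib>   // std::getenv
#include <cstring>   // std::strcmp

// Hypothetical helper: treat an environment value as a boolean switch.
// Assumption: unset, empty, or "0" means disabled; anything else enables.
bool flag_value_enabled(const char* value) {
  if (value == nullptr || *value == '\0') return false;
  return std::strcmp(value, "0") != 0;
}

// Hypothetical helper: reads the runtime flag named in this thread.
bool abort_on_error_requested() {
  return flag_value_enabled(std::getenv("ESMF_RUNTIME_ABORT_ON_ERROR"));
}

// Hypothetical stand-in for the start of VMK::abort(): raise SIGQUIT only
// when the flag is set, so the default behavior stays unchanged.
void abort_hook() {
  if (abort_on_error_requested())
    std::raise(SIGQUIT);  // default action terminates and may dump a core
  // The existing MPI_Finalized / MPI_Abort logic would follow here.
}
```

With something like this, the flag would be set once in the environment before launching the application. Whether a core file is actually written still depends on the system's core-size limit (for example `ulimit -c`).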
