[CBRD-25751] Fix core dump at logging error message from ER_HB_PROCESS_EVENT #5727

Merged Jan 6, 2025 (6 commits)
Changes from 2 commits
12 changes: 8 additions & 4 deletions src/executables/master_heartbeat.c
@@ -3599,7 +3599,7 @@ hb_resource_job_confirm_start (HB_JOB_ARG * arg)
/* shutdown working server processes to change its role to slave */
snprintf (hb_info_str, HB_INFO_STR_MAX, "%s The master node failed to restart the server process",
HA_FAILBACK_DIAG_STRING);
- MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_PROCESS_EVENT, 1, hb_info_str);
+ MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_NODE_EVENT, 1, hb_info_str);
hornetmj (Contributor), Dec 26, 2024:

The principle for choosing between ER_HB_NODE_EVENT and ER_HB_PROCESS_EVENT needs to be clear and consistent. For example, use ER_HB_PROCESS_EVENT in hb_resource_job_*() and ER_HB_NODE_EVENT in hb_cluster_job_*(). In addition, using both ER_HB_PROCESS_EVENT and ER_HB_NODE_EVENT in the same callback function can make it hard to decide which one applies when messages are added later.

Contributor Author:

Reverted this message to ER_HB_PROCESS_EVENT.

error = hb_resource_job_queue (HB_RJOB_DEMOTE_START_SHUTDOWN, NULL, HB_JOB_TIMER_IMMEDIATELY);
assert (error == NO_ERROR);

@@ -4158,7 +4158,7 @@ hb_cleanup_conn_and_start_process (CSS_CONN_ENTRY * conn, SOCKET sfd)
snprintf (hb_info_str, HB_INFO_STR_MAX,
"%s Server process failure repeated within a short period of time. The current node will be demoted",
HA_FAILBACK_DIAG_STRING);
- MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_PROCESS_EVENT, 1, hb_info_str);
+ MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_PROCESS_EVENT, 2, hb_info_str, error_string);
YeunjunLee marked this conversation as resolved.

error = hb_resource_job_queue (HB_RJOB_DEMOTE_START_SHUTDOWN, NULL, HB_JOB_TIMER_IMMEDIATELY);
assert (error == NO_ERROR);
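
Note on the hunk above: the call changes its argument count from 1 to 2 and adds error_string, which suggests that the message registered for ER_HB_PROCESS_EVENT consumes two string arguments. That is an inference from the diff and the PR title, not something stated in the PR. The sketch below illustrates, under that assumption, why a printf-style logger must receive as many arguments as its format consumes, and shows one cheap guard. The names `count_format_args` and `format_event` and the sample formats are hypothetical and are not part of CUBRID; the counter is deliberately simple and does not handle `*` widths or positional specifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts conversion specifiers in a printf-style
 * format. "%%" is treated as a literal percent sign and a trailing lone
 * '%' is ignored. Only the '%' introducer is inspected. */
static int
count_format_args (const char *fmt)
{
  int count = 0;
  const char *p;

  assert (fmt != NULL);

  for (p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        {
          continue;
        }
      if (p[1] == '%')
        {
          p++;                  /* skip the escaped percent */
          continue;
        }
      if (p[1] != '\0')
        {
          count++;
        }
    }
  return count;
}

/* Hypothetical guard for formats that take at most two string arguments.
 * Returns 0 after formatting into buf, or -1 (leaving buf empty) when the
 * caller's num_args does not match what the format consumes. Reading a
 * missing variadic argument is undefined behavior, so the mismatch is
 * rejected before snprintf is ever called. */
static int
format_event (char *buf, size_t size, const char *fmt, int num_args, const char *arg1, const char *arg2)
{
  if (size > 0)
    {
      buf[0] = '\0';
    }
  if (num_args < 0 || num_args > 2 || count_format_args (fmt) != num_args)
    {
      return -1;
    }
  /* Surplus trailing arguments are evaluated but otherwise ignored. */
  snprintf (buf, size, fmt, arg1, arg2);
  return 0;
}
```

With a two-specifier format, `format_event (buf, sizeof (buf), "%s / %s", 1, "a", NULL)` is rejected, while the same call with `num_args` set to 2 and a second string formats normally. A check of this kind turns a potential crash into a reportable error at the call site.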
@@ -4844,13 +4844,17 @@ hb_thread_check_disk_failure (void *arg)
{
snprintf (hb_info_str, HB_INFO_STR_MAX,
"%s The master node has lost its role due to server process problem, such as disk failure",
- HA_FAILOVER_DIAG_STRING);
- MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_PROCESS_EVENT, 1, hb_info_str);
+ HA_FAILBACK_DIAG_STRING);
+ MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_NODE_EVENT, 1, hb_info_str);

/* be silent to avoid blocking write operation on disk */
hb_disable_er_log (HB_NOLOG_DEMOTE_ON_DISK_FAIL, NULL);
hb_Resource->state = HB_NSTATE_SLAVE;

+ snprintf (hb_info_str, HB_INFO_STR_MAX, "%s Current node has been successfully demoted to slave",
+ HA_FAILBACK_SUCCESS_STRING);
+ MASTER_ER_SET (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_HB_NODE_EVENT, 1, hb_info_str);

pthread_mutex_unlock (&hb_Resource->lock);
pthread_mutex_unlock (&hb_Cluster->lock);
#if !defined(WINDOWS)