[CBRD-25728] Support parallel sort for ORDER_BY #5694

Open
wants to merge 60 commits into
base: feature/parallel_sort

shparkcubrid
Contributor

http://jira.cubrid.org/browse/CBRD-25728

Implementation
For an 'ORDER BY', the input file is split into as many parts as the degree of parallelism and each part is sorted separately. The results from the parallel threads are then merged and stored in the output temporary file.
Note: All previous parallelism-related code has been removed. The previous implementation was only meaningful when sort memory was set to a very large value, which made it unusable in practice.

Main Functions

 sort_listfile() : Starts the sort
   sort_check_parallelism() : Checks the degree of parallelism
   sort_start_parallelism() : Copies sort_param and splits the input temporary file
     sort_listfile_execute() : Performs the parallel sort
   sort_end_parallelism() : Merges the results and stores them in the output temporary file
 sort_return_used_resources() : Returns the used resources

Modifications Related to Resource Usage
Resources are changed so that they can be allocated and freed by different threads. Where that is difficult, allocation and freeing are kept within a single thread.

  • Temporary file: Add a mutex around storing the temporary file's thread information.
  • Page: A FIXed page is now UNFIXed by the same thread. Each thread opens and closes the input list file itself.
  • Private allocation: Change certain private allocations to malloc.
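
To make the split, sort, and merge flow described above easier to follow, here is a minimal sketch in plain C. It is not the code in this PR: `chunked_sort` and `cmp_int` are hypothetical names, it works on an in-memory `int` array rather than list files and temporary files, and the chunks are sorted one after another in a loop where the PR would hand each chunk to its own worker thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: ascending order of ints. */
static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a;
  int y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Hypothetical helper (not a CUBRID function): sorts data[0..n) by splitting
 * it into `parts` chunks, sorting each chunk independently, and merging the
 * sorted chunks. Returns 0 on success, -1 if memory allocation fails. */
static int
chunked_sort (int *data, size_t n, size_t parts)
{
  assert (parts > 0);
  if (n == 0)
    {
      return 0;
    }
  if (parts > n)
    {
      parts = n;		/* never create empty chunks */
    }

  size_t *start = malloc ((parts + 1) * sizeof (size_t));
  size_t *pos = malloc (parts * sizeof (size_t));
  int *out = malloc (n * sizeof (int));
  if (start == NULL || pos == NULL || out == NULL)
    {
      free (start);
      free (pos);
      free (out);
      return -1;
    }

  /* Split: chunk i covers the half-open range [start[i], start[i + 1]). */
  for (size_t i = 0; i <= parts; i++)
    {
      start[i] = (n * i) / parts;
    }

  /* Sort: each chunk is independent, so each could run on its own thread. */
  for (size_t i = 0; i < parts; i++)
    {
      qsort (data + start[i], start[i + 1] - start[i], sizeof (int), cmp_int);
      pos[i] = start[i];
    }

  /* Merge: repeatedly take the smallest head among the unfinished chunks. */
  for (size_t k = 0; k < n; k++)
    {
      size_t best = parts;	/* "parts" means no candidate chosen yet */
      for (size_t i = 0; i < parts; i++)
	{
	  if (pos[i] < start[i + 1] && (best == parts || data[pos[i]] < data[pos[best]]))
	    {
	      best = i;
	    }
	}
      out[k] = data[pos[best]++];
    }

  memcpy (data, out, n * sizeof (int));
  free (start);
  free (pos);
  free (out);
  return 0;
}
```

The merge step here is a simple linear scan over the chunk heads, which is adequate for a small number of chunks. How the PR merges its per-thread results is not shown on this page.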

@shparkcubrid shparkcubrid self-assigned this Dec 9, 2024
@shparkcubrid shparkcubrid requested review from youngjinj, xmilex-git and Hamkua and removed request for hornetmj and beyondykk9 December 9, 2024 13:01

thread_p->push_resource_tracks ();

#if !defined(NDEBUG)
Contributor Author

The code written to measure execution time will be removed.

(DUP_PRM_FUNC) NULL},
{PRM_ID_MAX_PARALLEL_THREAD,
PRM_NAME_MAX_PARALLEL_THREAD,
(PRM_FOR_SERVER | PRM_USER_CHANGE | PRM_FOR_SESSION | PRM_HIDDEN),
Contributor

@youngjinj youngjinj Dec 13, 2024

In a debug build, the code below causes an Aborted (core dumped) crash.

static int
sysprm_load_and_init_internal (const char *db_name, const char *conf_file, bool reload, const int load_flags)
{
  ...
#if !defined(NDEBUG)
  /* verify flags are not incorrect or confusing */
  for (i = 0; i < NUM_PRM; i++)
    {
      int flag = prm_Def[i].static_flag;
      if (PRM_IS_FOR_SESSION (flag) && (!PRM_IS_FOR_CLIENT (flag) || !PRM_USER_CAN_CHANGE (flag)))
        {
          /* session parameters can only be parameters for client that are changeable on-line */
          assert (0);
        }
...

Contributor Author

Fixed.

Contributor

@youngjinj youngjinj Dec 16, 2024

The following code triggers the same problem, because PRM_FOR_SESSION and PRM_HIDDEN are set together.

...
if (PRM_IS_FOR_SESSION (flag) && PRM_IS_HIDDEN (flag))
  {
    /* hidden parameters are not allowed to use PRM_FOR_SESSION flag */
    assert (0);
  }
...

For now, I will remove PRM_HIDDEN and continue the review.
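
The two debug assertions quoted in this thread amount to one rule: a session parameter must be a client parameter that the user can change, and it must not be hidden. A minimal sketch of that rule as a checkable function follows. The `DEMO_` flag names and their bit values are invented for illustration and are not CUBRID's real definitions; `demo_flags_are_consistent` and `demo_count_bad_flags` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define DEMO_PRM_FOR_CLIENT   0x01u
#define DEMO_PRM_FOR_SERVER   0x02u
#define DEMO_PRM_USER_CHANGE  0x04u
#define DEMO_PRM_FOR_SESSION  0x08u
#define DEMO_PRM_HIDDEN       0x10u

/* Returns true when the flag combination satisfies both quoted checks. */
static bool
demo_flags_are_consistent (unsigned int flag)
{
  if ((flag & DEMO_PRM_FOR_SESSION) == 0)
    {
      return true;		/* both checks only apply to session parameters */
    }
  if ((flag & DEMO_PRM_FOR_CLIENT) == 0 || (flag & DEMO_PRM_USER_CHANGE) == 0)
    {
      return false;		/* session parameters must be client, user-changeable */
    }
  if ((flag & DEMO_PRM_HIDDEN) != 0)
    {
      return false;		/* hidden parameters may not be session parameters */
    }
  return true;
}

/* Non-aborting counterpart of the debug loop: counts inconsistent entries. */
static size_t
demo_count_bad_flags (const unsigned int *flags, size_t n)
{
  assert (flags != NULL || n == 0);
  size_t bad = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!demo_flags_are_consistent (flags[i]))
	{
	  bad++;
	}
    }
  return bad;
}
```

Under these assumed values, the combination in the diff (server, user-changeable, session, hidden) fails the first check because it lacks the client flag, and would still fail the second check because it is hidden.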


if (parallel_type == PX_THREAD_IN_PARALLEL)
{
for (int i = 0; i < sort_param->px_max_index; i++)
Contributor

@youngjinj youngjinj Dec 20, 2024

The sort_listfile function calls sort_return_used_resources parallel_num times.

if (parallel_num > 1)
  {
    for (int i = 0; i < parallel_num; i++)
      {
        sort_return_used_resources (thread_p, &px_sort_param[i], PX_THREAD_IN_PARALLEL);
...

Inside sort_return_used_resources, the code below is repeated px_max_index times.

if (parallel_type == PX_THREAD_IN_PARALLEL)
  {
    for (int i = 0; i < sort_param->px_max_index; i++)
      {
        if (sort_param->get_arg != NULL)
          {
            SORT_INFO *sort_info_p = (SORT_INFO *) sort_param->get_arg;
            if (sort_info_p->s_id != NULL)
              {
                db_private_free_and_init (thread_p, sort_info_p->s_id);
              }

            if (sort_info_p->input_file != NULL)
              {
                db_private_free_and_init (thread_p, sort_info_p->input_file);
              }

            db_private_free_and_init (thread_p, sort_param->get_arg);
          }
      }
  }

The code inside the loop looks wrong.

  1. The value of i is never used, so the same code is simply repeated.
  2. Because the caller already loops parallel_num times, I think the inner loop over px_max_index is unnecessary.
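
A minimal sketch of what the suggestion above could look like: the inner loop is dropped and each sort_param releases its own resources exactly once. The types and helpers here (`DEMO_SORT_INFO`, `DEMO_SORT_PARAM`, `demo_free_and_init`, `demo_return_used_resources`) are simplified stand-ins, not the PR's code, and plain `free` is used in place of `db_private_free_and_init`.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the structures in the quoted code. */
typedef struct demo_sort_info
{
  void *s_id;
  void *input_file;
} DEMO_SORT_INFO;

typedef struct demo_sort_param
{
  void *get_arg;		/* points at a DEMO_SORT_INFO, or NULL */
} DEMO_SORT_PARAM;

/* Frees *ptr and resets it to NULL so a second call becomes a no-op. */
static void
demo_free_and_init (void **ptr)
{
  free (*ptr);
  *ptr = NULL;
}

/* Releases the resources owned by one sort_param. There is no loop here:
 * the caller already iterates over every parallel sort_param. */
static void
demo_return_used_resources (DEMO_SORT_PARAM *param)
{
  assert (param != NULL);
  if (param->get_arg == NULL)
    {
      return;			/* nothing to release, or already released */
    }

  DEMO_SORT_INFO *info = (DEMO_SORT_INFO *) param->get_arg;
  demo_free_and_init (&info->s_id);
  demo_free_and_init (&info->input_file);
  demo_free_and_init (&param->get_arg);
}
```

With this shape the caller's loop over parallel_num is the only iteration, and calling the function twice on the same sort_param is harmless because get_arg is reset to NULL after the first call.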
