Are task dependencies (depend clauses) in bolt working? #96

Open
gregrodgers opened this issue Feb 9, 2021 · 10 comments

@gregrodgers

This could be user error on my part. I have a simple program in which I am trying to get T2 to execute after T1, because T1 writes x, which is an input to T2. However, x is not set to 2 before T2 runs. Here is the code. It fails the same way with both libgomp and libbolt, and with both clang and gcc 7.5.

#include <stdio.h>
#include <omp.h>
int foo() {
   int x = 1;
   int x_is_not_equal_two=0;
   #pragma omp task depend(in:x) shared(x_is_not_equal_two,x)
   {
      if (x != 2) {
         x_is_not_equal_two = 1;
         printf(" T2: INPUT dependent clause x should now be 2  x:%d\n",x);
      }
   }
   #pragma omp task depend(out:x) shared(x)
   {
      printf(" T1: OUTPUT setting x to 2\n");
      x=2;
   }
   printf("before taskwait x:%d\n",x);
   #pragma omp taskwait
   printf("after  taskwait x:%d  x  was not equal 2 in T2:%d \n", 
      x, x_is_not_equal_two);
   return x_is_not_equal_two;
}

int main() {
   int rc=0;
   omp_set_num_threads(2);
   #pragma omp parallel
   #pragma omp single nowait
   rc = foo();
   printf("rc:%d\n",rc);
   return rc;
}

@shintaro-iwasaki
Collaborator

Thank you for the question. As I understand it, a task with depend(in:) waits only for a task with depend(out:) on the same item that was generated earlier. I believe the idea behind this is that the program should still run correctly when all OpenMP task directives are ignored and a single thread executes the program (this rule does not apply to all OpenMP directives/programs, though).

In your example, #pragma omp task depend(out:x) must be created first.

   // I checked it with GCC 9.3.0, but it works for Clang too.
   #pragma omp task depend(out:x) shared(x)
   {
      printf(" T1: OUTPUT setting x to 2\n");
      x=2;
   }
   #pragma omp task depend(in:x) shared(x_is_not_equal_two,x)
   {
      if (x != 2) {
         x_is_not_equal_two = 1;
         printf(" T2: INPUT dependent clause x should now be 2  x:%d\n",x);
      }
   }

BOLT should support task depend, but we would be happy to hear about any problems you find!

@gregrodgers
Author

Thank you, that makes sense now. I reordered the code and added a sleep in T1 to force asynchronous behavior.
I tested with gcc, LLVM 6, and my development version of LLVM 13 called AOMP. It turns out that only gcc passes the asynchronous test. Apparently clang/LLVM is not generating the tasking code properly. Here is the output, my run script, and my final test code for future reference.

=====> LLVM 6
/usr/lib/llvm-6.0/bin/clang -O3 -fopenmp=libgomp taskdepend.c -lgomp -labt -L/home/grodgers/rocm/bolt/lib -o taskdepend
 T1: OUTPUT setting x to 2 after sleep 5
 T1: x is now set to 2
 T2: INPUT dependent clause  x:2
before taskwait x:2
after taskwait x:2
FAIL

=====> AOMP 13
/home/grodgers/rocm/aomp/bin/clang -O3 -fopenmp=libgomp taskdepend.c -lgomp -labt -L/home/grodgers/rocm/bolt/lib -o taskdepend
 T1: OUTPUT setting x to 2 after sleep 5
 T1: x is now set to 2
 T2: INPUT dependent clause  x:2
before taskwait x:2
after taskwait x:2
FAIL

=====> gcc
gcc -O3 -fopenmp taskdepend.c -lgomp -labt -L/home/grodgers/rocm/bolt/lib -o taskdepend
before taskwait x:1
 T1: OUTPUT setting x to 2 after sleep 5
 T1: x is now set to 2
 T2: INPUT dependent clause  x:2
after taskwait x:2
PASS

Here is the script I ran for testing.

BOLTHOME=$HOME/rocm/bolt/lib
BOLTOPTS="-lgomp -labt -L$BOLTHOME"
export LD_LIBRARY_PATH=$BOLTHOME

cmd1="/usr/lib/llvm-6.0/bin/clang -O3 -fopenmp=libgomp taskdepend.c $BOLTOPTS -o taskdepend"
cmd2="$AOMP/bin/clang             -O3 -fopenmp=libgomp taskdepend.c $BOLTOPTS -o taskdepend"
cmd3="gcc                         -O3 -fopenmp         taskdepend.c $BOLTOPTS -o taskdepend"
echo
echo "=====> LLVM 6"
echo $cmd1
$cmd1
./taskdepend
echo
echo "=====> AOMP 13"
echo $cmd2
$cmd2
./taskdepend
echo
echo "=====> gcc"
echo $cmd3
$cmd3
./taskdepend

And here is the final test code.

#include <stdio.h>
#include <omp.h>
#include <unistd.h>
#define mysleep(n) usleep((n)*1000000)
int foo() {
   int x = 1;
   int err = 0;
   #pragma omp task depend(out:x) shared(x)
   {
      printf(" T1: OUTPUT setting x to 2 after sleep 5\n");
      mysleep(1);
      x=2;
      printf(" T1: x is now set to 2\n");
   }

   #pragma omp task depend(in:x) shared(x)
   {
      printf(" T2: INPUT dependent clause  x:%d\n",x);
   }
   printf("before taskwait x:%d\n",x);
   if (x == 2)
     err=1;
   #pragma omp taskwait
   if (x != 2)
      err=1;
   printf("after taskwait x:%d\n",x);

   return err;
}

int main() {
   int rc=0;
   omp_set_num_threads(2);
   #pragma omp parallel
   #pragma omp single nowait
   rc = foo();
   if (rc)
     printf("FAIL\n");
   else
     printf("PASS\n");
   return rc;
}

@shintaro-iwasaki
Collaborator

@gregrodgers Thank you for your report. We will also investigate this (hopefully within a couple of weeks).

@gregrodgers
Author

Thank you. As a follow-up, I tried LLVM 6 and AOMP 13 with their native library builds, and they now pass and exhibit the desired async behavior. So I now conclude that I cannot plug in the BOLT libraries for LLVM. Maybe we should leave this open.

@shintaro-iwasaki
Collaborator

Thank you. Yes, the depend support in BOLT seems broken. I will reopen the issue and create a patch soon.

I'd be happy if you would allow me to use your code (with some modification) as a test.

@gregrodgers
Author

You are welcome to use the code above.

@shintaro-iwasaki
Collaborator

Thanks!

shintaro-iwasaki added a commit to shintaro-iwasaki/bolt that referenced this issue Feb 23, 2021
shintaro-iwasaki added a commit to shintaro-iwasaki/bolt that referenced this issue Feb 25, 2021
@shintaro-iwasaki
Collaborator

shintaro-iwasaki commented Feb 25, 2021

We have not checked everything or fully understood the cause, but it "seems" that BOLT 2.x (currently BOLT 2.0), which is based on LLVM 11.0, does not handle depend properly when GCC is older than 6.x.

https://jenkins-pmrs.cels.anl.gov/view/abt/job/bolt-review-support-check/

BOLT+Argobots failed bolt-libomp.tasking.task_depend_bolt_96.c with GCC 4.9.4: link
[EDIT: 21/2/25/11:26 CST] Sorry, GCC 4.9.4 failed because GCC 4.x uses C90 by default. I need further investigation for this, but the temporary conclusion is unchanged: CI passes tests including this task_depend_bolt_96.c when GCC >= 6.5.
[EDIT: 21/2/25/12:45 CST] GCC 4.9.4 still failed some depend tests but it passed this task_depend_bolt_96.c test. In any case, we recommend GCC >=6.x or Clang >= 6.x for depend.

BOLT+Argobots passed bolt-libomp.tasking.task_depend_bolt_96.c with GCC 5.5.0 but failed some other scheduling tests: link

BOLT+Argobots passed all depend and scheduling tests with GCC 6.5.0 or newer: link

For now, I can say that this depend code works when the OpenMP program is compiled with GCC 6.5 or newer. The CI also compiles the tests with several Clang versions (6.0.1 or newer), and I could not find any problem related to this depend scheduling issue with them.


This is still under investigation, and I am not claiming that the cause is GCC, LLVM OpenMP, or anything else. Even if the problem comes from a particular combination, the supported combinations should be stated explicitly. I will update the CI, summarize the support status, and fix the problem if possible.

@shintaro-iwasaki
Collaborator

This testing does not cover all compilers and architectures. If this depend test fails for you with a newer GCC+BOLT, we would appreciate any information (e.g., GCC/Clang version, CPU, etc.) that helps us identify the problem.

@HPC4AI

HPC4AI commented Feb 18, 2023

This testing does not cover all compilers and architectures. If you failed this depend test with a newer GCC+BOLT, we would appreciate any information (e.g., GCC/Clang version, CPU, etc) from you to identify the problem.

It seems BOLT does not work well with task dependencies. It causes a segmentation fault (null pointer).
