Different numbers of iterations when running code on different hardware platforms #1

Open
Lynkzhang opened this issue Apr 6, 2018 · 7 comments

Comments

@Lynkzhang

Lynkzhang commented Apr 6, 2018

I ran AMG on Xeon, KNL, and KNM platforms. On KNM/KNL, the number of iterations is much higher than on Xeon with the same configuration. Moreover, when I run with 1 OMP thread per process, the number of iterations becomes 0 on KNM/KNL. Is there a way (such as a flag) to control the number of iterations, or is this a problem in the code?

@ulrikeyang
Member

That should not happen. I have not observed any such behavior on the various machines where I ran and tested the code; however, I have never run it on KNM/KNL. How did you build and run the code?

@Lynkzhang
Author

Lynkzhang commented Apr 7, 2018

Thank you so much for replying!
Here is how I compile:
source /opt/intel/parallel_studio_xe_2018.1.038/bin/psxevars.sh intel64
export I_MPI_CC=icc
export I_MPI_CXX=icpc
export I_MPI_F77=ifort
export I_MPI_F90=ifort
make
In the Makefile, I added the flags: -ipo -xHost

Here is how I run it. I am using SDE for profiling:
mpiexec -host $HOST -genv OMP_NUM_THREADS=1 -n 64 bash -c "/home/Lynkzhang/test/precision/dep/sde-external-8.16.0-2018-01-30-lin/sde64 -sse-sde -global_region -mix_omit_per_thread_stats -mix_omit_per_function_stats -start_ssc_mark 111:repeat -stop_ssc_mark 222:repeat -iform 1 -omix oSDE/"$MPI_LOCALRANKID".txt -knm -- ./test/amg -problem 1 -P 4 4 4 -n 64 64 64"

Simply put, I am running something like:
mpiexec -host $HOST -genv OMP_NUM_THREADS=1 -n 64 ./amg -problem 1 -P 4 4 4 -n 64 64 64

Here is the output:

Running with these driver parameters:
solver ID = 1

Laplacian_27pt:
(Nx, Ny, Nz) = (256, 256, 256)
(Px, Py, Pz) = (4, 4, 4)

=============================================
Generate Matrix:

Spatial Operator:
wall clock time = 17.962424 seconds
wall MFLOPS = 0.000000
cpu clock time = 17.950000 seconds
cpu MFLOPS = 0.000000

RHS vector has unit components
Initial guess is 0

IJ Vector Setup:

RHS and Initial Guess:
wall clock time = 3.546373 seconds
wall MFLOPS = 0.000000
cpu clock time = 3.550000 seconds
cpu MFLOPS = 0.000000

=============================================
Problem 1: AMG Setup Time:

PCG Setup:
wall clock time = 81.694932 seconds
wall MFLOPS = 0.000000
cpu clock time = 81.700000 seconds
cpu MFLOPS = 0.000000

FOM_Setup: nnz_AP / Setup Phase Time: 0.000000e+00

Walltime of the main kernel: 16.958590 sec

Problem 1: AMG-PCG Solve Time:

PCG Solve:
wall clock time = 17.080244 seconds
wall MFLOPS = 0.000000
cpu clock time = 17.090000 seconds
cpu MFLOPS = 0.000000

Iterations = 0
Final Relative Residual Norm = 0.000000e+00

FOM_Solve: nnz_AP * Iterations / Solve Phase Time: 0.000000e+00

Figure of Merit (FOM_1): 1.480463e+06
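
For comparing runs across builds, the two figures that matter in the output above are the Iterations line and the Final Relative Residual Norm line. A minimal sketch of pulling them out of a captured log follows. The helper name parse_solve_summary is hypothetical and not part of AMG, and the line formats are assumed to match the output shown here.

```python
import re

# Line formats assumed from the driver output quoted above.
# Anchoring at the start of a line keeps the "FOM_Solve: nnz_AP * Iterations ..."
# line from being mistaken for the iteration count.
_ITER_RE = re.compile(r"^\s*Iterations\s*=\s*(\d+)\s*$", re.MULTILINE)
_RESID_RE = re.compile(
    r"^\s*Final Relative Residual Norm\s*=\s*(\S+)\s*$", re.MULTILINE
)


def parse_solve_summary(log_text):
    """Return (iterations, final_relative_residual) from a captured log.

    Hypothetical helper, not part of AMG. Either element is None when
    the corresponding line is missing from the log.
    """
    it = _ITER_RE.search(log_text)
    res = _RESID_RE.search(log_text)
    iterations = int(it.group(1)) if it else None
    residual = float(res.group(1)) if res else None
    return iterations, residual
```

Saving the output of each build (for example, one log per set of compiler flags) and running this helper over every file gives a compact side-by-side view of where the iteration count changes.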

@Lynkzhang
Author

The problem comes from compiling on KNM/KNL with the -xHost flag. The results become normal after I remove this flag.

@Lynkzhang Lynkzhang reopened this Apr 11, 2018
@Lynkzhang
Author

We found that the code has problems when it is compiled for the AVX-512 instruction set (for example, with the flag -xMIC-AVX512).
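
For context on why -xHost and -xMIC-AVX512 would be expected to behave alike here: per Intel's compiler documentation, -xHost generates code for the highest instruction set available on the machine doing the compilation, so building on a KNL/KNM node would select an AVX-512 target. A quick way to see which AVX-512 feature flags a Linux host advertises is to read the flags line of /proc/cpuinfo. The sketch below is only an illustration. The function name avx512_flags is made up for this example, and it assumes the usual Linux "flags : ..." layout.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo text.

    Hypothetical helper for illustration. It assumes the Linux layout in
    which each logical CPU has a line of the form "flags : fpu vme ...".
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" entries carry the instruction-set feature names.
        if sep and key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)
```

On a Linux machine, passing the contents of /proc/cpuinfo to this function on both the Xeon and the KNL/KNM build hosts shows whether the two hosts would lead -xHost to pick different instruction sets. An empty list means the host reports no AVX-512 support.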

@ulrikeyang
Member

ulrikeyang commented Apr 11, 2018 via email

@Lynkzhang
Author

As I mentioned before, the problem is that the number of iterations becomes abnormal.
In particular, when running with one thread per process, the number is 0.
Under other conditions (such as running the binary directly with ./amg), the number of iterations changes to a different, seemingly random value on each run.

@ulrikeyang
Member

What does using -xMIC-AVX512 change?
